EduChat is an open-source educational dialogue model developed by the ICALK team at East China Normal University. It focuses on educational scenarios, supports conversations in English and Chinese, and aims to provide intelligent conversation tools for students, teachers, and researchers. The model is built on open-source base models such as LLaMA and Qwen and fine-tuned on a large amount of education-domain data, which enables it to handle general dialogue, psychological counseling, and Socratic teaching dialogue. EduChat supports GPU deployment, which makes it suitable for educational research or actual teaching. It also provides a data cleaning tool, CleanTool, to help users optimize training data. The project is open-sourced on GitHub and had 748 stars as of 2024.
Feature List
- Supports educational dialogues in English and Chinese, suitable for classroom teaching, academic discussions and psychological counseling.
- Multiple model sizes are available, including 1.8B, 7B, 13B and 14B parameter versions.
- Supports Socratic dialog to guide users to think deeply.
- Offers conversations on psychology topics, recommending relevant books or providing emotional support.
- Includes CleanTool, an open-source data cleaning tool for optimizing training datasets.
- Supports GPU-accelerated deployments and is compatible with hardware such as A100/A800.
- Sample code is provided for developers to quickly invoke the model.
Usage Guide
Installation and Deployment
EduChat is an open-source project that must be downloaded from GitHub and deployed locally. The detailed installation steps are as follows:
- Environment preparation
Ensure that Python 3.8+ and PyTorch are installed on your system. A GPU environment (e.g., NVIDIA A100/A800) is recommended; running the 7B model at FP16 precision requires approximately 15GB of GPU memory. Install the required dependencies:
    pip install torch transformers
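Before downloading any model files, it can help to confirm that the prerequisites named in this step are actually present. The following is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and not part of the EduChat repository, and it only checks that the packages can be found, not that a GPU is available.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this step
REQUIRED_PACKAGES = ("torch", "transformers")  # packages from the pip command


def check_environment():
    """Return a dict mapping each prerequisite to True (met) or False."""
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    for name in REQUIRED_PACKAGES:
        # find_spec returns None when a top-level package is not installed;
        # it does not import the package itself.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```

Any entry reported as MISSING should be installed before moving on to the next step.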
- Download the model
Visit the GitHub repository at https://github.com/ECNU-ICALK/EduChat and clone the project locally:
    git clone https://github.com/ECNU-ICALK/EduChat.git
Model files are downloaded from Hugging Face. The educhat-sft-002-7b model is recommended for single-GPU operation. Download command:
    huggingface-cli download ecnu-icalk/educhat-sft-002-7b
- Load the model
Load the model using the provided sample code. The following Python code loads educhat-sft-002-7b:
    import torch
    from transformers import LlamaForCausalLM, LlamaTokenizer

    tokenizer = LlamaTokenizer.from_pretrained("ecnu-icalk/educhat-sft-002-7b")
    # torch_dtype=torch.float16 already loads FP16 weights, so no extra .half() call is needed
    model = LlamaForCausalLM.from_pretrained("ecnu-icalk/educhat-sft-002-7b", torch_dtype=torch.float16).cuda()
    model = model.eval()  # inference mode
- Generate a dialog
Configure a system prompt that defines EduChat's role and capabilities, append the user turn, and generate a reply. For example:
    system_prompt = (
        "<|system|>你是一个人工智能助手,名字叫EduChat。 "
        "- EduChat是一个由华东师范大学开发的对话式语言模型。 "
        "EduChat的工具 - Web search: Disable. - Calculators: Disable. "
        "EduChat的能力 - Inner Thought: Disable. "
        "对话主题 - General: Enable. - Psychology: Disable. - Socrates: Disable.</s>"
    )
    # The user turn is wrapped in <|prompter|>...</s>, followed by the assistant tag
    query = system_prompt + "<|prompter|>你好</s><|assistant|>"
    inputs = tokenizer(query, return_tensors="pt", padding=True).to(0)
    outputs = model.generate(**inputs, do_sample=True, temperature=0.7, top_p=0.8, repetition_penalty=1.02, max_new_tokens=256)
    # Decode only the newly generated tokens, skipping the prompt
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    print(response)
Example output:
    你好!我是EduChat,有什么我可以帮助你的吗?
(English: "Hello! I am EduChat. Is there anything I can help you with?")
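The system prompt above carries three topic switches (General, Psychology, Socrates) plus an Inner Thought switch, and the query is the system prompt followed by tagged turns. The sketch below shows one way to assemble both strings programmatically. The function names `build_system_prompt` and `build_query` are hypothetical and do not come from the EduChat repository. The prompt text mirrors the single-line form shown in this article, and the handling of earlier assistant turns (closing each with `</s>`) is an assumption, since the article only shows a single-turn query.

```python
PROMPTER = "<|prompter|>"
ASSISTANT = "<|assistant|>"
EOS = "</s>"
TOPICS = ("General", "Psychology", "Socrates")


def _switch(enabled):
    """Render a boolean as the Enable/Disable keyword used in the prompt."""
    return "Enable" if enabled else "Disable"


def build_system_prompt(topic="General", inner_thought=False):
    """Assemble the system prompt with exactly one dialogue topic enabled."""
    if topic not in TOPICS:
        raise ValueError(f"unknown topic: {topic!r}")
    parts = [
        "<|system|>你是一个人工智能助手,名字叫EduChat。",
        "- EduChat是一个由华东师范大学开发的对话式语言模型。",
        "EduChat的工具",
        "- Web search: Disable.",
        "- Calculators: Disable.",
        "EduChat的能力",
        f"- Inner Thought: {_switch(inner_thought)}.",
        "对话主题",
    ]
    parts += [f"- {name}: {_switch(name == topic)}." for name in TOPICS]
    return " ".join(parts) + EOS


def build_query(system_prompt, turns):
    """Build the model input from (user_text, assistant_text) pairs.

    The last pair should use None as assistant_text so the string ends
    with the assistant tag, ready for generation.
    """
    pieces = [system_prompt]
    for user_text, assistant_text in turns:
        pieces.append(f"{PROMPTER}{user_text}{EOS}{ASSISTANT}")
        if assistant_text is not None:
            # Assumption: completed assistant replies are closed with </s>.
            pieces.append(f"{assistant_text}{EOS}")
    return "".join(pieces)


if __name__ == "__main__":
    print(build_query(build_system_prompt("General"), [("你好", None)]))
```

With the defaults, `build_system_prompt()` reproduces the General prompt shown above, and `build_query(prompt, [("你好", None)])` reproduces the single-turn query string.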
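- Featured functions
- Psychology conversations: When the Psychology topic is enabled, EduChat can recommend psychology books or provide emotional support. For example, given the input "Recommend me some psychology-related books", the model returns:
    当然,以下是一些关于心理学的经典书籍:
    1.《人性的弱点》(Dale Carnegie):介绍人际关系技巧,帮助建立良好沟通。
    2.《心理学与生活》(Richard J. Gerrig):全面介绍心理学基础知识,适合初学者。
(English: "Of course, here are some classic books on psychology: 1. How to Win Friends and Influence People (Dale Carnegie): introduces interpersonal skills and helps build good communication. 2. Psychology and Life (Richard J. Gerrig): a comprehensive introduction to the fundamentals of psychology, suitable for beginners.")
To configure the psychology dialogue, enable the Psychology topic and Inner Thought in the system prompt:
    system_prompt = (
        "<|system|>你是一个人工智能助手,名字叫EduChat。 "
        "- EduChat是一个由华东师范大学开发的对话式语言模型。 "
        "EduChat的工具 - Web search: Disable. - Calculators: Disable. "
        "EduChat的能力 - Inner Thought: Enable. "
        "对话主题 - General: Disable. - Psychology: Enable. - Socrates: Disable.</s>"
    )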
- Socratic dialogue: Leads the user to deeper thinking by asking questions, which suits teaching scenarios. To enable it:
    system_prompt = (
        "<|system|>你是一个人工智能助手,名字叫EduChat。 "
        "- EduChat是一个由华东师范大学开发的对话式语言模型。 "
        "EduChat的工具 - Web search: Disable. - Calculators: Disable. "
        "EduChat的能力 - Inner Thought: Disable. "
        "对话主题 - General: Disable. - Psychology: Disable. - Socrates: Enable.</s>"
    )
Sample input: "What is fairness?" The model guides the user's thinking with probing questions such as "What do you think is at the heart of fairness? Is it equal outcomes or equal opportunity?"
- CleanTool data cleaning tool: Optimizes training data, with support for de-duplication and low-quality data filtering. To run CleanTool:
    python clean_tool.py --input data.json --output cleaned_data.json --gpu True
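To make the two cleaning operations named above concrete, here is a small illustrative sketch of exact de-duplication plus a crude low-quality filter. It is not CleanTool's actual implementation, which this article does not describe. The record layout (dicts with a `"text"` field), the function name `clean_records`, and the `min_chars` threshold are all assumptions made for illustration.

```python
def clean_records(records, min_chars=5):
    """Drop near-empty samples and exact duplicates, preserving input order."""
    seen = set()
    cleaned = []
    for record in records:
        # Collapse runs of whitespace so trivially different copies match.
        text = " ".join(str(record.get("text", "")).split())
        if len(text) < min_chars:
            continue  # stand-in for a real low-quality filter
        key = text.lower()  # case-insensitive duplicate detection
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**record, "text": text})
    return cleaned


if __name__ == "__main__":
    sample = [
        {"text": "What is  fairness?"},
        {"text": "what is fairness?"},
        {"text": "ok"},
        {"text": "Explain photosynthesis."},
    ]
    for row in clean_records(sample):
        print(row["text"])
```

A production pipeline would typically add fuzzy de-duplication and model-based quality scoring; this sketch only shows the overall shape of such a filter.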
- Internal test application
If you need access to the latest models or data, send an email to dan_yh@stu.ecnu.edu.cn. Use the subject line "EduChat internal test application + unit", where "unit" is your organization or affiliation, and state the intended purpose in the body of the email.
Usage Notes
- Make sure you have enough GPU memory: the 7B model requires about 15GB, and the 13B model requires more.
- The model does not support real-time web search, so any data it needs must be configured locally.
- Check the GitHub repository regularly for updates to the latest models and documentation.
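The 15GB figure for the 7B model is consistent with simple arithmetic: FP16 stores each parameter in 2 bytes, so 7 billion parameters need about 14 GB for the weights alone, with the remainder going to activations and the generation cache. The sketch below applies the same estimate to the model sizes listed in this article. The function name `fp16_weight_gb` is hypothetical, and the result is a lower bound on memory use, not a measured requirement.

```python
BYTES_PER_PARAM_FP16 = 2  # FP16 stores each weight in 16 bits


def fp16_weight_gb(num_params):
    """Memory for the weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM_FP16 / 1e9


if __name__ == "__main__":
    sizes = [("1.8B", 1.8e9), ("7B", 7e9), ("13B", 13e9), ("14B", 14e9)]
    for label, params in sizes:
        print(f"{label}: about {fp16_weight_gb(params):.1f} GB for weights")
```

Actual usage grows with batch size and sequence length, so leave headroom above these figures when choosing a GPU.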
Application Scenarios
- Classroom teaching aid
Teachers use EduChat's Socratic dialogue feature to guide students into deeper discussions. For example, in a philosophy class, enter "What is truth?" and the model will help students analyze the concept by asking questions.
- Psychological counseling support
Students or researchers access emotional support or book recommendations through the psychology conversation feature, which suits mental health education or research scenarios.
- Education data research
Researchers use CleanTool to clean education-domain datasets and improve the quality of model training for academic research.
- AI development and testing
Developers use EduChat's open-source code to quickly build educational dialogue systems and test the quality of dialogue generation.
FAQ
- What languages does EduChat support?
EduChat supports conversations in both English and Chinese. Its training data contains about 4 million English and Chinese instructions and conversations, which makes it suitable for bilingual education scenarios.
- How do I choose the right version of the model?
The 1.8B and 7B models suit devices with limited computing power, while the 13B and 14B models suit high-performance GPUs; the larger models are more capable but also more resource-intensive.
- Do I need to be connected to the Internet to use it?
No. EduChat is deployed locally and its web search function is disabled, so it runs offline once the model and data have been downloaded in advance.
- How do I get involved in project development?
You can submit issues or pull requests in the GitHub repository to participate in model optimization or feature development.