DeepResearcher is an open-source project developed by the GAIR-NLP team at Shanghai Jiao Tong University. It is an intelligent research tool based on Large Language Models (LLMs), trained end-to-end with Reinforcement Learning (RL) in a real web environment. The project aims to help users complete complex research tasks efficiently: it automatically searches for information, verifies the accuracy of what it finds, and produces detailed results. DeepResearcher provides a 7B-parameter model that has been open-sourced on Hugging Face, and the code is available on GitHub. It is suitable for researchers, students, and technology enthusiasts.


Feature List
- Automated research: When a question is entered, it automatically searches the web and organizes the relevant information.
- Cross-source verification: Checks data from multiple sources (e.g. Google, Bing) to make the results more reliable.
- Self-reflection: Evaluates its own search results and redirects the research when needed to improve accuracy.
- Research planning: Automatically generates research steps when dealing with complex problems.
- Honesty: States its limitations directly when no clear answer can be found.
- Open-source model: The 7B-parameter model is available for users to download and customize.
Usage Guide
Installation and use of DeepResearcher requires a certain level of technical knowledge, but the official documentation provides clear guidelines. Below are detailed steps to help users get started quickly.
Installation process
- Clone the Code Repository
 Run the following command in the terminal to download the project locally:
git clone https://github.com/GAIR-NLP/DeepResearcher.git
Change to the project directory:
cd DeepResearcher
- Creating a Virtual Environment
 Use conda to create a separate Python environment and avoid dependency conflicts:
conda create -n deepresearcher python=3.10
Activate the environment:
conda activate deepresearcher
- Installing core dependencies
 Install PyTorch and other necessary libraries by running the following commands in sequence in the project root directory:
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
cd verl
pip3 install -e .
cd ../
pip3 install -r requirements.txt
These steps ensure that the base environment required for the model to run is in place.
- Verify Installation
 Enter the following command to check if PyTorch is installed properly:
python -c "import torch; print(torch.__version__)"
If the version number is displayed (e.g. 2.4.0+cu124), the installation was successful.
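Beyond printing the PyTorch version, it can be useful to confirm that every core package installed above can be located by Python. The script below is an illustrative sketch, not part of the repository; it assumes the import names `torch`, `flash_attn`, and `verl` (the package installed in editable mode from the `verl` directory) and uses only the standard library, so it runs even when a package is missing.

```python
import importlib.util

# Import names assumed for the packages installed above; adjust if yours differ.
REQUIRED_MODULES = ["torch", "flash_attn", "verl"]


def check_modules(names):
    """Map each module name to True if Python can locate it, else False.

    importlib.util.find_spec only looks the module up; it does not import it,
    so a package whose import would fail (e.g. a mismatched CUDA build) is
    still reported as present. The torch version check above covers that case.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Looking modules up instead of importing them keeps the check fast and lets it report every missing package at once rather than stopping at the first import error.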
Configuration and Startup
DeepResearcher uses the Ray framework for training and inference, and also requires configuration of the search service. Here's how to do it.
Starting the Ray Service
- Set the Node Rank
 Run the following commands in the terminal to set the node rank and start Ray as the head node (the rank must be set even if there is only one machine):
export PET_NODE_RANK=0
ray start --head
- Configuring Search Services
 Open ./scrl/handler/config.yaml and set your search API key:
  - Using the Serper API: fill in serper_api_key.
  - Using Azure Bing: fill in azure_bing_search_subscription_key and set search_engine to Bing.
- Adding the Qwen-Plus API Key
 Edit ./scrl/handler/server_handler.py and fill in your Qwen-Plus API key:
client = OpenAI(api_key="sk-xxx", base_url="xxxx")
- Start the Server Handler
 Run in the terminal:
python ./scrl/handler/server_handler.py
After the service starts, record its address and add it to server_url_list in ./scrl/handler/config.yaml.
- Run the Main Handler
 On the training host, run:
python ./scrl/handler/handler.py
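Because the handlers depend on a few fields in ./scrl/handler/config.yaml, a quick sanity check before starting them can save a failed run. The helper below is hypothetical, not code from the repository: it takes the configuration after it has been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`), uses the field names from the steps above (`serper_api_key`, `azure_bing_search_subscription_key`, `search_engine`, `server_url_list`), and assumes that a `search_engine` value of "Bing" selects Azure Bing while anything else means Serper. The accepted values in the real file may differ.

```python
def validate_search_config(config):
    """Return a list of problems found in a loaded config.yaml dictionary.

    Hypothetical check: the field names come from the setup steps, but the
    rule that "bing" selects Azure Bing and anything else selects Serper is
    an assumption made for this sketch.
    """
    problems = []
    engine = str(config.get("search_engine", "")).strip().lower()

    # Pick the API-key field that matches the configured engine.
    key_field = "azure_bing_search_subscription_key" if engine == "bing" else "serper_api_key"
    if not config.get(key_field):
        problems.append(f"{key_field} is empty")

    # server_url_list should hold the address(es) recorded from server_handler.py.
    urls = config.get("server_url_list") or []
    if not isinstance(urls, list) or not urls:
        problems.append("server_url_list must be a non-empty list")
    elif not all(isinstance(u, str) and u.strip() for u in urls):
        problems.append("server_url_list entries must be non-empty strings")

    return problems


example = {
    "search_engine": "Bing",
    "azure_bing_search_subscription_key": "",
    "server_url_list": [],
}
print(validate_search_config(example))
# ['azure_bing_search_subscription_key is empty', 'server_url_list must be a non-empty list']
```

Returning a list of problems instead of raising on the first one lets you fix every missing field in a single pass.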
Training the Model
- Run the Training Script
 From the project root directory, run:
bash train_grpo.sh
Training optimizes the model with reinforcement learning and can take a long time.
Usage and Inference
- Generating research results
 Run the evaluation script:
bash evaluate.sh
The output is saved to ./outputs/{project_name}/{experiment_name}/rollout/rollout_step_0.json.
- View Results
 Rename the output file to {experiment_name}_result.json, move it to the ./evaluate/ folder, and run:
python ./evaluate/cacluate_metrics.py {experiment_name}
The scores are saved to ./evaluate/{experiment_name}_score.json.
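The rename-and-move step can be scripted so it is not repeated by hand for every experiment. This is a minimal sketch that assumes the paths exactly as given above; `project_name` and `experiment_name` are values you supply, and copying (rather than moving) the rollout file is a choice made here so the original output stays in place.

```python
import shutil
from pathlib import Path


def stage_result(project_name, experiment_name, root="."):
    """Copy the evaluation rollout into ./evaluate/ under the expected name.

    Source:      <root>/outputs/{project_name}/{experiment_name}/rollout/rollout_step_0.json
    Destination: <root>/evaluate/{experiment_name}_result.json
    """
    root = Path(root)
    src = root / "outputs" / project_name / experiment_name / "rollout" / "rollout_step_0.json"
    dst_dir = root / "evaluate"
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / f"{experiment_name}_result.json"
    # Raises FileNotFoundError if evaluate.sh has not produced the rollout yet.
    shutil.copy2(src, dst)
    return dst
```

After calling `stage_result(...)` from the project root, run the metrics script with the same experiment name.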
Key Features in Practice
- Automated research and cross-source validation
 After the user enters a question, DeepResearcher collects data from the configured search engines (e.g. Google, Bing) and cross-checks the results. The validation process is recorded in the log file ./outputs/research_log.txt.
- Self-reflective adjustments
 If the initial results are unsatisfactory, the system automatically adjusts its keywords or search strategy. For example, a query about "AI applications in healthcare" might be refined to "latest AI medical technologies" to obtain more relevant results.
- Honesty
 When there is no clear answer to a question, it returns something like "there is not enough information to give a definite conclusion" instead of guessing.
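To make cross-source validation and the honest fallback concrete, here is a toy illustration. It is not how DeepResearcher works internally (the model is trained end-to-end with RL rather than following a fixed rule); it only shows the idea under simple, hypothetical assumptions: each search engine returns a list of candidate answers, an answer counts as verified when at least two distinct sources report it, and otherwise the function returns an explicit "not enough information" message instead of guessing. The function name, the normalization, and the two-source threshold are all illustrative choices.

```python
from collections import defaultdict

NO_ANSWER = "There is not enough information to give a definite conclusion."


def cross_validate(results_by_source, min_sources=2):
    """Toy cross-source check: return the answer reported by the most sources.

    results_by_source maps a source name (e.g. "google", "bing") to a list of
    candidate answer strings. Answers are compared after trimming and
    lower-casing. If no answer is backed by at least `min_sources` distinct
    sources, an explicit "not enough information" message is returned.
    """
    support = defaultdict(set)
    for source, answers in results_by_source.items():
        for answer in answers:
            support[answer.strip().lower()].add(source)

    # Answer with the largest number of distinct supporting sources (None if empty).
    best = max(support.items(), key=lambda item: len(item[1]), default=None)
    if best is None or len(best[1]) < min_sources:
        return NO_ANSWER
    return best[0]


print(cross_validate({"google": ["Paris"], "bing": ["paris", "Lyon"]}))  # paris
print(cross_validate({"google": ["Paris"], "bing": ["Lyon"]}))
# There is not enough information to give a definite conclusion.
```

Counting distinct sources per answer, rather than raw mentions, keeps a single engine that repeats itself from outvoting agreement across engines.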
Caveats
- Ensure a stable internet connection, because the search function relies on real-time data.
- Training and inference require substantial computing resources; GPUs are recommended.
- The project is still under development, so we recommend following the updates on GitHub.
With these steps, users can easily install and use DeepResearcher to experience its intelligent research capabilities.
Application Scenarios
- Academic research
 Researchers can use it to search for paper material, verify sources, and generate first drafts of research reports.
- Student learning
 Students can use it to organize course-related knowledge and quickly complete assignments or project research.
- Technology development
 Developers can use it to explore technology trends and get industry updates and solutions.
FAQ
- Does DeepResearcher support Chinese?
 Yes. Users can enter questions in Chinese; it will prioritize Chinese-language sources and can also handle English material.
- Do I need a GPU?
 Not strictly, but a GPU speeds up training and inference. It can also run on a CPU, only more slowly.
- How do I get the latest version?
 Run git pull in the project directory, then reinstall the dependencies.
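The update steps can be collected into one script. This is a sketch that assumes "reinstalling the dependencies" means repeating the pip commands from the installation section (the editable install of verl plus requirements.txt); the PyTorch and flash-attn lines are left out on the assumption that their pinned versions have not changed. Run it from the project root; with `dry_run=True` it only prints the commands.

```python
import subprocess

# Commands taken from the installation section: (arguments, working directory).
UPDATE_STEPS = [
    (["git", "pull"], "."),
    (["pip3", "install", "-e", "."], "verl"),
    (["pip3", "install", "-r", "requirements.txt"], "."),
]


def update(dry_run=False):
    """Run (or, with dry_run=True, only print) the update commands in order."""
    for args, cwd in UPDATE_STEPS:
        print(f"[{cwd}] {' '.join(args)}")
        if not dry_run:
            # check=True stops the update at the first failing command.
            subprocess.run(args, cwd=cwd, check=True)
    return UPDATE_STEPS
```

Passing `cwd` to `subprocess.run` replaces the `cd verl` / `cd ../` pair from the installation steps, so the script never leaves the shell in the wrong directory.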