DeepAnalyze is an agentic large language model designed for autonomous data science. Without human intervention, it can independently carry out the entire data science process: data preparation, analysis, modeling, visualization, and report generation. DeepAnalyze works with a wide range of data sources, including structured data (databases, CSV, Excel), semi-structured data (JSON, XML), and unstructured text (TXT, Markdown), and can ultimately produce analyst-level research reports. The project is also completely open source: its models, code, training data, and demos are publicly available, so users can deploy it in their own environment or build on it to create a customized data analysis assistant.
Features
- Full process automation: Automates every step of data science, from initial data cleansing and preparation, through analysis and modeling, to final visualization and report generation, all without human intervention.
- Open-ended data research: Not limited to specific task instructions; it can conduct exploratory, in-depth research across a variety of given data sources and produce high-quality research reports.
- Support for diverse data sources: Support for processing data files in multiple formats, including structured data (e.g., database, CSV, Excel), semi-structured data (e.g., JSON, XML, YAML) and unstructured data (e.g., TXT, Markdown).
- Completely open source: The model's weights, source code, training data, and an interactive demo interface are all open, allowing developers to make customizations or deploy private data analysis services.
Usage Guide
Below are detailed step-by-step instructions on how to install and use DeepAnalyze.
1. Environment configuration
Before you start, you need to configure the software environment that DeepAnalyze runs in. Using conda to manage the environment is recommended, since it helps avoid conflicts between dependent packages.
First, create a conda environment named deepanalyze and specify Python version 3.12.
conda create -n deepanalyze python=3.12 -y
Then, activate the environment you just created.
conda activate deepanalyze
Next, install all necessary dependencies. The root directory of the project provides a requirements.txt file that lists all the required packages and their versions, so run this command from the project root.
pip install -r requirements.txt
If model training is required, two additional development libraries need to be installed. Each command runs in a subshell, so both can be executed from the same starting directory.
(cd ./deepanalyze/ms-swift/ && pip install -e .)
(cd ./deepanalyze/SkyRL/ && pip install -e .)
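Before moving on, it can help to confirm that the active interpreter and packages match what the steps above set up. The following is a minimal sketch and not part of the DeepAnalyze project: the helper name check_environment and the package list passed to it are illustrative assumptions, and the authoritative list of dependencies is whatever requirements.txt contains.

```python
import importlib.util
import sys


def check_environment(required_packages, min_python=(3, 12)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ expected, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    for name in required_packages:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not importable: {name}")
    return problems


if __name__ == "__main__":
    # "vllm" is used later in this guide; extend the list (an assumption here)
    # to match the packages named in requirements.txt.
    for problem in check_environment(["vllm"]):
        print(problem)
```

Run it inside the activated deepanalyze environment; no output means the checked items are in place.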
2. Launching the local demo interface
The project provides a demo version with a graphical user interface that allows you to interact with DeepAnalyze in a more intuitive way.
First, clone the project repository to your local computer.
git clone https://github.com/ruc-datalab/DeepAnalyze.git
cd DeepAnalyze
Once in the project directory, execute the startup script to run the API and front-end interface.
bash start.sh
After the script runs successfully, open http://localhost:4000 in your browser to start using it. You can upload data files and then let DeepAnalyze perform data analysis tasks.
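To confirm from a script that the interface is reachable after start.sh has run, a small polling helper can be used. This is a sketch under assumptions: the function name wait_until_reachable and its retry parameters are hypothetical, and it only checks that something answers HTTP at the given URL, not that DeepAnalyze itself is healthy.

```python
import time
import urllib.error
import urllib.request


def wait_until_reachable(url, attempts=10, delay=1.0, timeout=2.0):
    """Poll url until it returns any HTTP response; return True on success."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except OSError:
            # Connection refused or timed out (URLError is an OSError subclass).
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

For example, wait_until_reachable("http://localhost:4000") returns True once the front end responds, and False if every attempt fails.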
If you want to stop the service, you can run the following command:
bash stop.sh
If you wish to deploy the service under a specific IP address instead of the default localhost, you need to change the IP address in two files: ./demo/backend.py and ./demo/chat/lib/config.ts.
3. Interaction using the command line
For developers who prefer to use the command line, it is also possible to interact with DeepAnalyze directly through Python scripts. This approach is more flexible and facilitates automated testing and development.
First, use vllm to deploy the DeepAnalyze-8B model.
vllm serve DeepAnalyze-8B
You can then use the following Python code to perform data science tasks. You can give it a specific task or have it carry out open-ended data research, and you can provide any number and type of data sources.
from deepanalyze import DeepAnalyzeVLLM
# Define your instruction and data files
# The instruction can be "Generate a data science report" or a more specific task
prompt = """# Instruction
Generate a data science report.
# Data
File 1: {"name": "bool.xlsx", "size": "4.8KB"}
File 2: {"name": "person.csv", "size": "10.6KB"}
File 3: {"name": "disabled.xlsx", "size": "5.6KB"}
File 4: {"name": "enlist.csv", "size": "6.7KB"}
File 5: {"name": "filed_for_bankrupcy.csv", "size": "1.0KB"}
File 6: {"name": "longest_absense_from_school.xlsx", "size": "16.0KB"}
File 7: {"name": "male.xlsx", "size": "8.8KB"}
File 8: {"name": "no_payment_due.xlsx", "size": "15.6KB"}
File 9: {"name": "unemployed.xlsx", "size": "5.6KB"}
File 10: {"name": "enrolled.csv", "size": "20.4KB"}"""
# Path of the workspace that holds the data files
workspace = "/path/to/your/data/example/student_loan/"
# Initialize the model; this is the path where your DeepAnalyze-8B model files are stored
deepanalyze = DeepAnalyzeVLLM("/path/to/your/checkpoints/deepanalyze-8b/")
# Generate the result
answer = deepanalyze.generate(prompt, workspace=workspace)
# Print the model's reasoning process and final report
print(answer["reasoning"])
After running the above code, you will get a detailed research report which can be rendered directly into PDF format.
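Writing the "# Data" section of the prompt by hand becomes tedious when a workspace holds many files. The sketch below builds a prompt in the same layout as the example from the contents of a directory. The helper names format_size and build_prompt are hypothetical, and the size format (kilobytes with one decimal) is inferred from the example prompt; the source does not state whether the model requires exactly this format.

```python
import json
from pathlib import Path


def format_size(num_bytes):
    """Format a byte count like the example prompt does, e.g. 4915 -> '4.8KB'."""
    return f"{num_bytes / 1024:.1f}KB"


def build_prompt(instruction, workspace):
    """Assemble an '# Instruction / # Data' prompt listing every file in workspace."""
    # Sort by path so the file numbering is stable between runs.
    files = sorted(p for p in Path(workspace).iterdir() if p.is_file())
    lines = ["# Instruction", instruction, "# Data"]
    for index, path in enumerate(files, start=1):
        entry = {"name": path.name, "size": format_size(path.stat().st_size)}
        lines.append(f"File {index}: {json.dumps(entry, ensure_ascii=False)}")
    return "\n".join(lines)
```

The returned string can be passed as prompt to deepanalyze.generate(prompt, workspace=workspace), using the same workspace path for both calls.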
4. Deployment of API services
DeepAnalyze also supports deployment as an OpenAI-compatible API service, making it easy for you to integrate it into existing applications.
To do so, run the backend service script. Before running it, make sure to modify the MODEL_PATH variable in demo/backend.py and set its value to your vllm model name.
python demo/backend.py
Once the service is started, you can interact with the model by sending HTTP requests as if you were calling the OpenAI API.
curl -X POST http://localhost:8200/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Generate a data science report."
      }
    ],
    "workspace": "example/student_loan/"
  }'
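The same request can be issued from Python using only the standard library. This is a sketch mirroring the curl call above: the function names build_chat_request and send_chat_request are hypothetical, and because the source does not document the response format (or whether it is streamed), the sender simply returns the raw response text.

```python
import json
import urllib.request


def build_chat_request(base_url, content, workspace):
    """Build the same POST request as the curl example, without sending it."""
    payload = {
        "messages": [{"role": "user", "content": content}],
        "workspace": workspace,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=600):
    """Send the request and return the raw response body as text."""
    # The response layout is not documented in this guide, so no parsing is done here.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the backend running, send_chat_request(build_chat_request("http://localhost:8200", "Generate a data science report.", "example/student_loan/")) sends the request shown in the curl example.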
Application Scenarios
- Business Intelligence Analytics
 Business analysts can use DeepAnalyze to quickly process sales data, user behavior data, and similar sources, and to automatically generate data insight reports. This removes much of the tedious data processing and charting work and supports faster business decisions.
- Academic Research
 When dealing with experimental data or social survey data, researchers can use DeepAnalyze for exploratory data analysis, hypothesis testing, and model construction, helping them discover hidden patterns in the data and accelerate the research process.
- Financial Risk Control
 In the financial sector, DeepAnalyze can be used to analyze credit data of loan applicants and identify potential fraud risks. It can process multiple data sources and build predictive models to inform risk assessment.
- Educational Data Mining
 Educational institutions can use DeepAnalyze to analyze student learning behavior data and performance data to understand students' learning paths and points of difficulty, thus supporting the development of personalized teaching plans.
FAQ
- What is DeepAnalyze?
 DeepAnalyze is the first agentic large language model for autonomous data science. It can perform the complete process from data preparation to report generation independently, much like a human data scientist.
- Do I have to pay to use DeepAnalyze?
 No. DeepAnalyze is a completely open source project; its models, code, and data are free for you to use and modify.
- What types of data can DeepAnalyze process?
 It can handle many types of data, including structured data stored in databases, CSV or Excel files, semi-structured data such as JSON or XML, and unstructured text data in TXT or Markdown formats.
- Can I run DeepAnalyze on my own computer?
 Yes. As long as your computer meets the hardware requirements for running the large language model, you can follow the steps provided in the official documentation to deploy and use DeepAnalyze locally.