Medical-RAG is an intelligent Q&A program for the Chinese medical field. It is based on Retrieval-Augmented Generation (RAG), which improves the accuracy and safety of Large Language Model (LLM) medical advice by grounding answers in an external knowledge base. At its core, the project uses Milvus, a high-performance vector database, to store and retrieve medical knowledge, and integrates the LangChain framework to orchestrate the entire Q&A process. The project implements a complete automated data processing pipeline, including LLM-based intelligent data annotation, construction of medical domain-specific vocabularies, and efficient data loading. It adopts an advanced hybrid retrieval architecture that combines semantic retrieval with dense vectors and keyword retrieval with sparse vectors (BM25), fusing the results from multiple channels with a configurable re-ranking algorithm to improve retrieval accuracy. Developers can deploy and manage the whole system through flexible YAML configuration files, adapting it to different operating environments and requirements.
Function List
- Automated data processing: The project provides an automated data annotation pipeline that supports large model inference via HTTP or local GPU calls to accelerate the annotation process.
- Automated vocabulary management: A built-in multi-threaded builder with a medical-domain lexicon automates the construction and management of the vocabulary used for sparse retrieval, improving query accuracy.
- Hybrid search architecture: Supports both dense and sparse vector retrieval. Dense retrieval supports various embedding providers such as Ollama, OpenAI, and HuggingFace, while sparse retrieval uses a BM25 algorithm optimized for the medical field.
- Result re-ranking and fusion: Supports fusing results from multiple retrieval channels with RRF (Reciprocal Rank Fusion) or weighted fusion to improve the relevance of the final answer.
- Deep optimization for the medical field: A predefined professional classification system with 6 major department categories and 8 major question categories, plus the `pkuseg` medical-domain segmentation model for text processing.
- High-performance vector database: Based on Milvus v2.6+, supporting efficient vector search, batch embedding, and concurrent queries.
- Flexible configuration system: All core parameters, such as database connection, model selection, and retrieval strategy, are configured through YAML files, making deployment and tuning easy across different environments.
- Efficient interface wrapper: Encapsulates common Milvus interfaces and provides core tools such as `RAGSearchTool`, making secondary development and integration easy for developers.
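The RRF fusion mentioned above can be sketched in a few lines. This is an illustrative implementation, not the project's actual code: the function name `rrf_fuse` is an assumption, and `k=60` is simply the constant commonly used in the RRF literature.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each document earns 1/(k + rank) from
    every ranked list it appears in; documents ranked highly by several
    channels accumulate the largest totals."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document ids by fused score, best first
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical ranked ids from a dense channel and a sparse (BM25) channel
dense_ranking = ["d1", "d2", "d3"]
sparse_ranking = ["d2", "d4", "d1"]
print(rrf_fuse([dense_ranking, sparse_ranking]))  # d2 wins: ranked in both lists
```

Note that RRF only uses ranks, never raw scores, which is why it needs no score normalization across channels.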
Usage Guide
The project provides a complete workflow from environment preparation to final querying. The detailed steps below are designed to help users get started quickly.
Step 1: Environment preparation
Before you start, you need to prepare the basic runtime environment, including cloning the project, installing dependencies, and starting the required services.
- Clone the project code
First, clone the `medical-rag` source code from GitHub to your local machine.
```shell
git clone https://github.com/yolo-hyl/medical-rag
cd medical-rag/src
```
- Install project dependencies
The project is developed in Python and all dependencies are declared in `setup.py`. Install with pip:
```shell
pip install -e .
```
- Start the Milvus vector database
The project uses Milvus as its vector database, and Docker is the recommended way to run it. A handy startup script is already included in the project code.
```shell
cd Milvus
bash standalone_embed.sh start
```
This command starts a standalone Milvus instance.
- Start the Ollama service (optional)
If you plan to use a locally run large model (such as Qwen) for data annotation or answer generation, you need to install and launch Ollama.
```shell
# Start the Ollama service
ollama serve
# Pull the required models
# bge-m3 is a common embedding model, used to generate vectors
ollama pull bge-m3:latest
# qwen2:7b is a capable annotation and Q&A model
ollama pull qwen2:7b
```
Step 2: Basic Configuration
Before running any of the pipelines, the core parameters need to be configured. The configuration file is located at `src/MedicalRag/config/default.yaml`. Modify the following key information according to your environment:
- Milvus connection information: Ensure that `uri` and `token` match the Milvus instance you started.
```yaml
milvus:
  client:
    uri: "http://localhost:19530"
    token: "root:Milvus"
  collection:
    name: "qa_knowledge"
```
- Embedding model configuration: Specifies the model used to generate dense vectors. The following configuration uses the `bge-m3` model served by the local Ollama service.
```yaml
embedding:
  dense:
    provider: ollama
    model: "bge-m3:latest"
    base_url: "http://localhost:11434"
```
Step 3: Data processing and loading
Data processing is the core of building the Q&A system. The project divides it into four stages: data annotation, vocabulary construction, collection creation, and data insertion.
- Data annotation
This step uses a large language model to automatically categorize the raw Q&A data (e.g., department affiliation, question type). First, configure the annotation parameter file `src/MedicalRag/config/data/annotator.yaml`. Then run the annotation script:
```shell
python scripts/annotation.py src/MedicalRag/config/data/annotator.yaml
```
- Build the vocabulary
To support BM25 sparse retrieval, a domain vocabulary must be built from the medical corpus.
```shell
python scripts/build_vocab.py
```
The script processes the data and generates a vocabulary file named `vocab.pkl.gz`.
- Create a Milvus collection (Collection)
This step creates a collection in Milvus for storing vectors and related information. The collection schema is defined by the `default.yaml` configuration file.
```shell
# Create the collection using the default configuration file
python scripts/create_collection.py -c src/MedicalRag/config/default.yaml
# To force-delete and recreate the collection, add the --force-recreate flag
python scripts/create_collection.py --force-recreate
```
- Insert the data
The processed and annotated data is vectorized and written into the Milvus collection.
```shell
python scripts/insert_data_to_collection.py
```
This script automatically generates both dense and sparse vectors for the data and batch-inserts them into the database.
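To illustrate what a sparse vector means in this pipeline: conceptually, each text is tokenized and mapped to `{vocab_index: weight}` pairs using BM25-style term weighting against the vocabulary. The sketch below is a simplified illustration, not the project's implementation: the project tokenizes with `pkuseg` and loads its vocabulary from `vocab.pkl.gz`, whereas here the token list, vocabulary, and IDF values are stand-ins.

```python
from collections import Counter


def bm25_sparse_vector(tokens, vocab, idf, k1=1.5, b=0.75, avgdl=20.0):
    """Build a {vocab_index: weight} sparse vector with BM25 term weighting.

    tokens: pre-segmented words (the project would produce these via pkuseg)
    vocab:  term -> integer index (the project loads this from vocab.pkl.gz)
    idf:    term -> inverse document frequency, precomputed over the corpus
    """
    tf = Counter(t for t in tokens if t in vocab)  # out-of-vocab terms drop out
    dl = len(tokens)                               # document length
    vec = {}
    for term, f in tf.items():
        # Standard BM25 term-frequency saturation with length normalization
        w = idf.get(term, 0.0) * (f * (k1 + 1)) / (f + k1 * (1 - b + b * dl / avgdl))
        if w > 0:
            vec[vocab[term]] = w
    return vec


# Hypothetical vocabulary and IDF table for demonstration only
vocab = {"梅毒": 0, "症状": 1, "治疗": 2}
idf = {"梅毒": 2.1, "症状": 1.3, "治疗": 1.5}
print(bm25_sparse_vector(["梅毒", "的", "症状"], vocab, idf))
```

The stop-word "的" is absent from the vocabulary, so it contributes nothing, which is exactly why a well-built domain vocabulary matters for sparse retrieval quality.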
Step 4: Search and retrieval
Once all the data is loaded, Q&A retrieval can begin.
- Configure the query strategy
Modify the `src/MedicalRag/config/search/search_answer.yaml` file to define the retrieval strategy, e.g., to adjust the weights of the different retrieval channels (dense, sparse).
- Run the query script
Use the `search_pipline.py` script to execute queries.
```shell
# Run a query with the specified search configuration file
python scripts/search_pipline.py --search-config src/MedicalRag/config/search/search_answer.yaml
```
The script enters an interactive mode where you can type a question (e.g., "What are the symptoms of syphilis?") to test retrieval.
Using the core tool
The project also provides a `RAGSearchTool` class for calling the retrieval function directly from other code.
```python
from MedicalRag.tools.rag_search_tool import RAGSearchTool

# Initialize the tool from a configuration file
tool = RAGSearchTool("config/search.yaml")

if tool.is_ready():
    # Run a single query
    results = tool.search("梅毒的症状有哪些?")
    print(results)

    # Run a batch of queries
    results_batch = tool.search(["梅毒的治疗方法", "高血压的预防措施"])
    print(results_batch)

    # Query with filter conditions (e.g., search only "surgery"-related knowledge)
    results_filtered = tool.search("骨折怎么办", filters={"dept_pk": "3"})  # assuming 3 denotes surgery
    print(results_filtered)
```
Application Scenarios
- Intelligent diagnosis and treatment assistant
The system can serve as a clinical aid for doctors. When doctors encounter complex or rare cases, they can quickly look up relevant diagnosis and treatment guidelines, drug information, and the latest medical research for decision support.
- Medical education and training
It can be used to build a simulated consultation system that helps medical students practice questioning, diagnosis, and treatment planning in a virtual environment. By providing standardized answers and related knowledge points for students' questions, the system can accelerate learning.
- Patient health counseling
It can be deployed as a public-facing intelligent customer service agent or chatbot, providing initial health counseling 24/7. Users can ask about common diseases, symptoms, medication precautions, and more, and the system provides safe, accurate answers from an authoritative knowledge base, easing the load on hospital outpatient services.
- Medical knowledge base management and retrieval
For hospitals and research organizations, the system can integrate massive internal medical documents, medical records, and research papers into an intelligent knowledge management platform, letting researchers and healthcare professionals find the information they need quickly and precisely through natural language.
QA
- What problem does this program solve?
It mainly addresses the fact that general-purpose large language models lack knowledge in specialized domains (especially medicine) and are prone to "hallucinating" or providing inaccurate information. With RAG, model responses are grounded in a reliable external medical knowledge base, yielding more accurate and safer medical advice.
- What key technologies does the project use?
The project mainly uses Retrieval-Augmented Generation (RAG), a vector database (Milvus), a natural language processing framework (LangChain), hybrid retrieval (dense vectors combined with sparse BM25 vectors), and a variety of optional large language model back-ends (e.g., Ollama, OpenAI).
- How do I replace the embedding model or language model used in the project?
Changing models only requires modifying the corresponding YAML configuration file. For example, to change the dense embedding model, edit the `provider` and `model` fields under `embedding.dense` in `default.yaml`. Similarly, the LLM used for data annotation is configured in `annotator.yaml`.
- How should I optimize if retrieval results are unsatisfactory?
There are several options. First, try adjusting the `weight` values of the different retrieval channels in the `search_answer.yaml` configuration file to change the fusion ratio of dense and sparse results. Second, review and expand the corpus used to build the vocabulary so that a higher-quality `vocab.pkl.gz` file is generated, improving sparse retrieval accuracy. Finally, ensuring that your knowledge base data is high quality and covers the domain broadly is fundamental to better results.
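For reference, the weighted-fusion alternative to RRF mentioned above can be illustrated as follows. This is a simplified sketch rather than the project's actual scheme: the min-max normalization step and the example weights (0.7 dense, 0.3 sparse) are assumptions chosen for illustration.

```python
def weighted_fuse(dense_hits, sparse_hits, w_dense=0.7, w_sparse=0.3):
    """Combine two channels' {doc_id: score} maps by weighted sum.

    Dense similarity scores and sparse BM25 scores live on different
    scales, so each channel is min-max normalized to [0, 1] first.
    """
    def normalize(hits):
        if not hits:
            return {}
        scores = list(hits.values())
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid division by zero for uniform scores
        return {d: (s - lo) / span for d, s in hits.items()}

    nd, ns = normalize(dense_hits), normalize(sparse_hits)
    fused = {d: w_dense * nd.get(d, 0.0) + w_sparse * ns.get(d, 0.0)
             for d in set(nd) | set(ns)}
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical raw scores: cosine-like dense scores vs. BM25 sparse scores
dense = {"d1": 0.92, "d2": 0.85, "d3": 0.40}
sparse = {"d2": 7.1, "d4": 5.0}
print(weighted_fuse(dense, sparse))  # d2 ranks first: strong in both channels
```

Raising `w_sparse` shifts the fused ranking toward exact keyword matches, which corresponds to tuning the channel `weight` values in the search configuration described above.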