OpenMed is an open-source AI modeling platform for healthcare and the life sciences, hosted on Hugging Face. It offers more than 380 free Named Entity Recognition (NER) models focused on extracting key information such as drugs, diseases, genes, and anatomical structures from clinical texts and research literature. All models are released under the Apache 2.0 license and can be freely used by anyone. OpenMed's goal is to break down the high-cost barriers to healthcare AI and give researchers, physicians, and developers easy access to high-quality tools that accelerate healthcare research and improve patient care. The models perform strongly, outperforming even expensive commercial models on multiple datasets with improvements of up to 36%. OpenMed emphasizes openness and community collaboration, and welcomes contributions and use from users around the world.
Function List
- Provides more than 380 Named Entity Recognition (NER) models covering drugs, diseases, genes, anatomical structures, tumors, and other medical fields.
- Supports the extraction of specific entities, such as chemical substances, genetic variants, pathological information, etc., from clinical records and research papers.
- The models come in a variety of sizes, from 65M to 568M parameters, and are adapted to different hardware environments (e.g., 8GB to 40GB GPUs).
- Seamless integration with the Hugging Face Transformers ecosystem for easy loading and deployment.
- Provides a model discovery application that allows users to filter models by domain (e.g., pharmacology, oncology) or entity type.
- All models are open source, based on the Apache 2.0 license, and free for use in research and production environments.
- Supports batch processing of medical text data to optimize the efficiency of large-scale data analysis.
Usage Guide
Installation and environment preparation
OpenMed's models are hosted on the Hugging Face platform and the necessary software environment needs to be installed before use. Below are the detailed steps:
- Install Python: Make sure that Python 3.7 or later is installed on your system. You can check with the following command:
python --version
If you don't have it, you can download it from the Python website.
- Install Hugging Face Transformers: The OpenMed model runs on the Transformers framework, which needs to be installed. Open a terminal and type:
pip install transformers datasets pandas
This will install Transformers, Datasets and Pandas for model loading and data processing.
- Verify GPU support (optional): If using GPU acceleration, install PyTorch or TensorFlow and make sure the GPU driver and CUDA are configured. Check GPU availability:
python -c "import torch; print(torch.cuda.is_available())"
If the output is True, the GPU is available.
Basic usage flow
The core functionality of OpenMed is Named Entity Recognition (NER), which extracts structured information from medical text. The following steps use the OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M model as an example to show how to load and use a model:
- Loading a model:
Use Hugging Face's pipeline interface to load a model. The following code loads a drug recognition model:
from transformers import pipeline

model_name = "OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M"
ner_pipeline = pipeline("token-classification", model=model_name, aggregation_strategy="simple")
model_name: specifies the model to load; other models can be found on the OpenMed Models page.
aggregation_strategy="simple": aggregates token-level predictions into full entities; see the Hugging Face documentation for details.
- Processing a single text:
Input medical text and the model will recognize the entities in it. Example:
text = "The patient took 10 mg of aspirin to treat hypertension."
entities = ner_pipeline(text)
for entity in entities:
    print(f"Entity: {entity['word']} ({entity['entity_group']}), confidence: {entity['score']:.4f}")
Example output:
Entity: aspirin (CHEMICAL), confidence: 0.9987
This shows that the model has recognized "aspirin" as a chemical entity.
- Batch processing of text:
For large amounts of text, OpenMed supports batch processing for efficiency. The following code shows how to process multiple texts:
texts = [
    "The patient took 10 mg of aspirin to treat hypertension.",
    "Doxorubicin treatment showed significant tumor regression.",
    "The study found that methotrexate is effective for rheumatoid arthritis."
]
results = ner_pipeline(texts, batch_size=8)
for i, entities in enumerate(results):
    print(f"Text {i+1} entities:")
    for entity in entities:
        print(f"  - {entity['word']} ({entity['entity_group']}): {entity['score']:.4f}")
Example output:
Text 1 entities:
  - aspirin (CHEMICAL): 0.9987
Text 2 entities:
  - doxorubicin (CHEMICAL): 0.9972
Text 3 entities:
  - methotrexate (CHEMICAL): 0.9965
batch_size=8: adjust the batch size to your hardware; reduce the value when GPU memory is limited.
- Batch processing with datasets:
OpenMed models can also process Hugging Face datasets. The following code loads a medical dataset and processes it in batches:
from datasets import load_dataset, Dataset
from transformers.pipelines.pt_utils import KeyDataset
import pandas as pd

# Load a medical dataset
medical_dataset = load_dataset("BI55/MedText", split="train[:100]")
data = pd.DataFrame({"text": medical_dataset["Completion"]})
dataset = Dataset.from_pandas(data)

# Batch processing
batch_size = 16
results = []
for out in ner_pipeline(KeyDataset(dataset, "text"), batch_size=batch_size):
    results.append(out)  # one list of entities per text
print(f"Processed {len(results)} texts")
- Using the Model Discovery App:
OpenMed provides an interactive model discovery application, the OpenMed NER Model Discovery App. Users can use it as follows (a programmatic alternative is sketched after this list):
- Open the application page and enter the type of entity to be recognized (e.g., "chemical substance" or "gene").
- Use the filter function to find suitable models by domain (e.g., pharmacology, oncology) or model architecture (BERT, RoBERTa).
- Click a model link to get the model name and a code example directly, then copy it to run locally.
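As a programmatic alternative to the web app, the sketch below lists OpenMed models with the huggingface_hub library; the search keyword "Pharma" and the result limit are illustrative choices, not fixed parameters of the platform:
from huggingface_hub import HfApi

api = HfApi()
# List models published under the OpenMed organization that match a keyword
# ("Pharma" is just an example search term).
for model in api.list_models(author="OpenMed", search="Pharma", limit=10):
    print(model.id)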
Featured Functions
- Multi-disciplinary support: OpenMed models cover a wide range of fields such as pharmacology, oncology, genomics, and pathology. For example, use OpenMed/OpenMed-NER-OncologyDetect-SuperClinical-434M to identify cancer-related entities:
ner_pipeline = pipeline("token-classification", model="OpenMed/OpenMed-NER-OncologyDetect-SuperClinical-434M", aggregation_strategy="simple")
text = "KRAS gene mutations drive tumor formation."
entities = ner_pipeline(text)
print(entities)
Example output:
[{'word': 'KRAS', 'entity_group': 'GENE', 'score': 0.9991}]
- Efficient integration: The models are fully compatible with the Hugging Face ecosystem and support rapid deployment to production environments. Users can deploy models via Hugging Face Inference Endpoints without needing local hardware; a minimal calling sketch follows this list.
- Community contributions: Users can follow OpenMed through Hugging Face's "Watch" feature, submit feature requests, or contribute new models.
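A minimal sketch of calling a deployed Inference Endpoint over HTTP, assuming the model has already been deployed; the endpoint URL and access token below are placeholders to replace with your own values:
import requests

# Placeholders: replace with your own Inference Endpoint URL and Hugging Face access token.
ENDPOINT_URL = "https://your-endpoint.endpoints.huggingface.cloud"
HF_TOKEN = "hf_xxx"

def query_ner(text):
    # Send the text to the deployed token-classification endpoint and return its JSON output.
    response = requests.post(
        ENDPOINT_URL,
        headers={"Authorization": f"Bearer {HF_TOKEN}", "Content-Type": "application/json"},
        json={"inputs": text},
    )
    response.raise_for_status()
    return response.json()

print(query_ner("The patient took 10 mg of aspirin to treat hypertension."))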
Caveats
- Ensure a stable internet connection when downloading model weights (some models are large, e.g. the 568M-parameter models require about 40GB of storage). Weights can also be pre-downloaded, as sketched after this list.
- When GPU memory is insufficient, it is recommended to choose a smaller model (e.g., the 65M-parameter OpenMed-NER-PathologyDetect-TinyMed-65M).
- Check the OpenMed page regularly for the latest model updates.
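A minimal sketch of pre-downloading model weights into the local Hugging Face cache with huggingface_hub's snapshot_download, so that later pipeline() calls can load them from disk:
from huggingface_hub import snapshot_download

# Download the model repository into the local Hugging Face cache ahead of time.
local_path = snapshot_download(repo_id="OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M")
print(f"Model cached at: {local_path}")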
Application Scenarios
- Clinical record analysis
Hospitals can use OpenMed models to extract medications, diseases, and other information from patient records. For example, quickly recognizing the drug name in "the patient took aspirin" helps physicians organize electronic medical records.
- Medical research
Researchers can use the models to analyze the literature and extract genes, proteins, and other information to build knowledge graphs. For example, the association between the BRCA2 gene and cancer can be extracted from a paper.
- Drug development
Pharmaceutical companies can use the models to identify chemical-drug interactions and accelerate drug discovery, for example, analyzing the role of doxorubicin in tumor treatment.
- Patient privacy
De-identification with NER models automatically removes personal information (e.g., names, addresses) from patient records, helping comply with privacy regulations such as HIPAA; a masking sketch follows this list.
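A minimal de-identification sketch, assuming a PII/PHI-oriented token-classification model is available (the model name below is a placeholder, not a specific OpenMed model); it masks each detected entity span with its label:
from transformers import pipeline

# Placeholder model name: substitute a PII/PHI-oriented token-classification model.
deid_pipeline = pipeline("token-classification",
                         model="your-org/your-pii-ner-model",
                         aggregation_strategy="simple")

def deidentify(text):
    # Replace each detected entity span with its label, working backwards through the
    # text so that earlier character offsets remain valid after each substitution.
    entities = sorted(deid_pipeline(text), key=lambda e: e["start"], reverse=True)
    for entity in entities:
        text = text[:entity["start"]] + f"[{entity['entity_group']}]" + text[entity["end"]:]
    return text

print(deidentify("John Smith of 42 Main Street was prescribed aspirin."))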
FAQ
- Are OpenMed models free?
Yes, all models are released under the Apache 2.0 license and are completely free for research and commercial use.
- How do I choose the right model?
Use the OpenMed NER Model Discovery App to filter models by domain or entity type. You can also pick a parameter size (e.g., 65M or 434M) that suits your hardware.
- What hardware is needed to run the models?
The models support 8GB to 40GB GPUs; CPUs can also run the smaller models, but more slowly. At least 16GB of RAM is recommended.
- How do I handle large-scale datasets?
Use the batch processing code and adjust the batch_size parameter to fit your hardware; refer to the batch processing example in the Usage Guide and the device-selection sketch after this list.
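A minimal sketch of matching the model and batch size to the available hardware; the specific model choices and batch sizes are illustrative assumptions, and the full repo id of the 65M model is inferred from the model name mentioned above:
import torch
from transformers import pipeline

# Use GPU 0 if CUDA is available, otherwise fall back to the CPU (-1).
device = 0 if torch.cuda.is_available() else -1

# Illustrative choices: a larger model and batch size on GPU, a smaller model otherwise.
model_name = ("OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M" if device == 0
              else "OpenMed/OpenMed-NER-PathologyDetect-TinyMed-65M")
batch_size = 16 if device == 0 else 4

ner_pipeline = pipeline("token-classification", model=model_name,
                        aggregation_strategy="simple", device=device)
print(ner_pipeline(["The patient took 10 mg of aspirin to treat hypertension."],
                   batch_size=batch_size))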