
OpenMed is an open-source AI model platform for healthcare and the life sciences, hosted on Hugging Face. It offers more than 380 free Named Entity Recognition (NER) models that extract key information such as drugs, diseases, genes, and anatomical structures from clinical text and research literature. All models are released under the Apache 2.0 license and can be used freely by anyone. OpenMed's goal is to lower the cost barrier to healthcare AI and give researchers, physicians, and developers easy access to high-quality tools that accelerate medical research and improve patient care. The platform reports strong performance, with its models outperforming expensive commercial models on multiple datasets by as much as 36%. OpenMed emphasizes openness and community collaboration and welcomes contributions and use from around the world.

 

Feature List

  • Provides more than 380 Named Entity Recognition (NER) models covering drugs, diseases, genes, anatomical structures, tumors, and other medical fields.
  • Supports the extraction of specific entities, such as chemical substances, genetic variants, pathological information, etc., from clinical records and research papers.
  • The models come in a variety of sizes, from 65M to 568M parameters, and are adapted to different hardware environments (e.g., 8GB to 40GB GPUs).
  • Seamless integration with the Hugging Face Transformers ecosystem for easy loading and deployment.
  • Provides a model discovery application that allows users to filter models by domain (e.g., pharmacology, oncology) or entity type.
  • All models are open source, based on the Apache 2.0 license, and free for use in research and production environments.
  • Supports batch processing of medical text data to optimize the efficiency of large-scale data analysis.

Usage Guide

Installation and environment preparation

OpenMed's models are hosted on Hugging Face, so the required software environment must be installed before use. The steps are as follows:

  1. Install Python: Make sure that Python 3.7 or later is installed on your system. You can check the version with the following command:
    python --version
    

    If you don't have it, you can download it from the Python website.

  2. Install Hugging Face Transformers: OpenMed models run on the Transformers framework, which must be installed first. Open a terminal and type:
    pip install transformers datasets pandas
    

    This will install Transformers, Datasets and Pandas for model loading and data processing.

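  3. Install a backend and verify GPU support (GPU optional): Transformers needs a deep learning backend such as PyTorch or TensorFlow to run the models. For GPU acceleration, also make sure the GPU driver and CUDA are configured. With PyTorch installed, check GPU availability: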
    python -c "import torch; print(torch.cuda.is_available())"
    

    An output of True indicates that the GPU is available.
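
The checks above can also be collected into one script. The following is a minimal sketch, not part of OpenMed: the function name check_environment and the package list are illustrative assumptions. It only tests whether each package can be found and does not import it.

```python
import importlib.util
import sys

# Packages named in the installation steps above (torch is the assumed backend).
REQUIRED_PACKAGES = ["transformers", "datasets", "pandas", "torch"]


def check_environment(min_python=(3, 7), packages=REQUIRED_PACKAGES):
    """Return a dict saying whether Python and each package are available."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

Any entry reported as missing can be installed with pip before moving on to the usage steps.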

Basic usage flow

The core functionality of OpenMed is Named Entity Recognition (NER), which extracts structured information from medical text. The steps below use the OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M model to show how to load and use a model:

  1. Loading a model:
    Use Hugging Face's pipeline interface to load a model. The following code loads a drug recognition model:

    from transformers import pipeline
    model_name = "OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M"
    ner_pipeline = pipeline("token-classification", model=model_name, aggregation_strategy="simple")
    
    • model_name: The name of the model to load. Other models are listed on the OpenMed Models page.
    • aggregation_strategy="simple": Merges token-level predictions into complete entities; see the Hugging Face documentation for details.
  2. Processing a single text:
    Pass a medical text to the pipeline and the model returns the entities it recognizes. Example:

    text = "The patient took 10 mg of aspirin to treat hypertension."
    entities = ner_pipeline(text)
    for entity in entities:
        print(f"Entity: {entity['word']} ({entity['entity_group']}), confidence: {entity['score']:.4f}")
    

    Example output (the score is illustrative):

    Entity: aspirin (CHEMICAL), confidence: 0.9987
    

    This means that the model recognizes "aspirin" as a chemical entity.

  3. Batch processing of texts:
    For large amounts of text, the pipeline supports batch processing for efficiency. The following code shows how to process multiple texts:

    texts = [
        "The patient took 10 mg of aspirin to treat hypertension.",
        "Doxorubicin treatment showed significant tumor regression.",
        "The study found methotrexate effective for rheumatoid arthritis."
    ]
    results = ner_pipeline(texts, batch_size=8)
    for i, entities in enumerate(results):
        print(f"Text {i+1} entities:")
        for entity in entities:
            print(f" - {entity['word']} ({entity['entity_group']}): {entity['score']:.4f}")
    

    Example output (scores are illustrative):

    Text 1 entities:
     - aspirin (CHEMICAL): 0.9987
    Text 2 entities:
     - Doxorubicin (CHEMICAL): 0.9972
    Text 3 entities:
     - methotrexate (CHEMICAL): 0.9965
    
    • batch_size=8: Sets the batch size. Adjust it to your hardware, and lower the value when GPU memory is limited.
  4. Batch processing with datasets:
    The pipeline can also process Hugging Face datasets. The following code loads a medical dataset from the Hugging Face Hub and processes it:

    from datasets import Dataset, load_dataset
    from transformers.pipelines.pt_utils import KeyDataset
    import pandas as pd
    # Load the first 100 rows of a medical dataset
    medical_dataset = load_dataset("BI55/MedText", split="train[:100]")
    data = pd.DataFrame({"text": medical_dataset["Completion"]})
    dataset = Dataset.from_pandas(data)
    # Run NER in batches; each `out` is the entity list for one text
    batch_size = 16
    results = []
    for out in ner_pipeline(KeyDataset(dataset, "text"), batch_size=batch_size):
        results.append(out)
    print(f"Processed {len(results)} texts")
    
    
  5. Using the Model Discovery App:
    OpenMed provides an interactive model discovery application, the OpenMed NER Model Discovery App. Users can work with it as follows:

    • Open the application page and enter the type of entity to be recognized (e.g. "chemical substance" or "gene").
    • Use the filter function to find suitable models by domain (e.g., pharmacology, oncology) or model architecture (BERT, RoBERTa).
    • Click on a model link to get the model name and a code example, then copy them to run locally.
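
The pipeline calls in the steps above return one list of entity dictionaries per text. As a rough sketch of a common follow-up step, the helper below flattens such results into table-like rows and drops low-confidence entities. The function name flatten_results, the 0.5 threshold, and the hand-written sample data are assumptions for illustration. The sample only mimics the shape of the pipeline output, so no model is downloaded.

```python
def flatten_results(texts, results, min_score=0.5):
    """Turn per-text entity lists into flat rows, skipping low-confidence hits."""
    rows = []
    for text_index, (text, entities) in enumerate(zip(texts, results)):
        for entity in entities:
            if entity["score"] < min_score:
                continue
            rows.append({
                "text_index": text_index,
                "text": text,
                "entity": entity["word"],
                "label": entity["entity_group"],
                # float() also handles NumPy scalar scores.
                "score": round(float(entity["score"]), 4),
            })
    return rows


# Hand-written stand-in for `ner_pipeline(texts)` output (abridged keys).
texts = [
    "The patient took 10 mg of aspirin.",
    "Doxorubicin led to tumor regression.",
]
sample_results = [
    [{"entity_group": "CHEMICAL", "score": 0.9987, "word": "aspirin"}],
    [
        {"entity_group": "CHEMICAL", "score": 0.9972, "word": "Doxorubicin"},
        {"entity_group": "CHEMICAL", "score": 0.31, "word": "tumor"},
    ],
]

for row in flatten_results(texts, sample_results):
    print(row["text_index"], row["entity"], row["label"], row["score"])
```

The resulting rows can be passed straight to pandas.DataFrame(rows) for further analysis.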

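Featured Functions

  • Multi-domain support: OpenMed models cover fields such as pharmacology, oncology, genomics, and pathology. For example, load OpenMed/OpenMed-NER-OncologyDetect-SuperClinical-434M to identify cancer-related entities:
    from transformers import pipeline
    oncology_pipeline = pipeline("token-classification", model="OpenMed/OpenMed-NER-OncologyDetect-SuperClinical-434M", aggregation_strategy="simple")
    text = "KRAS gene mutations drive tumor formation."
    entities = oncology_pipeline(text)
    print(entities)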
    

    Example output (abridged; the pipeline also returns start and end character offsets):

    [{'word': 'KRAS', 'entity_group': 'GENE', 'score': 0.9991}]
    
  • Efficient integration: The models are compatible with the Hugging Face ecosystem and support rapid deployment to production. Users can deploy models via Hugging Face Inference Endpoints without local hardware.
  • Community contributions: Users can follow OpenMed through Hugging Face's "Watch" feature, submit feature requests, or contribute new models.
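
When several domains are used in one project, it can help to keep the model names in one place and load each pipeline only once. The sketch below is one possible arrangement, not an OpenMed API. The registry MODEL_BY_DOMAIN and the helper names are hypothetical, and the "OpenMed/" prefix on the pathology model is assumed by analogy with the other two names.

```python
from functools import lru_cache

# Hypothetical registry; model names are those mentioned in this article.
MODEL_BY_DOMAIN = {
    "pharmacology": "OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M",
    "oncology": "OpenMed/OpenMed-NER-OncologyDetect-SuperClinical-434M",
    # Organization prefix assumed for this one.
    "pathology": "OpenMed/OpenMed-NER-PathologyDetect-TinyMed-65M",
}


def resolve_model_name(domain):
    """Map a domain keyword to a model name, ignoring case and spaces."""
    key = domain.strip().lower()
    if key not in MODEL_BY_DOMAIN:
        raise KeyError(f"No model registered for domain {domain!r}")
    return MODEL_BY_DOMAIN[key]


@lru_cache(maxsize=None)
def get_pipeline(domain):
    """Load the pipeline for a domain once and reuse it on later calls."""
    # Imported lazily so the name lookup works without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=resolve_model_name(domain),
        aggregation_strategy="simple",
    )


print(resolve_model_name("Oncology"))
```

Calling get_pipeline("oncology") downloads the weights on first use, so it is left out of the printed example.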

Caveats

  • Ensure a stable internet connection when downloading model weights. The larger models are sizable downloads (a 568M-parameter model is roughly 2.3 GB if stored as 32-bit floats).
  • When GPU memory is insufficient, choose a smaller model (e.g. the 65M-parameter OpenMed-NER-PathologyDetect-TinyMed-65M).
  • Check the OpenMed page regularly for the latest model updates.
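
A back-of-the-envelope way to estimate how much disk space or memory the weights of a model need is to multiply the parameter count by the bytes stored per parameter. The sketch below assumes 4 bytes (float32) or 2 bytes (float16) per parameter and ignores tokenizer files and runtime activation memory. The function name is illustrative.

```python
def estimate_weight_size_gb(num_parameters, bytes_per_parameter=4):
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Parameter counts mentioned in this article.
for label, params in [("65M", 65_000_000), ("434M", 434_000_000), ("568M", 568_000_000)]:
    fp32 = estimate_weight_size_gb(params)
    fp16 = estimate_weight_size_gb(params, bytes_per_parameter=2)
    print(f"{label}: ~{fp32:.2f} GB in float32, ~{fp16:.2f} GB in float16")
```

Actual GPU memory use during inference is higher than the weight size alone, since it grows with batch size and sequence length.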

Application Scenarios

  1. Clinical record analysis
    Hospitals can use OpenMed models to extract medications, diseases, and similar information from patient records. For example, quickly recognizing the drug name in "Patient took aspirin" helps physicians organize electronic medical records.
  2. Medical research
    Researchers can use the models to analyze the literature and extract genes, proteins, and other entities to build a knowledge graph. For example, the association between the BRCA2 gene and cancer can be extracted from a paper.
  3. Drug development
    Pharmaceutical companies can use the models to identify chemical and drug mentions, supporting the analysis of drug interactions and accelerating drug discovery. For example, they can analyze the role of doxorubicin in tumor treatment.
  4. Patient privacy
    De-identification with NER models automatically removes personal information (e.g., names, addresses) from patient records, which helps with compliance with privacy regulations such as HIPAA.
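
The de-identification scenario can be sketched with the character offsets that the token-classification pipeline returns for each entity. The example below is only an illustration under stated assumptions: the labels NAME and ADDRESS and the sample entities are hypothetical and hand-written, not the output of a specific OpenMed model, and entity spans are assumed not to overlap.

```python
def redact_entities(text, entities, labels_to_redact):
    """Replace each selected entity span with a [LABEL] placeholder."""
    redacted = text
    # Work from the end of the text so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        if entity["entity_group"] in labels_to_redact:
            placeholder = f"[{entity['entity_group']}]"
            redacted = redacted[:entity["start"]] + placeholder + redacted[entity["end"]:]
    return redacted


note = "John Smith of 12 Oak Street was given aspirin."
# Hand-written entities in the shape of aggregated pipeline output.
sample_entities = [
    {"entity_group": "NAME", "word": "John Smith", "start": 0, "end": 10},
    {"entity_group": "ADDRESS", "word": "12 Oak Street", "start": 14, "end": 27},
    {"entity_group": "CHEMICAL", "word": "aspirin", "start": 38, "end": 45},
]

print(redact_entities(note, sample_entities, {"NAME", "ADDRESS"}))
```

Clinical entities such as the drug name are left in place because their label is not in the redaction set. A real de-identification workflow would also need human review, since no NER model catches every identifier.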

Q&A

  1. Are OpenMed models free?
    Yes, all models are released under the Apache 2.0 license and are free for research and commercial use.
  2. How do I choose the right model?
    Use the OpenMed NER Model Discovery App to filter models by domain or entity type. You can also pick a parameter size (e.g. 65M or 434M) to match your hardware.
  3. What hardware is needed to run the models?
    The models are designed for GPUs with 8GB to 40GB of memory. CPUs can also run the smaller models, but more slowly. At least 16GB of RAM is recommended.
  4. How do I handle large-scale datasets?
    Use the batch processing code and adjust the batch_size parameter to suit your hardware. Refer to the batch processing examples above.
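
For very large collections, one option alongside batch_size is to feed the texts to the pipeline in chunks, so that results can be saved or summarized as they arrive instead of being held in memory all at once. The sketch below assumes run_ner is any callable that returns one entity list per input text, for example lambda chunk: ner_pipeline(chunk, batch_size=8). The helper names and the chunk size are illustrative.

```python
def iter_chunks(items, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def process_in_chunks(texts, run_ner, chunk_size=1000):
    """Run NER chunk by chunk and return the total number of entities found."""
    entity_count = 0
    for chunk in iter_chunks(texts, chunk_size):
        for entities in run_ner(chunk):
            # A real workflow might write each result to disk here.
            entity_count += len(entities)
    return entity_count


# Stand-in for the model: pretends every text contains exactly one entity.
def fake_ner(chunk):
    return [[{"word": text}] for text in chunk]


print(process_in_chunks(["a", "b", "c"], fake_ner, chunk_size=2))
```

Swapping fake_ner for a real pipeline call leaves the loop unchanged.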