
OntoCast is an open-source framework hosted on GitHub that extracts semantic triples from documents to build knowledge graphs. It combines ontology management, natural language processing, and knowledge graph serialization to turn unstructured text into structured, queryable data. OntoCast uses an ontology-driven extraction approach that ensures semantic consistency, and it supports a range of file formats including text, JSON, PDF, and Markdown. It can be run locally or accessed via a REST API, with either OpenAI models or local models (e.g., via Ollama). Its core features are automated ontology creation, entity disambiguation, and semantic chunking, aimed at scenarios where structured information must be extracted from complex documents. The project ships with detailed documentation and Docker configuration for rapid deployment.

 

Function List

  • Semantic triple extraction: extracts subject-predicate-object triples from documents to construct a knowledge graph (see the sketch after this list).
  • Ontology Management: automatically creates, validates, and optimizes ontologies to ensure semantic consistency.
  • Entity Disambiguation: resolves entity references across document chunks to improve data accuracy.
  • Multi-format support: handles multiple file formats, including text, JSON, PDF, and Markdown.
  • Semantic chunking: segments text based on semantic similarity to optimize information extraction.
  • GraphRAG Support: enables knowledge graph-based retrieval-augmented generation to improve search capabilities.
  • MCP Compatible: provides Model Context Protocol (MCP) endpoints for easy integration and invocation.
  • Triple store support: works with Fuseki and Neo4j triple stores, with Fuseki recommended.
  • Local and Cloud Deployment: runs locally or is accessed via a REST API.
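
As a concrete illustration of the triple model, the sketch below loads a Turtle file (such as the output.ttl produced in the usage steps later on) and iterates over its subject-predicate-object triples. It uses the rdflib library, which is an assumption for illustration rather than a part of OntoCast:

    # Minimal sketch: inspect triples from a Turtle file such as the
    # output.ttl produced in the usage steps below.
    # Requires rdflib (pip install rdflib) -- an assumption, not an OntoCast dependency.
    from rdflib import Graph

    g = Graph()
    g.parse("output.ttl", format="turtle")

    # Each triple is a (subject, predicate, object) tuple.
    for subj, pred, obj in g:
        print(subj, pred, obj)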

 

Usage Help

Installation process

OntoCast is a Python-based framework, and deployment with Docker is recommended. The detailed installation and configuration steps are as follows:

  1. Clone the project
    Run the following commands in a terminal to clone the OntoCast repository locally:

    git clone https://github.com/growgraph/ontocast.git
    cd ontocast
    
  2. Install dependencies
    The project runs in a Python environment; the uv tool is recommended for managing dependencies. Run the following command to install them:

    uv pip install -r requirements.txt
    

    If uv is not available, plain pip can be used instead:

    pip install -r requirements.txt
    
  3. Configure the triple store
    OntoCast supports Fuseki (recommended) and Neo4j as triple store backends. The following example uses Fuseki:

    • Copy the example environment configuration file in the docker/fuseki directory:
      cp docker/fuseki/.env.example docker/fuseki/.env
      
    • Edit the .env file to set Fuseki's URI and authentication information, for example:
      FUSEKI_URI=http://localhost:3032/test
      FUSEKI_AUTH=admin/abc123-qwe
      
    • Start the Fuseki service:
      cd docker/fuseki
      docker compose --env-file .env up -d
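    • Optionally, verify that Fuseki is reachable before continuing. A minimal Python sketch (the /$/ping path is Fuseki's standard admin ping endpoint; the port matches the .env example above):
      # Sketch: check that the Fuseki server answers. Not part of OntoCast.
      import requests

      resp = requests.get("http://localhost:3032/$/ping", timeout=5)
      print(resp.status_code, resp.text)  # expect HTTP 200 and a timestamp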
      
  4. Configure the language model
    OntoCast supports OpenAI or local models (e.g., via Ollama). Edit the .env file in the project root to configure the model parameters:

    LLM_PROVIDER=openai
    LLM_MODEL_NAME=gpt-4o-mini
    LLM_TEMPERATURE=0.0
    OPENAI_API_KEY=your_openai_api_key_here
    

    If using a local model (e.g. Ollama), set:

    LLM_PROVIDER=ollama
    LLM_BASE_URL=http://localhost:11434
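
    As a quick sanity check that the file parses, you can read it back with python-dotenv (an assumption for illustration; how OntoCast itself loads the file may differ):

    # Sketch: confirm the .env values are readable.
    # Uses python-dotenv (pip install python-dotenv) -- an assumption.
    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads .env from the current directory
    print(os.getenv("LLM_PROVIDER"), os.getenv("LLM_MODEL_NAME"))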
    
  5. Run the service
    Start the OntoCast service with the following command:

    uv run serve --ontology-directory ONTOLOGY_DIR --working-directory WORKING_DIR
    

    Here, ONTOLOGY_DIR is the path where ontology files are stored, and WORKING_DIR is a working directory for processed data.

  6. Build a Docker image (optional)
    If you prefer to run OntoCast in Docker, build an image:

    docker buildx build -t growgraph/ontocast:0.1.4 .
    

Usage

OntoCast's core function is extracting semantic triples to build knowledge graphs. The specific steps are as follows:

  1. Prepare the documents
    Place the documents to be processed (text, JSON, PDF, or Markdown) into the data/ directory. The project provides sample data in that directory for reference.
  2. Run the extraction process
    OntoCast can be run either through its command-line tool or through the REST API:

    • Command-line method
      Use the CLI tool to process a document:

      uv run ontocast process --input data/sample.md --output output.ttl
      

      This processes the sample.md file into RDF triples and writes them to output.ttl (Turtle format).

    • REST API method
      After starting the service, call the /process endpoint:

      curl -X POST http://localhost:8999/process -H "Content-Type: application/json" -d '{"input": "data/sample.md"}'
      

      The response returns the extracted triples and ontology data.
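
      The same call can be made from Python, mirroring the curl example (a minimal sketch; the port and payload come from the example above):

      # Sketch: call the /process endpoint from Python.
      import requests

      resp = requests.post(
          "http://localhost:8999/process",
          json={"input": "data/sample.md"},
          timeout=600,  # LLM-backed extraction can take a while
      )
      resp.raise_for_status()
      print(resp.json())  # the extracted triples and ontology data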

  3. View the results
    After processing, the results are stored in the triple store (e.g., Fuseki). You can query the knowledge graph through Fuseki's web interface (e.g., http://localhost:3032), or retrieve data with the SPARQL query language.
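
    For example, a minimal SPARQL lookup against the Fuseki dataset configured earlier (the /query service name follows Fuseki's defaults; adjust if your setup differs):

    # Sketch: fetch a few triples from Fuseki over the standard SPARQL HTTP protocol.
    import requests

    resp = requests.get(
        "http://localhost:3032/test/query",
        params={"query": "SELECT * WHERE { ?s ?p ?o } LIMIT 10"},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["s"]["value"], row["p"]["value"], row["o"]["value"])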
  4. Optimize the ontology
    OntoCast optimizes ontologies automatically. To adjust an ontology manually, edit the ontology files in the data/ontologies/ directory and re-run the extraction process.
  5. Use GraphRAG
    OntoCast supports knowledge graph-based retrieval-augmented generation (GraphRAG). After processing completes, run a semantic search over the generated knowledge graph:

    uv run ontocast search --query "specific keyword" --graph output.ttl
    

      This returns the triples related to the keyword.
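
      Conceptually, the simplest form of this lookup is a literal filter over the exported graph. A rough rdflib sketch (not OntoCast's actual GraphRAG implementation, which also uses semantic retrieval):

      # Rough sketch: match triples whose literal object contains a keyword.
      from rdflib import Graph, Literal

      g = Graph()
      g.parse("output.ttl", format="turtle")

      keyword = "specific keyword"
      for s, p, o in g:
          if isinstance(o, Literal) and keyword.lower() in str(o).lower():
              print(s, p, o)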

Feature Highlights

  • Semantic chunking: OntoCast automatically splits long documents into semantically coherent chunks for more accurate triple extraction. Users do not need to set chunking parameters manually; the system handles this based on semantic similarity (see the sketch after this list).
  • Entity disambiguation: OntoCast recognizes and unifies entity references when processing multiple or long documents. For example, "Apple" may refer to a company or a fruit in different contexts, and OntoCast categorizes it correctly based on context.
  • Multi-format support: users can upload PDF or Markdown files directly, and OntoCast automatically converts them to its internal processing format without extra preprocessing.
  • MCP compatibility: through the /process endpoint, OntoCast supports the Model Context Protocol (MCP) for easy integration with other systems.
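
To give an intuition for the semantic chunking mentioned above, here is a conceptual sketch: start a new chunk whenever adjacent sentences drift apart in embedding space. This is an illustration only, not OntoCast's actual algorithm or thresholds; it assumes the sentence-transformers library and an arbitrary similarity cutoff:

    # Conceptual sketch of similarity-based chunking (illustrative only).
    from numpy import dot
    from numpy.linalg import norm
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    sentences = [
        "OntoCast builds knowledge graphs from documents.",
        "It extracts subject-predicate-object triples.",
        "Fuseki stores the resulting triples.",
    ]
    vecs = model.encode(sentences)

    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        # Cosine similarity between adjacent sentence embeddings.
        sim = dot(vecs[i - 1], vecs[i]) / (norm(vecs[i - 1]) * norm(vecs[i]))
        if sim < 0.5:  # illustrative threshold
            chunks.append(current)
            current = []
        current.append(sentences[i])
    chunks.append(current)
    print(chunks)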

Notes

  • Ensure that the triple store service (e.g., Fuseki) is running properly; otherwise the extraction results cannot be saved.
  • When processing large documents, adjust the RECURSION_LIMIT and ESTIMATED_CHUNKS parameters to avoid performance problems (see the example after this list).
  • The project documentation is in the docs/ directory, which provides a detailed user guide and API reference.
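
For example, assuming these parameters are read from the same .env file as the other settings (an assumption; the RECURSION_LIMIT value below is purely illustrative, while ESTIMATED_CHUNKS=50 follows the FAQ suggestion later on this page):

    # Illustrative values only
    RECURSION_LIMIT=100
    ESTIMATED_CHUNKS=50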

 

Application Scenarios

  1. Academic Research
    Researchers can use OntoCast to extract key concepts and relationships from academic papers to build a domain knowledge graph. For example, when dealing with biology papers, genes, proteins and their interactions can be extracted to generate a queryable knowledge base.
  2. Enterprise Document Management
    Enterprises can convert internal documents (e.g., technical manuals, contracts) into knowledge graphs for quick retrieval and analysis. For example, extracting terms, amounts and related party information from contracts improves information management efficiency.
  3. Semantic Search Optimization
    Web developers can use OntoCast to build semantic search functionality that extracts structured data from unstructured content and improves the accuracy of search results.
  4. Intelligent Q&A Systems
    OntoCast can provide knowledge graph support for Q&A systems. For example, triples extracted from a company's FAQ documents can be used to answer users' specific questions about a product or service.

 

FAQ

  1. What file formats does OntoCast support?
    Text, JSON, PDF, and Markdown formats are supported; more formats may be added in the future.
  2. How do I choose a triple store?
    Fuseki is recommended: its configuration is simpler and its performance better than Neo4j's. Refer to docker/fuseki/.env.example for configuration.
  3. Is a predefined ontology required?
    No. OntoCast automatically generates and optimizes ontologies and also supports user-supplied custom ontologies.
  4. How are large documents handled?
    Increase the ESTIMATED_CHUNKS parameter (e.g., set it to 50) and ensure sufficient hardware resources. Semantic chunking automatically optimizes processing.
  5. What language models are supported?
    OpenAI models (e.g., gpt-4o-mini) and local models (e.g., those run through Ollama) are supported.