
OntoCast is an open-source framework hosted on GitHub that extracts semantic triples from documents to build knowledge graphs. It combines ontology management, natural language processing, and knowledge graph serialization to turn unstructured text into structured, queryable data. OntoCast uses an ontology-driven extraction approach that ensures semantic consistency, and it supports a range of file formats including text, JSON, PDF, and Markdown. It can be run locally or accessed via a REST API, with either OpenAI models or local models (e.g., via Ollama). Its core features are automated ontology creation, entity disambiguation, and semantic chunking, aimed at scenarios where structured information must be extracted from complex documents. The project ships with detailed documentation and Docker configuration for rapid deployment.

 

Function List

  • Semantic triple extraction: extracts subject-predicate-object triples from documents to construct a knowledge graph (see the sketch after this list).
  • Ontology Management: automatically creates, validates, and optimizes ontologies to ensure semantic consistency.
  • Entity Disambiguation: resolves entity references across document chunks to improve data accuracy.
  • Multi-format support: handles multiple file formats, including text, JSON, PDF, and Markdown.
  • Semantic chunking: segments text based on semantic similarity to optimize information extraction.
  • GraphRAG Support: enables knowledge graph-based retrieval-augmented generation to improve search capabilities.
  • MCP Compatible: provides Model Context Protocol (MCP) endpoints for easy integration and invocation.
  • Triple store support: works with Fuseki and Neo4j triple stores, with Fuseki recommended.
  • Local and Cloud Deployment: runs locally or is accessed via a REST API.
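
As a concrete illustration of the triple model, the sketch below loads a Turtle file (such as the output.ttl produced in the usage steps later on) and iterates over its subject-predicate-object triples. It uses the rdflib library, which is an assumption for illustration rather than a part of OntoCast:

    # Minimal sketch: inspect triples from a Turtle file such as the
    # output.ttl produced in the usage steps below.
    # Requires rdflib (pip install rdflib) -- an assumption, not an OntoCast dependency.
    from rdflib import Graph

    g = Graph()
    g.parse("output.ttl", format="turtle")

    # Each triple is a (subject, predicate, object) tuple.
    for subj, pred, obj in g:
        print(subj, pred, obj)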

 

Usage Help

Installation process

OntoCast is a Python-based framework, and deployment with Docker is recommended. The detailed installation and configuration steps are as follows:

  1. Clone the project
    Run the following commands in a terminal to clone the OntoCast repository locally:

    git clone https://github.com/growgraph/ontocast.git
    cd ontocast
    
  2. Install dependencies
    The project runs in a Python environment; the uv tool is recommended for managing dependencies. Run the following command to install them:

    uv pip install -r requirements.txt
    

    If uv is not available, plain pip can be used instead:

    pip install -r requirements.txt
    
  3. Configure the triple store
    OntoCast supports Fuseki (recommended) and Neo4j as triple store backends. The following example uses Fuseki:

    • Copy the example environment configuration file in the docker/fuseki directory:
      cp docker/fuseki/.env.example docker/fuseki/.env
      
    • Edit the .env file to set Fuseki's URI and authentication information, for example:
      FUSEKI_URI=http://localhost:3032/test
      FUSEKI_AUTH=admin/abc123-qwe
      
    • Start the Fuseki service:
      cd docker/fuseki
      docker compose --env-file .env up -d
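    • Optionally, verify that Fuseki is reachable before continuing. A minimal Python sketch (the /$/ping path is Fuseki's standard admin ping endpoint; the port matches the .env example above):
      # Sketch: check that the Fuseki server answers. Not part of OntoCast.
      import requests

      resp = requests.get("http://localhost:3032/$/ping", timeout=5)
      print(resp.status_code, resp.text)  # expect HTTP 200 and a timestamp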
      
  4. Configure the language model
    OntoCast supports OpenAI or local models (e.g., via Ollama). Edit the .env file in the project root to configure the model parameters:

    LLM_PROVIDER=openai
    LLM_MODEL_NAME=gpt-4o-mini
    LLM_TEMPERATURE=0.0
    OPENAI_API_KEY=your_openai_api_key_here
    

    If using a local model (e.g. Ollama), set:

    LLM_PROVIDER=ollama
    LLM_BASE_URL=http://localhost:11434
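
    As a quick sanity check that the file parses, you can read it back with python-dotenv (an assumption for illustration; how OntoCast itself loads the file may differ):

    # Sketch: confirm the .env values are readable.
    # Uses python-dotenv (pip install python-dotenv) -- an assumption.
    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads .env from the current directory
    print(os.getenv("LLM_PROVIDER"), os.getenv("LLM_MODEL_NAME"))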
    
  5. Run the service
    Start the OntoCast service with the following command:

    uv run serve --ontology-directory ONTOLOGY_DIR --working-directory WORKING_DIR
    

    Here, ONTOLOGY_DIR is the path where ontology files are stored, and WORKING_DIR is a working directory for processed data.

  6. Build a Docker image (optional)
    If you prefer to run OntoCast in Docker, build an image:

    docker buildx build -t growgraph/ontocast:0.1.4 .
    

Usage

OntoCast's core function is extracting semantic triples to build knowledge graphs. The specific steps are as follows:

  1. Prepare the documents
    Place the documents to be processed (text, JSON, PDF, or Markdown) into the data/ directory. The project provides sample data in that directory for reference.
  2. Run the extraction process
    OntoCast can be run either through its command-line tool or through the REST API:

    • Command-line method
      Use the CLI tool to process a document:

      uv run ontocast process --input data/sample.md --output output.ttl
      

      This processes the sample.md file into RDF triples and writes them to output.ttl (Turtle format).

    • REST API method
      After starting the service, call the /process endpoint:

      curl -X POST http://localhost:8999/process -H "Content-Type: application/json" -d '{"input": "data/sample.md"}'
      

      The response returns the extracted triples and ontology data.
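
      The same call can be made from Python, mirroring the curl example (a minimal sketch; the port and payload come from the example above):

      # Sketch: call the /process endpoint from Python.
      import requests

      resp = requests.post(
          "http://localhost:8999/process",
          json={"input": "data/sample.md"},
          timeout=600,  # LLM-backed extraction can take a while
      )
      resp.raise_for_status()
      print(resp.json())  # the extracted triples and ontology data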

  3. View the results
    After processing, the results are stored in the triple store (e.g., Fuseki). You can query the knowledge graph through Fuseki's web interface (e.g., http://localhost:3032), or retrieve data with the SPARQL query language.
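
    For example, a minimal SPARQL lookup against the Fuseki dataset configured earlier (the /query service name follows Fuseki's defaults; adjust if your setup differs):

    # Sketch: fetch a few triples from Fuseki over the standard SPARQL HTTP protocol.
    import requests

    resp = requests.get(
        "http://localhost:3032/test/query",
        params={"query": "SELECT * WHERE { ?s ?p ?o } LIMIT 10"},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["s"]["value"], row["p"]["value"], row["o"]["value"])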
  4. Optimize the ontology
    OntoCast optimizes ontologies automatically. To adjust an ontology manually, edit the ontology files in the data/ontologies/ directory and re-run the extraction process.
  5. Use GraphRAG
    OntoCast supports knowledge graph-based retrieval-augmented generation (GraphRAG). After processing completes, run a semantic search over the generated knowledge graph:

    uv run ontocast search --query "specific keyword" --graph output.ttl
    

      This returns the triples related to the keyword.
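
      Conceptually, the simplest form of this lookup is a literal filter over the exported graph. A rough rdflib sketch (not OntoCast's actual GraphRAG implementation, which also uses semantic retrieval):

      # Rough sketch: match triples whose literal object contains a keyword.
      from rdflib import Graph, Literal

      g = Graph()
      g.parse("output.ttl", format="turtle")

      keyword = "specific keyword"
      for s, p, o in g:
          if isinstance(o, Literal) and keyword.lower() in str(o).lower():
              print(s, p, o)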

Feature Highlights

  • Semantic chunking: OntoCast automatically splits long documents into semantically coherent chunks for more accurate triple extraction. Users do not need to set chunking parameters manually; the system handles this based on semantic similarity (see the sketch after this list).
  • Entity disambiguation: OntoCast recognizes and unifies entity references when processing multiple or long documents. For example, "Apple" may refer to a company or a fruit in different contexts, and OntoCast categorizes it correctly based on context.
  • Multi-format support: users can upload PDF or Markdown files directly, and OntoCast automatically converts them to its internal processing format without extra preprocessing.
  • MCP compatibility: through the /process endpoint, OntoCast supports the Model Context Protocol (MCP) for easy integration with other systems.
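
To give an intuition for the semantic chunking mentioned above, here is a conceptual sketch: start a new chunk whenever adjacent sentences drift apart in embedding space. This is an illustration only, not OntoCast's actual algorithm or thresholds; it assumes the sentence-transformers library and an arbitrary similarity cutoff:

    # Conceptual sketch of similarity-based chunking (illustrative only).
    from numpy import dot
    from numpy.linalg import norm
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    sentences = [
        "OntoCast builds knowledge graphs from documents.",
        "It extracts subject-predicate-object triples.",
        "Fuseki stores the resulting triples.",
    ]
    vecs = model.encode(sentences)

    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        # Cosine similarity between adjacent sentence embeddings.
        sim = dot(vecs[i - 1], vecs[i]) / (norm(vecs[i - 1]) * norm(vecs[i]))
        if sim < 0.5:  # illustrative threshold
            chunks.append(current)
            current = []
        current.append(sentences[i])
    chunks.append(current)
    print(chunks)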

Notes

  • Ensure that the triple store service (e.g., Fuseki) is running properly; otherwise the extraction results cannot be saved.
  • When processing large documents, adjust the RECURSION_LIMIT and ESTIMATED_CHUNKS parameters to avoid performance problems (see the example after this list).
  • The project documentation is in the docs/ directory, which provides a detailed user guide and API reference.
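
For example, assuming these parameters are read from the same .env file as the other settings (an assumption; the RECURSION_LIMIT value below is purely illustrative, while ESTIMATED_CHUNKS=50 follows the FAQ suggestion later on this page):

    # Illustrative values only
    RECURSION_LIMIT=100
    ESTIMATED_CHUNKS=50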

 

Application Scenarios

  1. Academic Research
    Researchers can use OntoCast to extract key concepts and relationships from academic papers to build a domain knowledge graph. For example, when dealing with biology papers, genes, proteins and their interactions can be extracted to generate a queryable knowledge base.
  2. Enterprise Document Management
    Enterprises can convert internal documents (e.g., technical manuals, contracts) into knowledge graphs for quick retrieval and analysis. For example, extracting terms, amounts and related party information from contracts improves information management efficiency.
  3. Semantic Search Optimization
    Web developers can use OntoCast to build semantic search functionality that extracts structured data from unstructured content and improves the accuracy of search results.
  4. Intelligent Q&A Systems
    OntoCast can provide knowledge graph support for Q&A systems. For example, triples extracted from a company's FAQ documents can be used to answer users' specific questions about a product or service.

 

FAQ

  1. What file formats does OntoCast support?
    Text, JSON, PDF, and Markdown formats are supported; more formats may be added in the future.
  2. How do I choose a triple store?
    Fuseki is recommended: its configuration is simpler and its performance better than Neo4j's. Refer to docker/fuseki/.env.example for configuration.
  3. Is a predefined ontology required?
    No. OntoCast automatically generates and optimizes ontologies and also supports user-supplied custom ontologies.
  4. How are large documents handled?
    Increase the ESTIMATED_CHUNKS parameter (e.g., set it to 50) and ensure sufficient hardware resources. Semantic chunking automatically optimizes processing.
  5. What language models are supported?
    OpenAI models (e.g., gpt-4o-mini) and local models (e.g., those run through Ollama) are supported.