
TEN: An open source tool for building real-time multimodal voice AI agents

2025-07-30

AI Tool/AI audio/AI Workforce/voice interaction


https://github.com/TEN-framework/ten-framework


TEN Framework is an open source software platform that helps developers build real-time, multimodal, low-latency voice AI agents. It supports multiple programming languages, including C, C++, Go, Python, JavaScript, and TypeScript, and lets developers quickly create agents with voice, vision, and text interaction capabilities. The framework provides a modular extension system that integrates with external platforms such as Dify and Coze. It also supports deployment in the cloud and on edge devices, which makes it suitable for a wide range of application scenarios. The TEN Framework is released under the Apache 2.0 license, and developers are free to contribute code, improve documentation, or develop new features. The official documentation and blog provide guidance for both beginners and professional developers.

Feature List

Real-time voice interaction: full-duplex dialog with real-time speech recognition and text-to-speech.
Multimodal support: combines speech, vision, and text processing to build integrated AI agents.
Modular extension system: provides reusable extensions for integrating external tools such as weather lookup and web search.
Cross-platform operation: supports Windows, Mac, Linux, and mobile devices, and is compatible with edge devices such as the ESP32.
Workflow building tool: TMAN Designer provides a low-code/no-code interface that simplifies agent development.
Mainstream large model integration: supports Llama 4, Google Gemini, DeepSeek R1, and others for real-time interaction.
Real-time image generation: the StoryTeller extension generates story-related images to enrich the interaction.
Open source collaboration: GitHub Issues and Projects are available for contributing code or reporting problems.

Usage Guide

Installation process

The installation process for the TEN Framework varies depending on the target platform and development requirements. The following generic installation steps, based on official documentation, are suitable for most users:

Environment preparation
- Make sure the necessary development tools are installed on your system. For example, C/C++ development requires a compiler (e.g. GCC), and Python development requires Python 3.8 or later.
- Install Git, then run the following command to clone the TEN Framework repository:
```
git clone https://github.com/TEN-framework/ten-framework.git
```
- Change into the project directory:
```
cd ten-framework
```
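The prerequisites named above (Python 3.8+, Git, a C/C++ compiler) can be checked before continuing. The script below is a minimal sketch and not part of the TEN Framework: the function name `check_prerequisites` and the default tool list (`git`, `gcc`) are assumptions chosen for illustration, so adjust them to your platform.

```python
import shutil
import sys

# Minimum Python version named in the environment preparation step.
MIN_PYTHON = (3, 8)


def check_prerequisites(tools=("git", "gcc"), version=None):
    """Return a list of human-readable problems; an empty list means all good."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            "Python %d.%d found, 3.8+ required" % (version[0], version[1])
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append("%s not found on PATH" % tool)
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

Running the script prints one line per missing prerequisite and nothing when everything is in place.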
Installing dependencies
- The TEN Framework relies on a number of third-party libraries, which are listed in each package's LICENSE file. Run the following command to install the basic dependencies:
```
pip install -r requirements.txt
```
- For C/C++ components, the TEN Framework uses a build system based on Google GN. Install the GN tools by following the steps in the README.md of the ten_gn submodule:
```
git submodule update --init --recursive
cd core/ten_gn
./configure
```
Configuring external services
- The TEN Framework supports integration with external APIs such as Deepgram (speech recognition), ElevenLabs (text-to-speech), and OpenAI (large models). You need to register for these services and obtain an API key for each.
- Create a configuration file in the project root directory (e.g. config.json) and fill in the API keys:
```
{
"agora_app_id": "<your_agora_app_id>",
"openai_api_key": "<your_openai_api_key>",
"deepgram_api_key": "<your_deepgram_api_key>",
"elevenlabs_api_key": "<your_elevenlabs_api_key>"
}
```
- These keys can be obtained through each platform's free trial, as described in the TEN Portal documentation.
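A missing or placeholder key usually only surfaces once a service call fails, so it can help to validate the file up front. The following is a minimal sketch, assuming the config.json layout shown above; `find_unset_keys` and `load_config` are hypothetical helper names and not part of the TEN Framework.

```python
import json

# Key names taken from the example config.json above.
REQUIRED_KEYS = (
    "agora_app_id",
    "openai_api_key",
    "deepgram_api_key",
    "elevenlabs_api_key",
)


def find_unset_keys(config):
    """Return required keys that are missing, empty, or still placeholders."""
    unset = []
    for key in REQUIRED_KEYS:
        value = str(config.get(key, "")).strip()
        # The template uses "<your_...>" placeholders; treat them as unset.
        if not value or (value.startswith("<") and value.endswith(">")):
            unset.append(key)
    return unset


def load_config(path="config.json"):
    """Load the JSON config and fail early if any key is not filled in."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    unset = find_unset_keys(config)
    if unset:
        raise ValueError("unset keys in %s: %s" % (path, ", ".join(unset)))
    return config
```

Calling `load_config()` from the project root either returns the parsed dictionary or raises a `ValueError` that names the keys still to be filled in.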
Running the Playground
- TEN provides a Playground example for quickly trying out the framework. Run the following command to start it:
```
python playground.py
```
- The Playground lets you interact with TEN Agent and demonstrates its real-time voice dialog and image generation capabilities.

Feature walkthrough

Real-time voice interaction

The TEN Framework enables real-time voice interaction through TEN Agent. To try it:

After launching TEN Agent, select DeepSeek R1 or Google Gemini as the language model.
Speak into the microphone. The system converts the speech to text in real time and generates a response with the large model.
The response is played back as speech through ElevenLabs text-to-speech.
Example: say "Tell a story about an adventure in the forest" and TEN Agent generates the story and a related image through the StoryTeller extension.
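The steps above follow a speech-to-text, language model, text-to-speech chain. The sketch below illustrates that chain for a single turn only. It is not TEN Agent's implementation: `run_turn` and the `fake_*` stages are hypothetical stand-ins that make no calls to Deepgram, any language model, or ElevenLabs, and a full-duplex system would stream audio continuously instead of handling one complete utterance at a time.

```python
from typing import Callable

# Each stage is a plain callable so real services can be swapped in later.
Transcriber = Callable[[bytes], str]
Responder = Callable[[str], str]
Synthesizer = Callable[[str], bytes]


def run_turn(audio: bytes, stt: Transcriber, llm: Responder, tts: Synthesizer) -> bytes:
    """Run one conversational turn: speech -> text -> reply -> speech."""
    text = stt(audio)
    reply = llm(text)
    return tts(reply)


# Stand-in stages (hypothetical): they only move text around so the flow is
# visible; they do not contact any speech or model service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return "You said: " + text


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    print(run_turn(b"hello", fake_stt, fake_llm, fake_tts))
```

Keeping each stage behind a simple callable means a different model or speech provider can be substituted without touching the turn logic.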

Workflow construction (TMAN Designer)

TMAN Designer is a low-code tool for quickly building AI agents:

Open the TMAN Designer web interface (run it locally or use the official online version).
Drag and drop modules in the interface to create a voice interaction flow. For example, connect a "Speech Input" module to an "OpenAI Processing" module, and then connect that to a "Speech Output" module.
After saving the workflow, click the "Run" button to test the agent.
TMAN Designer supports dark/light theme switching and includes a built-in editor and log viewer for debugging.
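A drag-and-drop flow like the one described can be thought of as a directed graph of modules. The sketch below shows one way to derive a run order from such a graph using Kahn's topological sort. The list-of-pairs representation and the `execution_order` function are assumptions made for illustration and do not reflect the format TMAN Designer actually stores.

```python
from collections import deque


def execution_order(connections):
    """Order modules so every module runs after the ones feeding into it.

    `connections` is a list of (source, target) pairs. Uses Kahn's
    topological sort and raises ValueError if the graph has a cycle.
    """
    successors = {}
    incoming = {}
    for source, target in connections:
        successors.setdefault(source, []).append(target)
        successors.setdefault(target, [])
        incoming[target] = incoming.get(target, 0) + 1
        incoming.setdefault(source, 0)

    # Start with modules that have no incoming connection.
    ready = deque(name for name, count in incoming.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for nxt in successors[name]:
            incoming[nxt] -= 1
            if incoming[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(incoming):
        raise ValueError("workflow contains a cycle")
    return order


# The example flow from the text, expressed as (source, target) pairs.
workflow = [
    ("Speech Input", "OpenAI Processing"),
    ("OpenAI Processing", "Speech Output"),
]

if __name__ == "__main__":
    print(" -> ".join(execution_order(workflow)))
```

The cycle check matters because a flow whose output loops back into its own input has no valid start point.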

Extension integration

The modular design of the TEN Framework lets developers add custom extensions. For example, to integrate a weather lookup function:

Download the weather lookup extension and install it into the TEN Framework's extensions directory.
Add the weather lookup module to the workflow and configure its API key (e.g. for OpenWeatherMap).
Test it: ask "How is the weather in Beijing today?" and the system returns real-time weather information.
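To show what a weather lookup involves underneath, the sketch below builds a request for OpenWeatherMap's public current-weather endpoint and turns a decoded response into a short answer. It is an illustration only and says nothing about how the TEN extension itself is written. `build_weather_url` and `summarize` are hypothetical helper names, and no network request is made here.

```python
from urllib.parse import urlencode

# Public OpenWeatherMap "current weather" endpoint.
OWM_ENDPOINT = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city, api_key, units="metric"):
    """Build the request URL for a current-weather lookup by city name."""
    if not api_key:
        raise ValueError("an OpenWeatherMap API key is required")
    query = urlencode({"q": city, "appid": api_key, "units": units})
    return OWM_ENDPOINT + "?" + query


def summarize(payload):
    """Turn a decoded JSON response into a one-line answer for the agent."""
    description = payload["weather"][0]["description"]
    temperature = payload["main"]["temp"]
    return "%s: %s, %s degrees" % (payload["name"], description, temperature)
```

The URL can be fetched with any HTTP client, and the decoded JSON body passed to `summarize` to produce the text the agent speaks.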

Hardware support (ESP32)

TEN Agent is supported on the ESP32-S3 Korvo V3 development board:

Clone the TEN-Agent/esp32-client branch.
Compile and flash the firmware using the ESP-IDF toolchain, following esp32-client/README.md.
Once Wi-Fi and the API keys are configured, the ESP32 device can run TEN Agent with real-time voice interaction.

Notes

Make sure the network connection is stable, since some functions rely on cloud APIs.
Check GitHub regularly for updates and run git pull to get the latest version.
If you run into problems, submit feedback through GitHub Issues or join the TEN community discussion on Discord (link on TEN Portal).

Application scenarios

Educational aids
TEN Agent can be used to create interactive learning assistants. Students ask questions by voice, and the agent answers in real time and generates relevant images. For example, if a student asks "What is a volcano?", TEN Agent explains how volcanoes form and generates an image of an erupting volcano to make the lesson more engaging.
Intelligent customer service
Organizations can use the TEN Framework to build real-time voice customer service with multilingual support. Customer service agents can handle common requests, such as order inquiries or technical support, and find up-to-date information through the web search extension.
IoT device control
In a smart home scenario, TEN Agent runs on an ESP32 device and lets the user control appliances by voice. For example, say "turn on the living room light" and the agent parses the command and sends a control signal.
Children's story generation
Parents can use the StoryTeller extension to have TEN Agent generate personalized stories for their children, with illustrations generated in real time for a more immersive experience.
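For the IoT device control scenario above, the sketch below shows the simplest form that "parsing the command" could take: a fixed phrase pattern mapped to a structured action. This is an assumption for illustration only. In practice a language model would more likely extract the intent, and the grammar, the device list, and the `parse_command` function are all hypothetical.

```python
import re

# Hypothetical command grammar: "turn on|off the <room> <device>".
COMMAND = re.compile(r"^turn (on|off) the (.+) (light|fan|heater)$")


def parse_command(utterance):
    """Return {'action', 'room', 'device'} or None if the phrase is not understood."""
    match = COMMAND.match(utterance.strip().lower())
    if match is None:
        return None
    action, room, device = match.groups()
    return {"action": action, "room": room, "device": device}
```

The returned dictionary is what a device-side handler would translate into an actual control signal.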

Q&A

Is the TEN Framework free?
The TEN Framework is fully open source and released under the Apache 2.0 license. You can download and use it for free, but some features require third-party API keys, which may incur costs.
Do I need programming experience to use the TEN Framework?
Not necessarily. TMAN Designer provides a low-code interface for users with no programming experience, while developers can customize functionality in code using any of the supported programming languages.
Which large models does TEN Agent support?
Llama 4, Google Gemini, DeepSeek R1, and OpenAI models are currently supported, and support for more models is planned.
How do I deploy TEN Agent on edge devices?
To run TEN Agent on devices such as the ESP32, install the ESP-IDF toolchain and configure the firmware. For specific steps, refer to the TEN-Agent/esp32-client documentation.

