
AI-Chatbox: Speech-to-Text Intelligent Dialogue Project based on ESP32S3
AI-Chatbox is a voice interaction project based on the ESP32S3 development board. Users talk to a large language model (LLM) by voice: the device converts the speech to text, sends it to the model, and can convert the answer back to speech for playback. The project is developed in Rust and integrates the Vosk speech recognition tool, suitable for...

TEN: An open source tool for building real-time multimodal speech AI agents
TEN Framework is an open source software platform that helps developers build real-time, multimodal, low-latency speech AI agents. It supports multiple programming languages, including C, C++, Go, Python, JavaScript, and TypeScript. Developers can use the TEN Framework to quickly create speech, visual, and text...

Zaia Health: the AI voice assistant that monitors and improves health habits
Zaia Health is an AI health app built around a voice assistant called “Zaia”. The app is designed to help users track and improve their health habits. Through voice interaction, it acts as a personal health companion, guiding users towards a more regular routine in the areas of sleep, exercise, nutrition and mental health...

wukong-robot: a smart speaker project to create personalized Chinese voice conversations
wukong-robot is an open source Chinese voice conversation robot and smart speaker project, designed to help developers quickly build personalized smart speakers. It supports Chinese speech recognition, speech synthesis, and multi-turn dialogue, and integrates ChatGPT, Baidu, iFlytek, and other technologies. The design is modular, so plug-ins and features can be freely extended, suitable...

RealtimeVoiceChat
RealtimeVoiceChat is an open source project that focuses on real-time, natural voice conversations with artificial intelligence. Users speak into the microphone, and the system captures the audio through the browser, quickly converts it to text, generates a reply from a large language model (LLM), and then converts the reply text to speech output. The whole process is close to real-time. The project adopts ...
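The speech-to-text, LLM, and text-to-speech chain described above can be sketched as three pluggable stages. This is a minimal illustration only, not RealtimeVoiceChat's actual code: `VoicePipeline`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical names, and the placeholder stages stand in for a real recognizer, model, and synthesizer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains three stages: audio -> text -> reply text -> audio."""
    transcribe: Callable[[bytes], str]   # speech-to-text stage
    generate: Callable[[str], str]       # LLM reply stage
    synthesize: Callable[[str], bytes]   # text-to-speech stage

    def respond(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        reply = self.generate(text)
        return self.synthesize(reply)


# Placeholder stages (assumptions for illustration): they treat the
# "audio" bytes as UTF-8 text so the sketch runs without any models.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
```

Calling `pipeline.respond(b"hello")` passes the input through all three stages in order. Keeping each stage as a plain callable means any one of them could be swapped for a real engine without touching the other two.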

gibberlink: a demonstration project for efficient audio communication between two AI agents
gibberlink is an open source project on GitHub by developer PennyroyalTea that focuses on optimizing communication between two conversational AI agents. When two AI agents talk on the phone and recognize each other as AI, they switch from human language (English) to a...

OpenAI Realtime Agents
OpenAI Realtime Agents is an open source project that aims to show how OpenAI's Realtime API can be used to build multi-agent voice applications. It provides a high-level agent pattern (borrowed from OpenAI Swarm) that allows developers to build complex multi-agent voice systems in a short period of time. The project ...

Bailing
Bailing is an open-source voice conversation assistant designed to engage in natural conversations with users through speech. The project combines automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech synthesis (TTS) to implement a GPT-4o-like voice conversation bot. The end-to-end latency of BaiLing's ...

"Always-On" Deepseek AI Assistant: Building an Intelligent Voice Interaction System Based on Deepseek-V3
Always-On AI Assistant is an AI assistant project that builds a permanently online assistant system by integrating Deepseek-V3, RealtimeSTT, and Typer. The project is especially optimized for engineering development scenarios, providing a complete...

Xiaozhi AI Chatbot
Xiaozhi AI Chatbot is an open source project based on the ESP32 development board, designed to help users build their own AI chat companion. The project is developed by Shrimp and is mainly intended for teaching, helping more people get started with AI hardware development and understand how to apply large language models to actual hardware devices. The project supports speech recognition and dialogue in multiple languages...

Fish Agent
Fish Agent, a project derived from Fish Speech, is an end-to-end AI speech cloning system built on the V0.1 3B model architecture. As a fully end-to-end system, its most important feature is a semantic-token-free architecture, which does not need to rely on traditional semantic encoders such as Whisper...

Ichigo (llama3-s)
Ichigo is an open source real-time speech AI project that aims to extend text-based language models with native “listening” capabilities. The project uses early fusion techniques inspired by Meta's Chameleon paper. Ichigo's goal is to become an open-data, open-weight voice assistant that runs on-device, similar to S...

Hume AI: Empowering AI with Emotion Recognition | Recognizing Emotional States from Sounds and Expressions | Generating Speech with Emotional States
Hume AI is an artificial intelligence company focused on emotional intelligence, developing multimodal AI technologies that understand and respond to human emotions. Its flagship product, the Empathic Voice Interface (EVI), can recognize and respond to user emotions expressed through speech, facial expressions, and language, to enhance the emotional experience of human-computer interaction. Hume AI's goal...