Ichigo (llama3-s)

2024-11-12

Ichigo is an open-source, real-time speech AI project that aims to extend text-based language models with native "listening" capabilities. The project uses an early-fusion technique inspired by Meta's Chameleon paper. Ichigo aims to be an on-device voice assistant with open data and open weights, similar to Siri. The project also invites partners to help crowdsource speech datasets.
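
To make the early-fusion idea more concrete: in this style of model, audio is typically converted into discrete codes, the codes are mapped to extra vocabulary tokens, and text and audio tokens are fed to one model as a single sequence. The toy sketch below only illustrates that sequence-building step. The token names, the codebook size of 512, and both function names are assumptions chosen for illustration, not Ichigo's actual tokenizer.

```python
from typing import List

# Assumed marker names, for illustration only.
SOUND_START = "<|sound_start|>"
SOUND_END = "<|sound_end|>"


def audio_codes_to_tokens(codes: List[int], codebook_size: int = 512) -> List[str]:
    """Map discrete audio codebook indices to vocabulary-style token strings."""
    for code in codes:
        if not 0 <= code < codebook_size:
            raise ValueError(f"code {code} is outside a codebook of size {codebook_size}")
    return [SOUND_START] + [f"<|sound_{code:04d}|>" for code in codes] + [SOUND_END]


def build_fused_sequence(text_tokens: List[str], audio_codes: List[int]) -> List[str]:
    """Concatenate text tokens and audio tokens into one input sequence."""
    return text_tokens + audio_codes_to_tokens(audio_codes)


if __name__ == "__main__":
    # Hypothetical text prefix followed by three made-up audio codes.
    print(build_fused_sequence(["Answer", "the", "question", ":"], [12, 507, 3]))
```

Because both modalities share one token sequence, the language model needs no separate audio branch at inference time; that is the property the term "early fusion" refers to.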

Ichigo (llama3-s): a local real-time voice AI assistant and open-source alternative to Siri (image 1)

Function List

Real-time speech recognition: Processes and understands spoken user input in real time.
Multi-turn dialogue: Supports multi-turn conversations and maintains context across turns.
Noise handling: The model is trained to refuse non-speech audio input, which improves the user experience.
Open source and extensible: The project code and model weights are fully open source, and users are free to download and extend them.
Local deployment: Runs on local devices, which helps protect user privacy.

Using Help

Installation process

Environment preparation:
- Ensure that Python 3.8 or above is installed.
- Install the required dependencies (run inside the project directory): pip install -r requirements.txt
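
The Python version requirement above can be checked before installing anything. The snippet below is a small sketch using only the standard library; the helper name python_is_supported is made up for this example and is not part of the project.

```python
import sys

MIN_VERSION = (3, 8)  # minimum version stated in the installation notes


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_is_supported():
        print("Python version is recent enough.")
    else:
        sys.exit("Python 3.8 or above is required.")
```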

Download the project:

Use the following commands to clone the Ichigo repository and install it in editable mode:

git clone https://github.com/homebrewltd/ichigo.git
cd ichigo
pip install -e .

Configure the dataset:
- Download the required dataset from Hugging Face and set the dataset path in the configuration file.
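Launch the demo:
- Start the local Gradio demo with the following command. The --use-4bit and --use-8bit flags select the quantization level, so pass only one of them (4-bit is shown here):
```
python demo.py --use-4bit
```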
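
Quantization flags such as those in the demo command above usually describe alternatives rather than options to combine. As a sketch of how a launcher script could enforce that, the example below parses the two flags as a mutually exclusive group with argparse. It is a hypothetical stand-in, not the project's demo.py; the function names build_parser and quantization_bits are invented for this example.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser in which the two quantization flags exclude each other."""
    parser = argparse.ArgumentParser(description="Hypothetical demo launcher")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--use-4bit", action="store_true", help="load the model in 4-bit mode")
    group.add_argument("--use-8bit", action="store_true", help="load the model in 8-bit mode")
    return parser


def quantization_bits(argv: List[str]) -> Optional[int]:
    """Return 4, 8, or None (no quantization requested) for the given arguments."""
    args = build_parser().parse_args(argv)
    if args.use_4bit:
        return 4
    if args.use_8bit:
        return 8
    return None


if __name__ == "__main__":
    print(quantization_bits(["--use-8bit"]))
```

With this setup, passing both flags at once makes argparse exit with a usage error instead of silently picking one.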

Usage Process

Start the service:
- After running the command above, open the local URL it provides to access Ichigo's web UI.
Voice input:
- In the web UI, click the microphone icon to start recording; the system processes the speech and displays the recognition results in real time.
Multi-turn dialogue:
- The system supports multi-turn conversations: the user can keep speaking, and the system maintains context to understand and respond.
Noise handling:
- The system is trained to recognize and refuse non-speech audio input, which helps keep recognition results accurate.
Custom extensions:
- Users can modify the code and model as needed to add new features or improve existing ones.
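
For readers planning custom extensions, the context handling described in the dialogue step above can be pictured as a bounded message history. The sketch below is a generic illustration under that assumption; the Conversation class, its field names, and the max_turns limit are hypothetical and do not describe how Ichigo stores context internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Conversation:
    """Keeps a bounded history of user and assistant messages."""

    max_turns: int = 8
    history: List[Dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        """Append a message and drop the oldest ones beyond the limit."""
        self.history.append({"role": role, "content": content})
        # Keep at most 2 * max_turns messages (roughly max_turns exchanges).
        overflow = len(self.history) - 2 * self.max_turns
        if overflow > 0:
            del self.history[:overflow]

    def as_prompt(self) -> List[Dict[str, str]]:
        """Return a copy of the messages to send with the next request."""
        return list(self.history)


if __name__ == "__main__":
    chat = Conversation(max_turns=2)
    chat.add("user", "What is the capital of France?")
    chat.add("assistant", "Paris.")
    chat.add("user", "And its population?")
    print(chat.as_prompt())
```

Capping the history keeps the prompt within the model's context window at the cost of forgetting the oldest exchanges.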

Detailed Operation Procedure

Download and installation:
- Visit Ichigo's GitHub page and follow the installation process to install the necessary dependencies and download the models.
Configuration and startup:
- Using the configuration file provided by the project, set the dataset path and model parameters, then start the local service.
Using the web UI:
- Try Ichigo's real-time speech recognition and multi-turn dialogue by speaking to it through the web UI.
Extension and customization:
- Read the project documentation and code comments to understand the system's architecture and inner workings before building custom extensions.
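
The configuration step above (setting a dataset path before startup) can be validated early so that a wrong path fails fast. The sketch below assumes a JSON configuration file with a dataset_path key; both the file format and the key name are assumptions for illustration, since the project's real configuration layout is not described here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_config(path: str) -> Dict[str, Any]:
    """Read a JSON config file and check that its dataset path exists."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    if "dataset_path" not in config:  # hypothetical key name
        raise KeyError("config is missing 'dataset_path'")
    dataset_path = Path(config["dataset_path"]).expanduser()
    if not dataset_path.exists():
        raise FileNotFoundError(f"dataset path not found: {dataset_path}")
    return config
```

A call such as load_config("config.json") would then either return the parsed settings or raise an error that names the missing path.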
