csm-mlx is an implementation of the CSM (Conversational Speech Model) voice conversation model built on MLX, the machine learning framework developed by Apple, and optimized specifically for Apple silicon. The project lets users run efficient speech generation and dialog functions on Apple devices with little setup. Developer senstella released it on March 15, 2025, with the goal of helping more people take advantage of the power of Apple devices and explore speech technology. At its core, the project provides a lightweight, easy-to-use tool for generating natural speech and handling dialog scenarios.

Feature List
- Speech generation: Generates natural-sounding speech audio from input text.
- Conversation context support: Generates coherent voice replies based on the content of previous conversation turns.
- Apple device optimization: Runs models efficiently on Apple silicon using the MLX framework.
- Open source model loading: Supports downloading pre-trained models from Hugging Face (e.g. csm-1b).
- Adjustable parameters: Provides sampler parameters such as temperature (temp) and minimum probability (min_p) to control the generated output.
Usage Guide
Installation process
To use csm-mlx locally, you need to install some dependent tools and environments first. Below are the detailed steps:
- Preparing the environment
- Make sure you're using macOS and that the device is powered by Apple silicon (e.g. M1, M2 chips).
- Install Python 3.10 or later. You can install it via Homebrew with the command brew install python@3.10.
- Install Git by running brew install git (can be skipped if already installed).
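The requirements above can be checked with a short script before going further. This is a minimal sketch using only the Python standard library; the function name check_environment is hypothetical and not part of csm-mlx. It assumes a native Apple silicon Python, which reports "arm64"; an interpreter running under Rosetta may report a different architecture.

```python
import platform
import sys

MIN_PYTHON = (3, 10)  # minimum version named in the steps above


def check_environment(system, machine, python_version):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if system != "Darwin":  # platform.system() reports "Darwin" on macOS
        problems.append(f"expected macOS, found {system}")
    if machine != "arm64":  # native Apple silicon processes report "arm64"
        problems.append(f"expected arm64 (Apple silicon), found {machine}")
    if tuple(python_version) < MIN_PYTHON:
        problems.append("Python 3.10 or later is required")
    return problems


problems = check_environment(
    platform.system(), platform.machine(), sys.version_info[:2]
)
print("Environment OK" if not problems else "\n".join(problems))
```

Passing the values in as arguments keeps the check easy to try with other inputs, for example to see what it reports for a non-Apple machine.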
 
- Cloning the project
- Open a terminal and enter the following command to download the csm-mlx project:
git clone https://github.com/senstella/csm-mlx.git
- Go to the project folder:
cd csm-mlx
 
- Creating a Virtual Environment
- Create a Python virtual environment in the project directory:
python3.10 -m venv .venv
- Activate the virtual environment:
source .venv/bin/activate
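To confirm that the activation worked, one option is to ask the interpreter itself. The sketch below relies on the standard-library behavior that sys.prefix differs from sys.base_prefix inside a venv; the helper name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter prefix differs from the base install."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


print("virtual environment active:", in_virtualenv())
```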
 
- Installation of dependencies
- Install the Python packages needed for the project by running:
pip install -r requirements.txt
- Note: Make sure that the MLX framework and Hugging Face's huggingface_hub library are installed. If you encounter problems, you can install them separately by running pip install mlx huggingface_hub.
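A quick way to see whether the two libraries named in the note are installed is to look them up without importing them. This is a minimal sketch using importlib from the standard library; the helper name missing_modules is hypothetical, and the module names mlx and huggingface_hub are taken from the pip command above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


REQUIRED = ["mlx", "huggingface_hub"]  # packages named in the note above
missing = missing_modules(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```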
 
- Download model
- csm-mlx uses the pre-trained model csm-1b-mlx. Run the following command to download it automatically:
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='senstella/csm-1b-mlx', filename='ckpt.safetensors')"
- The model files are saved in the local cache directory (usually ~/.cache/huggingface/hub).
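To check whether the weights are already in the cache, the directory can be searched directly. The sketch below assumes the cache layout commonly used by huggingface_hub (models--{owner}--{name}/snapshots/{revision}/{filename}) and the default cache location mentioned above; both are assumptions, and the location can differ if cache-related environment variables are set. The helper name find_cached_file is hypothetical.

```python
from pathlib import Path


def find_cached_file(cache_dir, repo_id, filename):
    """Return the path of a cached file for repo_id, or None if it is not cached."""
    # Assumed layout: <cache>/models--owner--name/snapshots/<revision>/<filename>
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    matches = sorted(repo_dir.glob(f"snapshots/*/{filename}"))
    return matches[0] if matches else None


default_cache = Path.home() / ".cache" / "huggingface" / "hub"
found = find_cached_file(default_cache, "senstella/csm-1b-mlx", "ckpt.safetensors")
print(found if found else "Weights not found in the cache yet.")
```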
 
How to use
Once installed, you can run csm-mlx's speech generation feature with a Python script. Here are the steps to do so:
Basic Speech Generation
- Writing scripts
- Create a file in the project directory, such as generate_audio.py, and enter the following code:
from csm_mlx import CSM, csm_1b, generate
from mlx_lm.sample_utils import make_sampler
from huggingface_hub import hf_hub_download
import audiofile
import numpy as np

# Initialize the model
csm = CSM(csm_1b())
weight = hf_hub_download(repo_id="senstella/csm-1b-mlx", filename="ckpt.safetensors")
csm.load_weights(weight)

# Generate audio
audio = generate(
    csm,
    text="Hello, I am csm-mlx.",
    speaker=0,
    context=[],
    max_audio_length_ms=10000,  # maximum audio length: 10 seconds
    sampler=make_sampler(temp=0.5, min_p=0.1)
)

# Save the audio; convert the MLX array to NumPy first
audiofile.write("output.wav", np.asarray(audio), 24000)  # 24000 Hz is the sample rate of CSM's audio codec
- Note: Saving audio requires the audiofile library. Install it by running pip install audiofile.
 
- Running the script
- Enter the following in the terminal:
python generate_audio.py
- Running it generates the file output.wav in the current directory. Double-click the file to play it.
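Besides listening to the result, the generated file can be inspected programmatically. The sketch below uses the standard-library wave module and assumes output.wav is a standard PCM WAV file; the helper name wav_info is hypothetical.

```python
import os
import wave


def wav_info(path):
    """Return the sample rate, channel count, and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


# Only inspect the file if the generation script has already produced it.
if os.path.exists("output.wav"):
    print(wav_info("output.wav"))
```

The reported duration should not exceed the max_audio_length_ms value passed to generate.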
 
Adding Context to a Conversation
- Modifying the script to support context
- If you want the model to generate responses based on previous conversations, you can add the context parameter. The modified code is as follows:
from csm_mlx import CSM, csm_1b, generate, Segment
import mlx.core as mx
from huggingface_hub import hf_hub_download
import audiofile
import numpy as np

# Initialize the model
csm = CSM(csm_1b())
weight = hf_hub_download(repo_id="senstella/csm-1b-mlx", filename="ckpt.safetensors")
csm.load_weights(weight)

# Create the conversation context; replace [...] with the audio of each earlier turn
context = [
    Segment(speaker=0, text="Hello, how is the weather today?", audio=mx.array([...])),
    Segment(speaker=1, text="Very nice, bright and sunny.", audio=mx.array([...]))
]

# Generate the reply
audio = generate(
    csm,
    text="Then let's go out for a walk!",
    speaker=0,
    context=context,
    max_audio_length_ms=5000
)

# Save the audio; convert the MLX array to NumPy first
audiofile.write("reply.wav", np.asarray(audio), 24000)
- Note: audio=mx.array([...]) requires the audio data of the previous turns. If you do not have it, you can first generate the audio with basic generation and then fill in the result.
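The approach in the note, generating the earlier turns first and reusing their audio as context, can be sketched as a small loop. The code below is a library-independent illustration: Turn is a hypothetical stand-in for csm_mlx's Segment, and synthesize is any callable you supply that returns audio for a line of text.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Turn:
    """Hypothetical stand-in for csm_mlx.Segment (same three fields)."""
    speaker: int
    text: str
    audio: Any


def build_context(lines, synthesize, make_segment=Turn):
    """Synthesize each (speaker, text) line in order and collect the turns.

    synthesize(text, speaker, context) must return the audio for that line;
    each call receives a copy of the turns generated so far.
    """
    context = []
    for speaker, text in lines:
        audio = synthesize(text, speaker, list(context))
        context.append(make_segment(speaker=speaker, text=text, audio=audio))
    return context
```

With csm-mlx, synthesize could wrap the generate call shown earlier, for example lambda text, speaker, context: generate(csm, text=text, speaker=speaker, context=context, max_audio_length_ms=5000), with make_segment=Segment. The resulting list can then be passed as the context argument for the final reply.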
 
- Run and test
- Run python generate_audio.py to generate the context-aware speech file reply.wav.
 
Parameter adjustment
- Temperature (temp): Controls the randomness of speech. The smaller the value (e.g. 0.5), the more stable the speech; the larger the value (e.g. 1.0), the more varied the speech.
- Maximum length (max_audio_length_ms): The unit is milliseconds, e.g. 5000 for 5 seconds.
- Adjustment method: change the parameters in the make_sampler or generate function call and then re-run the script.
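To build intuition for what temp and min_p do, the sketch below applies both to a made-up list of token scores in plain Python. It illustrates the general technique (temperature-scaled softmax, then dropping tokens whose probability falls below min_p times the top probability) and is not the implementation used by mlx_lm or csm-mlx; the function names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temp):
    """Convert scores to probabilities; a lower temp sharpens the distribution."""
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def apply_min_p(probs, min_p):
    """Zero out tokens below min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


logits = [2.0, 1.0, 0.0, -3.0]  # made-up scores for four candidate tokens
for temp in (0.5, 1.0):
    probs = softmax_with_temperature(logits, temp)
    filtered = apply_min_p(probs, min_p=0.1)
    print(f"temp={temp}:", [round(p, 3) for p in filtered])
```

At the lower temperature the top token takes a larger share of the probability, which matches the description of more stable output above.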
Caveats
- If you run into memory problems when generating audio, try reducing the value of max_audio_length_ms.
- Ensure that you have a good internet connection, as the first run downloads the model weights file, which can be a few GB in size.
Application scenarios
- Educational aids
 Users can use csm-mlx to generate speech explanations for teaching content. For example, input the text and generate natural speech for listening practice.
- Virtual Assistant Development
 Developers can utilize csm-mlx to build intelligent voice assistants. Combined with the dialog context feature, the assistant can generate coherent responses based on the user's words.
- Content creation
 Podcast producers can use it to convert scripts to speech, quickly generate audio clips and save recording time.
QA
- Does csm-mlx support Chinese?
 Yes, it supports Chinese input and generates Chinese speech. However, the effect depends on the training data, and it is recommended to test specific utterances to confirm the quality.
- How much hard disk space is required?
 The model files are about 1-2 GB, plus the dependency libraries and generated files, it is recommended to reserve 5 GB of space.
- Will it work on Windows?
 No, csm-mlx is designed for Apple silicon, relies on the MLX framework, and currently only supports macOS.
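For the disk space question above, the available space can be checked before downloading anything. This is a minimal sketch using shutil.disk_usage from the standard library; the 5 GB recommendation is treated here as 5 * 1024**3 bytes, and the helper name has_enough_space is hypothetical.

```python
import shutil

RECOMMENDED_FREE_BYTES = 5 * 1024**3  # the 5 GB recommendation from the answer above


def has_enough_space(free_bytes, required_bytes=RECOMMENDED_FREE_BYTES):
    """Return True when the free space meets the recommended amount."""
    return free_bytes >= required_bytes


free = shutil.disk_usage(".").free  # free bytes on the volume holding the current directory
print(f"free: {free / 1024**3:.1f} GiB, enough: {has_enough_space(free)}")
```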































