M3-Agent's deployment requirements fall into two groups: the basic hardware configuration and the software environment needed for each functional stage:
- Core hardware requirements:
  - Full run (including memory generation): 1x A100 (80 GB VRAM) or 4x RTX 3090
  - Inference-only mode: a GPU with at least 16 GB of VRAM
  - Storage: at least 200 GB of free disk space
- Environment dependencies:
  - Base environment: run the setup.sh script to install the base dependencies
  - Memorization process: requires a specific version of the transformers library and the Qwen-Omni toolkit
  - Control process: requires pinned versions such as transformers==4.51.0 and vllm==0.8.4
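The requirements above can be checked before a run with a small script. The sketch below is illustrative and not part of M3-Agent. It compares installed package versions against the control-process pins listed above using `importlib.metadata`. It also classifies a list of GPU memory sizes into the two hardware tiers. The names `CONTROL_PINS`, `check_pins`, and `gpu_tier` are hypothetical. Treating "4x RTX 3090" as "four cards with at least 24 GB each" is an assumption.

```python
from importlib import metadata

# Pins for the control process as listed above; hypothetical name.
# Add any further pins the project's own instructions specify.
CONTROL_PINS = {"transformers": "4.51.0", "vllm": "0.8.4"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for each package that is missing or mismatched.

    `found` is None when the package is not installed.
    """
    problems = {}
    for pkg, expected in pins.items():
        try:
            found = get_version(pkg)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[pkg] = (expected, found)
    return problems


def gpu_tier(vram_gb):
    """Classify per-GPU memory sizes (nominal GB, e.g. 80 for an A100 80GB).

    Assumption: "4x RTX 3090" is read as four cards with >= 24 GB each.
    Driver-reported sizes are slightly below nominal, so pass nominal values.
    """
    if any(v >= 80 for v in vram_gb) or sum(1 for v in vram_gb if v >= 24) >= 4:
        return "full"            # memory generation + control
    if any(v >= 16 for v in vram_gb):
        return "inference-only"  # control only
    return "insufficient"
```

For example, `check_pins(CONTROL_PINS)` returns an empty dict when the environment matches the pins. `gpu_tier([24, 24])` reports that two RTX 3090-class cards are enough only for inference-only mode.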
Note that the video processing stage produces several kinds of intermediate files:
1) 30-second video clips cut with FFmpeg
2) voice (speaker) features extracted with speakerlab
3) the final memory graph, saved as a .pkl file
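The clip-cutting step (1) can be illustrated with FFmpeg's segment muxer. The sketch below only builds the command and estimates how many clips a video will produce. It is not M3-Agent's own preprocessing script. The function names and the `clip_%04d.mp4` output pattern are hypothetical.

```python
import math
from pathlib import Path

CLIP_SECONDS = 30  # clip length stated above


def ffmpeg_segment_cmd(video, out_dir, clip_seconds=CLIP_SECONDS):
    """Build (but do not run) an FFmpeg command that splits `video` into fixed-length clips.

    Stream copy (-c copy) avoids re-encoding, but cut points snap to keyframes,
    so clips are only approximately `clip_seconds` long.
    """
    out_pattern = str(Path(out_dir) / "clip_%04d.mp4")  # hypothetical naming scheme
    return [
        "ffmpeg", "-i", str(video),
        "-f", "segment",
        "-segment_time", str(clip_seconds),
        "-reset_timestamps", "1",  # each clip starts at t=0
        "-c", "copy",
        out_pattern,
    ]


def expected_clip_count(duration_seconds, clip_seconds=CLIP_SECONDS):
    """Number of clips for a video of the given length; the last clip may be shorter."""
    return math.ceil(duration_seconds / clip_seconds)
```

To run the command, pass it to `subprocess.run(ffmpeg_segment_cmd("video.mp4", "clips"), check=True)` after creating the `clips` directory. `expected_clip_count` gives a rough idea of how many intermediate files a long video will generate.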
Storing these files on an SSD is recommended for better I/O performance, and extra cache space should be reserved when processing long videos.
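A quick free-space check against the 200 GB storage requirement can be done with Python's standard `shutil.disk_usage`. In the sketch below, `extra_cache_gb` is a hypothetical parameter for the additional headroom that long videos need. The source gives no figure for it, so choose the value yourself.

```python
import shutil

REQUIRED_FREE_GB = 200  # storage requirement stated above


def free_gb(path="."):
    """Free space on the filesystem containing `path`, in decimal GB (1 GB = 1e9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(path=".", required_gb=REQUIRED_FREE_GB, extra_cache_gb=0):
    """True if free space covers the base requirement plus optional cache headroom.

    `extra_cache_gb` is hypothetical: size it for your longest videos.
    """
    return free_gb(path) >= required_gb + extra_cache_gb
```

Point `path` at the directory (ideally on an SSD) where clips, speaker features, and memory graphs will be written.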
This answer comes from the article "M3-Agent: a multimodal intelligence with long-term memory and capable of processing audio and video".