SongGen's dual-track mode generates the vocal track (vocal) and the accompaniment track (acc) separately to meet the needs of professional music production. The model achieves this through:
- Parallel decoding: generates two independent audio sequences in sync
- Time alignment: automatically matches the lengths of the two tracks so that they play back in sync
- Level balancing: maintains a reasonable volume ratio between the tracks
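The alignment and balancing ideas in the list above can be illustrated on the output waveforms. This is a minimal NumPy sketch, not SongGen's own implementation: the function names `align_tracks` and `balance_tracks`, the zero-padding strategy, and the 3 dB default are all assumptions made for illustration.

```python
import numpy as np


def rms(x: np.ndarray) -> float:
    """Root-mean-square level of a mono signal (0.0 for an empty array)."""
    return float(np.sqrt(np.mean(np.square(x)))) if x.size else 0.0


def align_tracks(vocal: np.ndarray, acc: np.ndarray):
    """Zero-pad the shorter track so both have the same number of samples.

    Hypothetical helper: padding with silence is one simple way to get
    equal-length tracks; the model itself may align tracks differently.
    """
    n = max(len(vocal), len(acc))
    return (
        np.pad(vocal, (0, n - len(vocal))),
        np.pad(acc, (0, n - len(acc))),
    )


def balance_tracks(vocal: np.ndarray, acc: np.ndarray, vocal_over_acc_db: float = 3.0):
    """Scale the accompaniment so the vocal RMS sits `vocal_over_acc_db` dB above it.

    The 3 dB default is an arbitrary illustrative value, not a SongGen setting.
    """
    v, a = rms(vocal), rms(acc)
    if v == 0.0 or a == 0.0:
        return vocal, acc  # nothing sensible to balance against silence
    target_acc_rms = v / (10 ** (vocal_over_acc_db / 20))
    return vocal, acc * (target_acc_rms / a)
```

For example, `balance_tracks(*align_tracks(vocal, acc))` returns two equal-length arrays whose RMS levels differ by the requested number of decibels.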
This split output gives the mixing engineer full post-production flexibility, for example:
- Adjusting the EQ or effects of each track individually
- Replacing specific instrument parts
- Redesigning the spatial reverb
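As a rough illustration of the per-track processing listed above, the sketch below applies gain and a toy reverb to the vocal only, then mixes it back with the untouched accompaniment. Everything here is assumed for illustration: the helper names, the 16 kHz sample rate, and the noise-based impulse response are not taken from SongGen, and a real workflow would normally use a DAW or a dedicated audio library.

```python
import numpy as np


def apply_gain_db(track: np.ndarray, gain_db: float) -> np.ndarray:
    """Scale a track by a gain given in decibels."""
    return track * (10 ** (gain_db / 20))


def simple_reverb(track: np.ndarray, sample_rate: int = 16000,
                  decay_s: float = 0.3, wet: float = 0.3) -> np.ndarray:
    """Toy reverb: convolve with an exponentially decaying noise impulse response.

    The sample rate and decay time are illustrative assumptions.
    """
    rng = np.random.default_rng(0)  # fixed seed so the effect is repeatable
    n = int(sample_rate * decay_s)
    t = np.arange(n) / sample_rate
    ir = rng.standard_normal(n) * np.exp(-6.0 * t / decay_s)
    ir /= np.sqrt(np.sum(ir ** 2))  # normalise the impulse response energy
    wet_signal = np.convolve(track, ir)[: len(track)]  # keep the original length
    return (1.0 - wet) * track + wet * wet_signal


def mixdown(vocal: np.ndarray, acc: np.ndarray) -> np.ndarray:
    """Sum two equal-length tracks and scale down only if the peak exceeds 1.0."""
    mix = vocal + acc
    peak = float(np.max(np.abs(mix))) if mix.size else 0.0
    return mix / peak if peak > 1.0 else mix
```

A call such as `mixdown(simple_reverb(apply_gain_db(vocal, -2.0)), acc)` changes only the vocal chain, which is the kind of independent adjustment a single mixed output does not allow.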
In contrast, mixed mode is better suited to rapid content production, while dual-track mode is geared toward professional production workflows.
This answer comes from the article "SongGen: A Single-Stage Autoregressive Transformer for Automatic Song Generation".