Background
A common pain point in music generation is that the output's style is monotonous or hard to control. SongGen addresses this problem through a fine-grained control mechanism.
Core Solutions
- Multi-dimensional attribute descriptions: the input text can include style (e.g. pop/rock), mood (e.g. cheerful/melancholy), instrumentation (e.g. piano + electric guitar), and other labels
- Structured input template: a standardized format of "Style: [value], Mood: [value], Instrument: [value]" is suggested
- Reference audio assist: upload a 3-second audio clip in a similar style to strengthen the model's understanding of the target style
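The structured template above is easy to generate programmatically. A minimal sketch of such a prompt builder (the field names follow this article's template; the `build_prompt` helper itself is illustrative and not part of SongGen's API):

```python
def build_prompt(style: str, mood: str, instrument: str) -> str:
    """Assemble a control prompt in the suggested
    "Style: ..., Mood: ..., Instrument: ..." format."""
    fields = {"Style": style, "Mood": mood, "Instrument": instrument}
    # Reject empty fields so the model always receives all three dimensions.
    for name, value in fields.items():
        if not value.strip():
            raise ValueError(f"missing value for {name}")
    return ", ".join(f"{name}: {value}" for name, value in fields.items())

prompt = build_prompt("folk rock", "nostalgic and cozy",
                      "acoustic guitar lead + harmonica interlude")
print(prompt)
```

Keeping the three dimensions in a fixed order makes prompts consistent across requests, which tends to help controllability.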
Usage Suggestions
Example input text:
"Style: folk rock, mood: nostalgic and cozy, instrumentation: acoustic guitar lead + harmonica interlude"
Pairing the prompt with reference audio works better.
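Since the article suggests a 3-second reference clip, the audio you supply may need to be trimmed or padded to that length first. A hedged sketch of that preprocessing step (the `prepare_reference` helper is hypothetical; only the 3-second target length comes from the article):

```python
import numpy as np

def prepare_reference(audio: np.ndarray, sample_rate: int,
                      target_seconds: float = 3.0) -> np.ndarray:
    """Trim or zero-pad a mono waveform to exactly `target_seconds`,
    matching the 3-second reference-clip length suggested above."""
    target_len = int(sample_rate * target_seconds)
    if len(audio) >= target_len:
        return audio[:target_len]      # keep the first 3 seconds
    pad = target_len - len(audio)
    return np.pad(audio, (0, pad))     # zero-pad clips that are too short

# A 5-second clip at 16 kHz is trimmed to 3 seconds (48000 samples).
clip = prepare_reference(np.zeros(5 * 16000), 16000)
print(clip.shape)  # (48000,)
```

Choose a clip whose style closely matches the target, since the reference dominates the stylistic cues the model picks up.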
This answer is based on the article SongGen: A Single-Stage Autoregressive Transformer for Automatic Song Generation.