Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper: arXiv 2404.09956
We filter with an ensemble of two different CLAP models: 630k-audioset-best and 630k-best.
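The source does not say how the two models' scores are combined, so the following is only a minimal sketch of one plausible ensemble rule: a sample is kept when the text-audio cosine similarity from every model clears a threshold. The function names, the `threshold` value, and the toy embeddings are all assumptions made for illustration; loading the actual CLAP checkpoints is not shown.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keep_sample(scores_by_model: Dict[str, float], threshold: float) -> bool:
    """Assumed ensemble rule: keep the sample only if every model's score passes.

    The threshold and the all-must-pass rule are hypothetical, not taken
    from the source.
    """
    return all(score >= threshold for score in scores_by_model.values())


# Toy embeddings standing in for the outputs of the two CLAP checkpoints.
# Real embeddings would come from each model's text and audio encoders.
toy_embeddings = {
    "630k-audioset-best": {"text": [0.9, 0.1, 0.0], "audio": [0.8, 0.2, 0.1]},
    "630k-best": {"text": [0.7, 0.3, 0.0], "audio": [0.1, 0.2, 0.9]},
}

scores = {
    name: cosine_similarity(emb["text"], emb["audio"])
    for name, emb in toy_embeddings.items()
}

# The second model scores this pair poorly, so the assumed rule drops it.
decision = keep_sample(scores, threshold=0.45)
```

Requiring agreement from both models is stricter than averaging their scores; an averaging or voting rule would be an equally reasonable reading of "ensemble" and only needs a change to `keep_sample`.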