Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption
Paper: arXiv:2503.09279
A competitive, human-aligned model for detailed video captioning, based on VILA-v1.5-8B.
This model produces a detailed caption for an input video, as presented in the paper Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption.
For more details, please refer to our project page: https://sais-fuxi.github.io/projects/cockatiel