Fr0zencr4nE/Cockatiel-8B

Video-Text-to-Text

license: cc-by-4.0

A competitive, human-aligned model for detailed video captioning, based on VILA-v1.5-8B.

This model produces detailed captions for an input video, as presented in the paper Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption.

For more details, please refer to our project page: https://sais-fuxi.github.io/projects/cockatiel

Code: https://github.com/Fr0zenCrane/Cockatiel
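The repository linked above holds the code; the weights themselves live in this model repository. A minimal sketch of fetching them, assuming the `huggingface_hub` package is installed. The `download_checkpoint` helper and the default `local_dir` are illustrative names, not part of the Cockatiel codebase, and running inference afterwards is left to the instructions in the code repository.

```python
from pathlib import Path

# Repository id as shown on this model card.
REPO_ID = "Fr0zencr4nE/Cockatiel-8B"


def download_checkpoint(local_dir: str = "checkpoints/Cockatiel-8B") -> Path:
    """Download every file in the model repository and return the local path.

    The default local_dir is a hypothetical location chosen for this sketch.
    """
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)


if __name__ == "__main__":
    # Requires network access and enough disk space for an 8B-parameter model.
    print(download_checkpoint())
```

The returned directory can then be passed as the model path to whatever inference entry point the code repository provides.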

Paper: Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption (2503.09279, published Mar 12, 2025)