view article Article Welcome GPT OSS, the new open-source model family from OpenAI! +10 reach-vb, pcuenq, lewtun, clem, Rocketknight1, clefourrier, celinah, Wauplin, marcsun13, pagezyhf, ahadnagy, joaogante • Aug 5, 2025 • 513
view article Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 773
view article Article TimeScope: How Long Can Your Video Large Multimodal Model Go? +2 orrzohar, ruili0, andito, nicholswang • Jul 23, 2025 • 48
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published Jun 20, 2025 • 29
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance Paper • 2502.06145 • Published Feb 10, 2025 • 18
view article Article nanoVLM: The simplest repository to train your VLM in pure PyTorch +5 ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025 • 257
view article Article Vision Language Models (Better, faster, stronger) +3 merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 611
view article Article Cohere on Hugging Face Inference Providers 🔥 +5 reach-vb, burtenshaw, merve, celinah, alexrs, julien-c, sbrandeis • Apr 16, 2025 • 129
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 207
view article Article SmolVLM2: Bringing Video Understanding to Every Device +5 orrzohar, mfarre, andito, merve, pcuenq, cyrilzakka, Xenova • Feb 20, 2025 • 337
view article Article SmolVLM Grows Smaller – Introducing the 256M & 500M Models! +1 andito, mfarre, merve • Jan 23, 2025 • 192
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 147
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published Oct 22, 2024 • 27
view article Article FineVideo: behind the scenes +4 mfarre, andito, lewtun, lvwerra, pcuenq, thomwolf • Sep 23, 2024 • 35
view article Article Docmatix - a huge dataset for Document Visual Question Answering andito, HugoLaurencon • Jul 18, 2024 • 78
view article Article Scaling robotics datasets with video encoding +1 aliberts, cadene, mfarre • Aug 27, 2024 • 41
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 133