Miquel Farré

mfarre

AI & ML interests

I like everything video

Recent Activity

updated a dataset 12 days ago

HuggingFaceFV/finevideo

liked a model 27 days ago

tencent/HY-World-2.0

updated a dataset 2 months ago

vlmbook/notebooks

Organizations

upvoted an article 9 months ago

Article

Welcome GPT OSS, the new open-source model family from OpenAI!

reach-vb, pcuenq, lewtun, clem, Rocketknight1, clefourrier, celinah, Wauplin, marcsun13, pagezyhf, ahadnagy, joaogante

Aug 5, 2025 • 513

upvoted 2 articles 10 months ago

Article

SmolLM3: smol, multilingual, long-context reasoner

eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf

Jul 8, 2025 • 773

Article

TimeScope: How Long Can Your Video Large Multimodal Model Go?

orrzohar, ruili0, andito, nicholswang

Jul 23, 2025 • 48

upvoted a paper 10 months ago

Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens

Paper • 2506.17218 • Published Jun 20, 2025 • 29

upvoted a paper 11 months ago

Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance

Paper • 2502.06145 • Published Feb 10, 2025 • 18

upvoted 2 articles 12 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb

May 21, 2025 • 257

Article

Vision Language Models (Better, faster, stronger)

merve, sergiopaniego, ariG23498, pcuenq, andito

May 12, 2025 • 611

upvoted an article about 1 year ago

Article

Cohere on Hugging Face Inference Providers 🔥

reach-vb, burtenshaw, merve, celinah, alexrs, julien-c, sbrandeis

Apr 16, 2025 • 129

upvoted a paper about 1 year ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 207

upvoted an article about 1 year ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

orrzohar, mfarre, andito, merve, pcuenq, cyrilzakka, Xenova

Feb 20, 2025 • 337

upvoted a collection about 1 year ago

SmolVLM2 📺 Smallest video LM ever 🤏🏻

Collection • 11 items • Updated May 5, 2025 • 112

upvoted 2 articles over 1 year ago

Article

SmolVLM Grows Smaller – Introducing the 256M & 500M Models!

andito, mfarre, merve

Jan 23, 2025 • 192

Article

Announcing NVIDIA Cosmos World Foundation Models

mingyuliutw

Jan 7, 2025 • 28

upvoted 2 papers over 1 year ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Paper • 2410.17434 • Published Oct 22, 2024 • 27

upvoted 3 articles over 1 year ago

Article

FineVideo: behind the scenes

mfarre, andito, lewtun, lvwerra, pcuenq, thomwolf

Sep 23, 2024 • 35

Article

Docmatix - a huge dataset for Document Visual Question Answering

andito, HugoLaurencon

Jul 18, 2024 • 78

Article

Scaling robotics datasets with video encoding

aliberts, cadene, mfarre

Aug 27, 2024 • 41

upvoted a paper over 1 year ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 133
