Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 4 days ago • 46
view article Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data +7 Jun 3 • 299
view article Article Metric and Relative Monocular Depth Estimation: An Overview. Fine-Tuning Depth Anything V2 👐 📚 Jul 10, 2024 • 90
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents Paper • 2507.04009 • Published Jul 5 • 51
StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation Paper • 2508.08248 • Published Aug 11 • 27