Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation Paper • 2509.18639 • Published Sep 23, 2025
Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks Paper • 2510.25760 • Published Oct 29, 2025 • 17
Show, Don't Tell: Morphing Latent Reasoning into Image Generation Paper • 2602.02227 • Published Feb 2 • 10
DVD: Deterministic Video Depth Estimation with Generative Priors Paper • 2603.12250 • Published 17 days ago • 26
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 349
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery Paper • 2601.19325 • Published Jan 27 • 81
A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning Paper • 2512.14442 • Published Dec 16, 2025 • 11
DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation Paper • 2511.23127 • Published Nov 28, 2025 • 44
Accelerating Streaming Video Large Language Models via Hierarchical Token Compression Paper • 2512.00891 • Published Nov 30, 2025 • 16