WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published 17 days ago • 67
SpatialTree: How Spatial Abilities Branch Out in MLLMs Paper • 2512.20617 • Published 10 days ago • 42
StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation Paper • 2512.09363 • Published 24 days ago • 71
PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling Paper • 2505.23155 • Published May 29, 2025 • 2
Cosmos Predict 2.5 & Transfer 2.5: Evolving the World Foundation Models for Physical AI Article • Oct 28, 2025 • 20
UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models Paper • 2509.21760 • Published Sep 26, 2025 • 14
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning Paper • 2509.09674 • Published Sep 11, 2025 • 80
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs Paper • 2509.09174 • Published Sep 11, 2025 • 61
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning Paper • 2509.08519 • Published Sep 10, 2025 • 128
Whole-Body Conditioned Egocentric Video Prediction Paper • 2506.21552 • Published Jun 26, 2025 • 11
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity Paper • 2505.21411 • Published May 27, 2025 • 17
WorldVLA: Towards Autoregressive Action World Model Paper • 2506.21539 • Published Jun 26, 2025 • 40
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published Jun 11, 2025 • 101