Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation Paper • 2509.18639 • Published Sep 23, 2025
Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks Paper • 2510.25760 • Published Oct 29, 2025 • 17
Show, Don't Tell: Morphing Latent Reasoning into Image Generation Paper • 2602.02227 • Published Feb 2, 2026 • 10
DVD: Deterministic Video Depth Estimation with Generative Priors Paper • 2603.12250 • Published 14 days ago • 26
A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning Paper • 2512.14442 • Published Dec 16, 2025 • 11
Accelerating Streaming Video Large Language Models via Hierarchical Token Compression Paper • 2512.00891 • Published Nov 30, 2025 • 16
PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era Paper • 2509.12989 • Published Sep 16, 2025 • 28
OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation Paper • 2503.07098 • Published Mar 10, 2025
Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods Paper • 2510.07143 • Published Oct 8, 2025 • 13
AI for Service: Proactive Assistance with AI Glasses Paper • 2510.14359 • Published Oct 16, 2025 • 78
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness Paper • 2503.18445 • Published Mar 24, 2025 • 1
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published May 25, 2025 • 146