WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors Paper • 2605.10434 • Published 12 days ago • 30
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction Paper • 2605.05242 • Published 20 days ago • 113
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL Paper • 2604.28123 • Published 22 days ago • 48
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 26 days ago • 71
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling Paper • 2604.28185 • Published 23 days ago • 90
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time Paper • 2604.11626 • Published Apr 13 • 102
VecGlypher: Unified Vector Glyph Generation with Language Models Paper • 2602.21461 • Published Feb 25 • 12
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks Paper • 2603.27862 • Published Mar 29 • 32
Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models Paper • 2512.19686 • Published Dec 22, 2025 • 1
VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction Paper • 2602.13294 • Published Feb 9 • 13
Context Forcing: Consistent Autoregressive Video Generation with Long Context Paper • 2602.06028 • Published Feb 5 • 36
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory Paper • 2512.07802 • Published Dec 8, 2025 • 46
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming Paper • 2512.21338 • Published Dec 24, 2025 • 23
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published Jan 14 • 127
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published Dec 1, 2025 • 77
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25, 2025 • 188