InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published 17 days ago • 95
Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation Paper • 2512.17040 • Published 18 days ago • 27
Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure Paper • 2512.14336 • Published 20 days ago • 28
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published 27 days ago • 116
Multi-view Pyramid Transformer: Look Coarser to See Broader Paper • 2512.07806 • Published 28 days ago • 20
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published Oct 30, 2025 • 82
PHUMA: Physically-Grounded Humanoid Locomotion Dataset Paper • 2510.26236 • Published Oct 30, 2025 • 29
Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published Oct 30, 2025 • 108
ACG: Action Coherence Guidance for Flow-based VLA models Paper • 2510.22201 • Published Oct 25, 2025 • 36
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17, 2025 • 89
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning Paper • 2510.15444 • Published Oct 17, 2025 • 147
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs Paper • 2510.04767 • Published Oct 6, 2025 • 27
Hybrid Architectures for Language Models: Systematic Analysis and Design Insights Paper • 2510.04800 • Published Oct 6, 2025 • 36
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding Paper • 2510.06308 • Published Oct 7, 2025 • 54