Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models Paper • 2512.10362 • Published 27 days ago • 1
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published Oct 7, 2025 • 141
Exploring Fine-Tuning of Large Audio Language Models for Spoken Language Understanding under Limited Speech data Paper • 2509.15389 • Published Sep 18, 2025 • 3
Def-DTS: Deductive Reasoning for Open-domain Dialogue Topic Segmentation Paper • 2505.21033 • Published May 27, 2025 • 3
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language Paper • 2503.23730 • Published Mar 31, 2025 • 3
HerO at AVeriTeC: The Herd of Open Large Language Models for Verifying Real-World Claims Paper • 2410.12377 • Published Oct 16, 2024
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction Paper • 2410.01273 • Published Oct 2, 2024 • 12
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance Paper • 2409.01201 • Published Sep 2, 2024 • 1
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning Paper • 2401.17690 • Published Jan 31, 2024 • 5