daily paper
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper
• 2501.04519
• Published
• 288
Transformer^2: Self-adaptive LLMs
Paper
• 2501.06252
• Published
• 55
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
Paper
• 2501.09012
• Published
• 10
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Paper
• 2501.09747
• Published
• 29
Evolving Deeper LLM Thinking
Paper
• 2501.09891
• Published
• 115
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Paper
• 2501.12895
• Published
• 61
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models
Paper
• 2501.13629
• Published
• 48
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step
Paper
• 2501.13926
• Published
• 43
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
Paper
• 2502.04306
• Published
• 20
ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution
Paper
• 2502.00989
• Published
• 8
PILAF: Optimal Human Preference Sampling for Reward Modeling
Paper
• 2502.04270
• Published
• 12
Paper
• 2502.06786
• Published
• 32
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation
Paper
• 2502.05415
• Published
• 20
Region-Adaptive Sampling for Diffusion Transformers
Paper
• 2502.10389
• Published
• 53
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Paper
• 2502.10391
• Published
• 34
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
Paper
• 2502.09411
• Published
• 22
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
Paper
• 2502.13943
• Published
• 8
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper
• 2502.14499
• Published
• 194
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Paper
• 2502.14786
• Published
• 158
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Paper
• 2502.14502
• Published
• 91
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Paper
• 2502.14739
• Published
• 108
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Paper
• 2502.14768
• Published
• 47
Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning
Paper
• 2502.14372
• Published
• 36
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Paper
• 2502.12853
• Published
• 29
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Paper
• 2502.16033
• Published
• 18
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper
• 2502.18449
• Published
• 75
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper
• 2508.03680
• Published
• 137
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper
• 2510.14528
• Published
• 118