daily paper - a wyf666 Collection

wyf666 's Collections

daily paper

updated Oct 30, 2025

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8, 2025 • 288
Transformer^2: Self-adaptive LLMs

Paper • 2501.06252 • Published Jan 9, 2025 • 55
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

Paper • 2501.09012 • Published Jan 15, 2025 • 10
FAST: Efficient Action Tokenization for Vision-Language-Action Models

Paper • 2501.09747 • Published Jan 16, 2025 • 29
Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17, 2025 • 115
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Paper • 2501.12895 • Published Jan 22, 2025 • 61
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models

Paper • 2501.13629 • Published Jan 23, 2025 • 48
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

Paper • 2501.13926 • Published Jan 23, 2025 • 43
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization

Paper • 2502.04306 • Published Feb 6, 2025 • 20
ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution

Paper • 2502.00989 • Published Feb 3, 2025 • 8
PILAF: Optimal Human Preference Sampling for Reward Modeling

Paper • 2502.04270 • Published Feb 6, 2025 • 12
Matryoshka Quantization

Paper • 2502.06786 • Published Feb 10, 2025 • 32
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation

Paper • 2502.05415 • Published Feb 8, 2025 • 20
Region-Adaptive Sampling for Diffusion Transformers

Paper • 2502.10389 • Published Feb 14, 2025 • 53
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

Paper • 2502.10391 • Published Feb 14, 2025 • 34
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation

Paper • 2502.09411 • Published Feb 13, 2025 • 22
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

Paper • 2502.13943 • Published Feb 19, 2025 • 8
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20, 2025 • 194
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20, 2025 • 158
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Paper • 2502.14502 • Published Feb 20, 2025 • 91
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20, 2025 • 108
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Paper • 2502.14768 • Published Feb 20, 2025 • 47
Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning

Paper • 2502.14372 • Published Feb 20, 2025 • 36
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning

Paper • 2502.12853 • Published Feb 18, 2025 • 29
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models

Paper • 2502.16033 • Published Feb 22, 2025 • 18
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25, 2025 • 75
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5, 2025 • 137
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16, 2025 • 118