neutrino12's Collections
Snowflake/Arctic-Text2SQL-R1-7B • 8B • Updated • 3.52k • 61
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning • Paper 2505.24726 • Published • 277
Reinforcement Pre-Training • Paper 2506.08007 • Published • 263
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights • Paper 2506.16406 • Published • 131
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models • Paper 2506.06395 • Published • 133
Paper 2505.09388 • Published • 334
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning • Paper 2505.17667 • Published • 88
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning • Paper 2507.16784 • Published • 122
Group Sequence Policy Optimization • Paper 2507.18071 • Published • 317
∇NABLA: Neighborhood Adaptive Block-Level Attention • Paper 2507.13546 • Published • 125
Paper 2507.15493 • Published • 47
MUR: Momentum Uncertainty guided Reasoning for Large Language Models • Paper 2507.14958 • Published • 47
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization • Paper 2507.15758 • Published • 35
Complex Logical Instruction Generation • Paper 2508.09125 • Published • 40
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens • Paper 2508.05305 • Published • 47
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts • Paper 2508.09848 • Published • 71
Train Long, Think Short: Curriculum Learning for Efficient Reasoning • Paper 2508.08940 • Published • 27
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment • Paper 2508.07750 • Published • 21
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal • Paper 2508.05988 • Published • 21
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning • Paper 2508.07101 • Published • 14
Compressing Chain-of-Thought in LLMs via Step Entropy • Paper 2508.03346 • Published • 8
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning • Paper 2508.09726 • Published • 15
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens • Paper 2508.01191 • Published • 238
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification • Paper 2508.05629 • Published • 183
R-Zero: Self-Evolving Reasoning LLM from Zero Data • Paper 2508.05004 • Published • 130
Story2Board: A Training-Free Approach for Expressive Storyboard Generation • Paper 2508.09983 • Published • 70
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models • Paper 2508.02120 • Published • 20
Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following • Paper 2508.02150 • Published • 37
Trainable Dynamic Mask Sparse Attention • Paper 2508.02124 • Published • 19
Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability • Paper 2508.04017 • Published • 11
Deep Think with Confidence • Paper 2508.15260 • Published • 90
PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing • Paper 2508.11116 • Published • 22
Efficient Code Embeddings from Code Generation Models • Paper 2508.21290 • Published • 19
Model-Task Alignment Drives Distinct RL Outcomes • Paper 2508.21188 • Published • 8
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning • Paper 2508.20751 • Published • 89
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning • Paper 2508.18756 • Published • 36
Hermes 4 Technical Report • Paper 2508.18255 • Published • 44
StepWiser: Stepwise Generative Judges for Wiser Reasoning • Paper 2508.19229 • Published • 20
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models • Paper 2508.18773 • Published • 16
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges • Paper 2508.18076 • Published • 6
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles • Paper 2508.16072 • Published • 4
Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills • Paper 2508.19500 • Published • 2
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning • Paper 2508.15868 • Published • 3
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts • Paper 2508.10390 • Published • 1
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model • Paper 2509.00676 • Published • 85
DCPO: Dynamic Clipping Policy Optimization • Paper 2509.02333 • Published • 22
When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance • Paper 2509.22193 • Published • 38
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT • Paper 2509.19284 • Published • 23