neutrino12's Collections
Snowflake/Arctic-Text2SQL-R1-7B • 8B • Updated • 3.52k • 61
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning • Paper 2505.24726 • Published • 277
Reinforcement Pre-Training • Paper 2506.08007 • Published • 263
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights • Paper 2506.16406 • Published • 131
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models • Paper 2506.06395 • Published • 133
Paper 2505.09388 • Published • 334
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning • Paper 2505.17667 • Published • 88
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning • Paper 2507.16784 • Published • 122
Group Sequence Policy Optimization • Paper 2507.18071 • Published • 317
∇NABLA: Neighborhood Adaptive Block-Level Attention • Paper 2507.13546 • Published • 125
Paper 2507.15493 • Published • 47
MUR: Momentum Uncertainty guided Reasoning for Large Language Models • Paper 2507.14958 • Published • 47
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization • Paper 2507.15758 • Published • 35
Complex Logical Instruction Generation • Paper 2508.09125 • Published • 40
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens • Paper 2508.05305 • Published • 47
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts • Paper 2508.09848 • Published • 71
Train Long, Think Short: Curriculum Learning for Efficient Reasoning • Paper 2508.08940 • Published • 27
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment • Paper 2508.07750 • Published • 21
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal • Paper 2508.05988 • Published • 21
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning • Paper 2508.07101 • Published • 14
Compressing Chain-of-Thought in LLMs via Step Entropy • Paper 2508.03346 • Published • 8
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning • Paper 2508.09726 • Published • 15
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens • Paper 2508.01191 • Published • 238
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification • Paper 2508.05629 • Published • 183
R-Zero: Self-Evolving Reasoning LLM from Zero Data • Paper 2508.05004 • Published • 130
Story2Board: A Training-Free Approach for Expressive Storyboard Generation • Paper 2508.09983 • Published • 70
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models • Paper 2508.02120 • Published • 20
Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following • Paper 2508.02150 • Published • 37
Trainable Dynamic Mask Sparse Attention • Paper 2508.02124 • Published • 19
Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability • Paper 2508.04017 • Published • 11
Deep Think with Confidence • Paper 2508.15260 • Published • 90
PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing • Paper 2508.11116 • Published • 22
Efficient Code Embeddings from Code Generation Models • Paper 2508.21290 • Published • 19
Model-Task Alignment Drives Distinct RL Outcomes • Paper 2508.21188 • Published • 8
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning • Paper 2508.20751 • Published • 89
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning • Paper 2508.18756 • Published • 36
Hermes 4 Technical Report • Paper 2508.18255 • Published • 44
StepWiser: Stepwise Generative Judges for Wiser Reasoning • Paper 2508.19229 • Published • 20
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models • Paper 2508.18773 • Published • 16
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges • Paper 2508.18076 • Published • 6
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles • Paper 2508.16072 • Published • 4
Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills • Paper 2508.19500 • Published • 2
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning • Paper 2508.15868 • Published • 3
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts • Paper 2508.10390 • Published • 1
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model • Paper 2509.00676 • Published • 85
DCPO: Dynamic Clipping Policy Optimization • Paper 2509.02333 • Published • 22
When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance • Paper 2509.22193 • Published • 38
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT • Paper 2509.19284 • Published • 23