Residual Stream Duality in Modern Transformer Architectures Paper • 2603.16039 • Published Mar 17 • 4
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System Paper • 2602.02488 • Published Feb 2 • 36
Language Server CLI Empowers Language Agents with Process Rewards Paper • 2510.22907 • Published Oct 27, 2025 • 5
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published Jul 8, 2025 • 45
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning Paper • 2505.17508 • Published May 23, 2025 • 8