pangpangxuan

pangxuan

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 days ago

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

upvoted a paper 2 days ago

The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies

upvoted a paper 6 days ago

WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

View all activity

Organizations

None yet

upvoted 2 papers 2 days ago

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

Paper • 2602.12125 • Published 3 days ago • 56

The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies

Paper • 2602.09877 • Published 5 days ago • 183

upvoted a paper 6 days ago

WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

Paper • 2602.04634 • Published 11 days ago • 92

upvoted a paper 9 days ago

CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

Paper • 2601.22027 • Published 17 days ago • 79

upvoted 3 papers 11 days ago

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Paper • 2601.21821 • Published 17 days ago • 59

TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Paper • 2601.22628 • Published 16 days ago • 34

Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published 13 days ago • 231

upvoted 13 papers about 1 month ago

X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests

Paper • 2601.06953 • Published Jan 11 • 44

RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

Paper • 2601.05249 • Published Jan 8 • 46

Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization

Paper • 2601.05432 • Published Jan 8 • 166

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning

Paper • 2601.06002 • Published Jan 9 • 53

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Paper • 2601.06021 • Published Jan 9 • 46

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 225

AT^2PO: Agentic Turn-based Policy Optimization via Tree Search

Paper • 2601.04767 • Published Jan 8 • 28

Benchmark^2: Systematic Evaluation of LLM Benchmarks

Paper • 2601.03986 • Published Jan 7 • 34

Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Paper • 2512.24615 • Published Dec 31, 2025 • 119

Evaluating Parameter Efficient Methods for RLVR

Paper • 2512.23165 • Published Dec 29, 2025 • 27

Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

Paper • 2512.24618 • Published Dec 31, 2025 • 151

Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

Paper • 2512.23959 • Published Dec 30, 2025 • 112

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Paper • 2512.24617 • Published Dec 31, 2025 • 64

pangpangxuan

AI & ML interests

Recent Activity

Organizations

pangxuan's activity