PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 2 days ago • 51
Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning Paper • 2605.06326 • Published May 7 • 26
Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents Paper • 2605.10832 • Published May 11 • 22
Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents Paper • 2605.10832 • Published May 11 • 22
Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents Paper • 2605.10832 • Published May 11 • 22
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows Paper • 2604.28139 • Published Apr 30 • 42
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows Paper • 2604.28139 • Published Apr 30 • 42
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Paper • 2604.18292 • Published Apr 20 • 87
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Paper • 2604.18292 • Published Apr 20 • 87