arxiv:2510.08697
Leandro von Werra
AI & ML interests
NLP and RL
Recent Activity
rl-llm-wiki/knowledge-base: source: arxiv:1506.02438 - Generalized Advantage Estimation (GAE)
rl-llm-wiki/rl-dashboard: Update static/index.html
rl-llm-wiki/knowledge-base: source: arxiv:1502.05477 - Trust Region Policy Optimization (TRPO)