dfuhoiysOHSVFh82934gfjklb

huba-buba

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 6 hours ago

F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

Paper • 2602.06717 • Published 3 days ago • 28

upvoted 3 papers 1 day ago

WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

Paper • 2602.04634 • Published 5 days ago • 88

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

Paper • 2602.05885 • Published 4 days ago • 25

Reinforcement World Model Learning for LLM-based Agents

Paper • 2602.05842 • Published 4 days ago • 19

upvoted a paper 4 days ago

No One-Size-Fits-All: Building Systems For Translation to Bashkir, Kazakh, Kyrgyz, Tatar and Chuvash Using Synthetic And Original Data

Paper • 2602.04442 • Published 5 days ago • 3

upvoted an article 6 days ago

🐯 Liger GRPO meets TRL

Article • May 25, 2025

upvoted 3 papers 6 days ago

upvoted a paper 9 days ago

VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

Paper • 2601.16973 • Published 17 days ago • 40

upvoted an article 10 days ago

Small Language Models (SLM): A Comprehensive Overview

Article • Feb 22, 2025 • 129

upvoted an article 13 days ago

Mixture of Experts Explained

Article • Dec 11, 2023 • 1.06k

upvoted an article 24 days ago

The Engineering Handbook for GRPO + LoRA with Verl: Training Qwen2.5 on Multi-GPU

Article • Jan 2

upvoted an article 25 days ago

From GRPO to DAPO and GSPO: What, Why, and How

Article • Aug 9, 2025

upvoted a paper about 1 month ago

VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Paper • 2601.02256 • Published Jan 5 • 33

upvoted a paper about 2 months ago

Universal Reasoning Model

Paper • 2512.14693 • Published Dec 16, 2025 • 43

upvoted a collection about 2 months ago

Awesome SFT datasets

Collection

A curated list of interesting datasets to fine-tune language models with. • 43 items • Updated Apr 12, 2024 • 148

upvoted 2 papers 2 months ago

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 296

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Paper • 2511.22570 • Published Nov 27, 2025 • 90

upvoted a paper 3 months ago

GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms

Paper • 2511.17592 • Published Nov 17, 2025 • 119
