Stefano Fiorucci's picture

In a Training Loop 🔄

Stefano Fiorucci PRO

anakin87

·

AI & ML interests

Language Models: orchestration, post-training, GRPO, synthetic data... Contributing to Haystack LLM framework 🏗️

Recent Activity

liked a dataset about 22 hours ago

VAGOsolutions/SauerkrautLM-Doom-MultiVec-31k

upvoted an article 6 days ago

ML Intern Takes Our Post-Training Internship Test

reacted to theirpost with ❤️ 6 days ago

A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe I took https://huggingface.co/LiquidAI/LFM2-2.6B and trained it through play. 🧑‍🍳 Here's how: 1️⃣ Build a solid RL env with Verifiers (Prime Intellect) 2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env 3️⃣ SFT warm-up to teach format 4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves 5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies Done! Beats GPT-5-mini 🏆 --- 🎮 Play against the model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe 🤗 Model: https://huggingface.co/anakin87/LFM2-2.6B-mr-tictactoe 📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course 🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

View all activity

Organizations

anakin87 's collections 5