llamacle_drgrpo_euan_v1_step30

Dr. GRPO RL post-training (30 cycles) on top of euan-loracles/llama70b-loracle-25k. The RL pool is 2,500 held-out FineWeb-edu LoRAs. Each cycle uses 32 prompts × K=16 rollouts per prompt (sub-batched as 4 × K=4), with lr=7e-6 and eps=0.2/0.28. Training ran as NF4-DDP across 4×B200.
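
As a rough illustration of the setup described above, the sketch below shows the two pieces that usually distinguish Dr. GRPO from standard GRPO: group-centred advantages without dividing by the group standard deviation, and a clipped surrogate objective. It is not the training code for this checkpoint. The function names are hypothetical, and reading `eps=0.2/0.28` as a lower/upper clip pair is an assumption, not something the card states.

```python
import math

# Values taken from the card; the names are hypothetical.
PROMPTS_PER_CYCLE = 32
ROLLOUTS_PER_PROMPT = 16   # K=16
SUB_BATCHES = 4            # each sub-batch holds K=4 rollouts
ROLLOUTS_PER_CYCLE = PROMPTS_PER_CYCLE * ROLLOUTS_PER_PROMPT


def dr_grpo_advantages(rewards):
    """Centre rewards on the group mean, with no std normalisation."""
    group_mean = sum(rewards) / len(rewards)
    return [r - group_mean for r in rewards]


def clipped_objective(logp_new, logp_old, advantage,
                      eps_low=0.2, eps_high=0.28):
    """PPO-style clipped surrogate for a single token.

    ASSUMPTION: eps=0.2/0.28 on the card is read here as an
    asymmetric (lower, upper) clip range.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # One group of K=4 rollouts with binary rewards.
    print(dr_grpo_advantages([1.0, 0.0, 0.0, 1.0]))
    print(ROLLOUTS_PER_CYCLE)
```

Under these assumptions, a cycle scores 32 × 16 rollouts in total, and each prompt's 16 rollouts would share one group mean even though they are generated in four sub-batches of four.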

AB Llama-70B (3 seeds, mean ± std)

| step | any-match | rollout-mean |
|---|---|---|
| 0 (euan baseline) | 66.1% ± 2.7% | 42.4% ± 1.0% |
| 10 | 70.1% ± 1.0% | 47.6% ± 3.0% |
| 20 | 70.1% ± 4.3% | 48.3% ± 2.5% |
| 30 (this ckpt) | 72.4% ± 0.0% | 50.9% ± 0.7% |
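
The card does not define the two metrics, so the sketch below shows one plausible reading: any-match as the fraction of prompts where at least one rollout is judged a match, and rollout-mean as the per-rollout match rate averaged over prompts. Both definitions, the function names, and the use of the sample standard deviation across seeds are assumptions made for illustration.

```python
from statistics import mean, stdev


def any_match(matches):
    """Fraction of prompts with at least one matching rollout.

    `matches` is a list of per-prompt lists of booleans,
    one boolean per rollout (ASSUMED metric definition).
    """
    return mean(1.0 if any(row) else 0.0 for row in matches)


def rollout_mean(matches):
    """Per-prompt match rate, averaged over prompts (ASSUMED definition)."""
    return mean(mean(1.0 if m else 0.0 for m in row) for row in matches)


def mean_and_std(per_seed_values):
    """Aggregate one metric across seeds as (mean, sample std)."""
    return mean(per_seed_values), stdev(per_seed_values)


if __name__ == "__main__":
    # Toy data: 2 prompts x 2 rollouts, not results from this model.
    toy = [[True, False], [False, False]]
    print(any_match(toy), rollout_mean(toy))
```

With this reading, any-match is always at least as large as rollout-mean, which is consistent with the ordering of the two columns in the table.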

Model tree for ceselder/llamacle_drgrpo_euan_v1_step30: meta-llama/Llama-3.1-70B (base model) → meta-llama/Llama-3.3-70B-Instruct (finetuned) → this model (finetuned)