| language: en | |
| license: mit | |
| # M-1123_newmodels__olmo7b_sft_r1_ct3arg-rl | |
| ## Model Details | |
| - **Training Method**: VeRL Reinforcement Learning (RL) | |
| - **Stage Name**: rl | |
| - **Experiment**: 1123_newmodels__olmo7b_sft_r1_ct3arg | |
| - **RL Framework**: VeRL (Versatile Reinforcement Learning) | |
| ## Training Configuration | |
| ## Experiment Tracking | |
| ๐ **View complete experiment details**: https://huggingface.co/datasets/SkillFactory/D-ExpTracker__1123_newmodels__olmo7b_sft_r1_ct3arg__v1 | |
| ## Usage | |
| ```python | |
| from transformers import AutoTokenizer, AutoModelForCausalLM | |
| tokenizer = AutoTokenizer.from_pretrained("SkillFactory/M-1123_newmodels__olmo7b_sft_r1_ct3arg-rl") | |
| model = AutoModelForCausalLM.from_pretrained("SkillFactory/M-1123_newmodels__olmo7b_sft_r1_ct3arg-rl") | |
| ``` |