In a Training Loop 🔄

Rui Yang PRO

Ray2333

https://yangrui2015.github.io

YangRui2015

AI & ML interests

Deep Reinforcement Learning

Recent Activity

liked a dataset about 6 hours ago

ReCAP-Agent/ReCAP-187k-SFT

liked a model about 6 hours ago

ReCAP-Agent/ReCAP-32B

liked a model about 6 hours ago

ReCAP-Agent/ReCAP-8B

Collections 2

Papers 8

arxiv:2602.22190

arxiv:2510.27623

arxiv:2510.12693

arxiv:2506.03143

Models 16

Ray2333/GUI-Libra-3B

4B • Updated 22 days ago • 26

Ray2333/GRM-Llama3.2-3B-rewardmodel-ft

Text Classification • 3B • Updated Apr 30, 2025 • 1.39k • 13

Ray2333/Gemma-2B-rewardmodel-baseline

Text Classification • 3B • Updated Feb 5, 2025 • 561 • 2

Ray2333/GRM-llama3-8B-distill

Text Classification • 8B • Updated Feb 5, 2025 • 391 • 6

Ray2333/reward-model-Mistral-7B-instruct-Unified-Feedback

Text Classification • 7B • Updated Feb 5, 2025 • 124 • 11

Ray2333/GRM-Gemma-2B-rewardmodel-ft

3B • Updated Feb 5, 2025 • 37 • 1

Ray2333/Gemma-2B-rewardmodel-ft

3B • Updated Feb 5, 2025 • 21 • 1

Ray2333/GRM-llama3.2-3B-sftreg

Text Classification • 3B • Updated Feb 5, 2025 • 1 • 1

Ray2333/GRM-Gemma-2B-sftreg

Text Classification • 3B • Updated Feb 5, 2025 • 1.03k • 3

Ray2333/GRM-llama3-8B-sftreg

Text Classification • 8B • Updated Feb 5, 2025 • 17 • 5

Datasets 4

Ray2333/Offline_Evaluation

Viewer • Updated about 1 month ago • 35.2k • 12

Ray2333/Libra-81K-SFT

Updated Feb 20 • 46

Ray2333/Libra-81K

Viewer • Updated Feb 20 • 738 • 13

Ray2333/RiC_harmless_helpful

Viewer • Updated Jul 12, 2024 • 291k • 100