R-PRM: Reasoning-Driven Process Reward Modeling
Shuaijie She
kevinpro
AI & ML interests
Reasoning, Chain of Thoughts, Alignment, Factual Consistency, Summarization
MAPO: Multilingual Reasoning with Preference Optimization
MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
Open Multilingual Reasoning Leaderboard
Running • Display and search a leaderboard of math models
MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
Paper • 2401.06838
kevinpro/MNumGLUESub
Updated • 3
kevinpro/MetaMathOctopus-MAPO-DPO-13B
Text Generation • 13B • Updated • 3
Models 15
kevinpro/R-PRM-7B-DPO
Text Generation • 8B • Updated • 228 • 3
kevinpro/Hydra-LLaMA3-8B-0531-preview-Q4_K_M-GGUF
Text Generation • 8B • Updated • 2
kevinpro/MistralMathOctopus-7B
Text Generation • 7B • Updated • 12
kevinpro/MetaMathOctopus-MAPO-DPO-13B
Text Generation • 13B • Updated • 3
kevinpro/MathOctopus-MAPO-DPO-7B
Text Generation • 7B • Updated • 5
kevinpro/MetaMathOctopus-13B
Text Generation • 13B • Updated • 4
kevinpro/MetaMathOctopus-MAPO-DPO-7B
Text Generation • 7B • Updated • 1
kevinpro/MetaMathOctopus-7B
Text Generation • 7B • Updated • 4
kevinpro/MathOctopus-MAPO-DPO-13B
Text Generation • 13B • Updated • 2
kevinpro/MistralMathOctopus-MAPO-DPO-7B
Text Generation • 7B • Updated • 2