ThinMQM (automated translation evaluation, MQM) model and data collection.
Runzhe Zhan
rzzhan
Recent Activity
upvoted a paper about 20 hours ago
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
upvoted a paper about 1 month ago
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows