SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 6 days ago • 25
nick11roberts/co-emerge-overtrained-rw-params37M_maxstep219586-flop_2_56e19_step_219586 56.5M • Updated 10 days ago • 119
nick11roberts/co-emerge-overtrained-rw-params84M_maxstep95981-flop_2_56e19_step_95981 0.1B • Updated 10 days ago • 119
nick11roberts/co-emerge-overtrained-rw-params149M_maxstep58415-flop_2_56e19_step_58415 0.2B • Updated 10 days ago • 123
nick11roberts/co-emerge-overtrained-rw-params9M_maxstep14128-flop_4_00e17_step_14128 17.9M • Updated 14 days ago • 136
nick11roberts/co-emerge-overtrained-rw-params7M_maxstep18165-flop_4_00e17_step_18165 14M • Updated 14 days ago • 136
nick11roberts/co-emerge-overtrained-rw-params22M_maxstep5779-flop_4_00e17_step_5779 37M • Updated 14 days ago • 135