LiveCodeBench evaluation

#100

by wasiuddina - opened Aug 10

Discussion

wasiuddina

Aug 10

Has the model been evaluated on LiveCodeBench? Can you report the settings and the official score (pass@1)? Thanks!

lsx666

16 days ago

Have you tested it? I deployed the 120b model locally but found that the score is really low (about 60 on v6). I also found that the reasoning: medium setting scores better than reasoning: high, which is weird. Can anyone explain? The temperature is 0.6, top-p is 1.0, top-k is 40, and max_model_len is 128k.
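For anyone comparing numbers, here is a minimal sketch of how a pass@1 score can be computed from several samples per problem, using the standard unbiased pass@k estimator. It assumes that is the metric being compared; the harness used above may aggregate differently. The `results` dictionary and its problem IDs are made up for illustration and are not real LiveCodeBench data.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem with n samples, c of them correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: dict[str, list[bool]], k: int = 1) -> float:
    """Average the per-problem pass@k estimates over all problems."""
    scores = [pass_at_k(len(r), sum(r), k) for r in results.values()]
    return sum(scores) / len(scores)


# Hypothetical per-problem outcomes (True = all tests passed for that sample).
results = {
    "p1": [True, True, False, True],
    "p2": [False, False, False, False],
    "p3": [True, True, True, True],
}

score = mean_pass_at_k(results, k=1)
print(f"pass@1 = {score:.4f}")
```

Running the same computation separately on the reasoning: medium and reasoning: high outputs, with the same number of samples per problem, makes the two scores directly comparable.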
