LiveCodeBench evaluation

#100
by wasiuddina - opened

Has the model been evaluated on LiveCodeBench? Could you report the settings and the official score (pass@1)? Thanks!

Have you tested it? I deployed the 120b model locally but found that the score is really low (about 60 on v6). I also found that the reasoning: medium setting scores better than reasoning: high, which is weird. Can anyone explain? The temperature is 0.6, top-p is 1.0, top-k is 40, and max_model_len is 128k.
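Since sampling at temperature 0.6 is not deterministic, a single run per problem can make a medium vs. high comparison noisy. Below is a minimal sketch of the commonly used unbiased pass@k estimator, averaged over problems. It is not taken from the LiveCodeBench harness, and the function names and the sample counts in the usage lines are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed all tests
    k: the k in pass@k (k <= n)
    """
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems; results holds (n, c) per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: two problems, 4 samples each, 2 and 4 passing.
example = [(4, 2), (4, 4)]
print(mean_pass_at_k(example, k=1))  # 0.75
```

For k = 1 this reduces to the fraction of passing samples per problem, so generating several samples per problem and averaging should give a steadier number than one sample each when comparing the two reasoning settings.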
