LiveCodeBench evaluation
#100, opened by wasiuddina
Has the model been evaluated on LiveCodeBench? Could you report the evaluation settings and the official score (pass@1)? Thanks!
Have you tested it? I deployed the 120b model locally and the score is really low (about 60 on v6). I also found that the reasoning: medium setting scores better than reasoning: high, which is weird. Can anyone explain? My settings: temperature 0.6, top-p 1.0, top-k 40, max_model_len 128k.
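For anyone comparing numbers: with sampling at temperature 0.6, a pass@1 taken from a single generation per problem can move noticeably between runs, so it may help to draw several samples per problem and average before comparing reasoning levels. Below is a minimal sketch of the standard unbiased pass@k estimator. It is not the official LiveCodeBench harness; the function names `pass_at_k` and `benchmark_pass_at_k` and the `(n, c)` input format are assumptions made for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed all tests
    k: the k in pass@k (k=1 reduces to c / n)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples are failures)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average per-problem pass@k over a benchmark.

    results: one (n, c) pair per problem (hypothetical input format).
    """
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `benchmark_pass_at_k([(10, 6), (10, 10), (10, 0)])` averages the per-problem pass@1 values over three problems with 10 samples each. Reporting the number of samples per problem alongside the score makes results from different setups easier to compare.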