GSM8K results replication
Hi!
I'm trying to replicate the GSM8K result using lighteval:
```shell
lighteval accelerate \
    --model_args "pretrained=HuggingFaceTB/SmolLM2-1.7B-Instruct,max_gen_toks=800,max_length=2000,dtype=bfloat16" \
    --tasks "lighteval|gsm8k|5|1" \
    --override_batch_size 1 \
    --output_dir="./evals/"
```
It doesn't seem to be working for me. These are the results:
```json
{
  "config_general": {
    "lighteval_sha": "?",
    "num_fewshot_seeds": 1,
    "override_batch_size": 1,
    "max_samples": null,
    "job_id": "",
    "start_time": 11280063.201784864,
    "end_time": 11281750.641923392,
    "total_evaluation_time_secondes": "1687.4401385281235",
    "model_name": "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    "model_sha": "7eb5a4069bde2ddf31c4303463d32e445d3e7d45",
    "model_dtype": "torch.bfloat16",
    "model_size": "3.19 GB"
  },
  "results": {
    "lighteval|gsm8k|5": {
      "maj@8": 0.001516300227445034,
      "maj@8_stderr": 0.0010717793485492638,
      "qem": 0.0,
      "qem_stderr": 0.0
    },
    "all": {
      "maj@8": 0.001516300227445034,
      "maj@8_stderr": 0.0010717793485492638,
      "qem": 0.0,
      "qem_stderr": 0.0
    }
  }
}
```
Any idea what I should be doing instead?
Hello @sam-paech, the default PyTorch batching in lighteval cuts the generations off at a single token for most batches if you set the model length to 2000, because some of the longer 5-shot prompts nearly fill that context on their own.
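The truncation described above comes down to simple arithmetic: with a fixed model length, the generation budget is whatever the prompt leaves over. A small illustration (the prompt lengths below are hypothetical, chosen only to show the effect):

```python
max_length = 2000    # model context cap passed via --model_args
max_gen_toks = 800   # requested generation budget

# Hypothetical 5-shot GSM8K prompt lengths in tokens (illustrative only)
prompt_lengths = [1200, 1850, 1999]

for prompt_len in prompt_lengths:
    # Tokens actually available for generation once the prompt is placed
    room = max(max_length - prompt_len, 0)
    allowed = min(max_gen_toks, room)
    print(f"prompt={prompt_len:4d} tokens -> at most {allowed} generated tokens")
```

A 1999-token prompt leaves room for a single generated token, which is why the model scores near zero: it never gets to write out its reasoning or final answer.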
We're using vLLM's dynamic batching to overcome that: https://github.com/huggingface/smollm/blob/main/evaluation/README.md
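The linked README has the canonical command; roughly, swapping the accelerate backend for lighteval's vllm backend looks like the sketch below. The exact flags and the `max_model_length` value are assumptions on my part, so check the README for the version-correct invocation:

```bash
# Assumed sketch, not the canonical command -- see the smollm evaluation README.
# vLLM batches dynamically by actual prompt length, so long 5-shot prompts
# no longer starve the generation budget.
lighteval vllm \
    "pretrained=HuggingFaceTB/SmolLM2-1.7B-Instruct,dtype=bfloat16,max_model_length=4096" \
    "lighteval|gsm8k|5|1" \
    --output_dir "./evals/"
```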
Great, thanks Anton, will try this.