clefourrier HF staff lewtun HF staff committed on
Commit
70ef8e1
1 Parent(s): 5392a7c

Update src/display/about.py (#612)

- Update src/display/about.py (82488c1ded5da4acc157fc0994d1b229db098cc4)


Co-authored-by: Lewis Tunstall <[email protected]>

Files changed (1)
  1. src/display/about.py +11 -4
src/display/about.py CHANGED
@@ -38,11 +38,18 @@ You can find:
  - community queries and running status in the `requests` Hugging Face dataset: https://huggingface.co/datasets/open-llm-leaderboard/requests
 
  ## Reproducibility
- To reproduce our results, here is the commands you can run, using [this version](https://github.com/EleutherAI/lm-evaluation-harness/tree/b281b0921b636bc36ad05c0b0b0763bd6dd43463) of the Eleuther AI Harness:
- `python main.py --model=hf-causal-experimental --model_args="pretrained=<your_model>,use_accelerate=True,revision=<your_model_revision>"`
- ` --tasks=<task_list> --num_fewshot=<n_few_shot> --batch_size=1 --output_path=<output_path>`
+ To reproduce our results, use [this version](https://github.com/EleutherAI/lm-evaluation-harness/tree/b281b0921b636bc36ad05c0b0b0763bd6dd43463) of the Eleuther AI Harness and run:
 
- The total batch size we get for models which fit on one A100 node is 8 (8 GPUs * 1). If you don't use parallelism, adapt your batch size to fit.
+ ```
+ python main.py --model=hf-causal-experimental \
+ --model_args="pretrained=<your_model>,use_accelerate=True,revision=<your_model_revision>" \
+ --tasks=<task_list> \
+ --num_fewshot=<n_few_shot> \
+ --batch_size=1 \
+ --output_path=<output_path>
+ ```
+
+ **Note:** we evaluate all models on a single node of 8 H100s, so the global batch batch size is 8 for each evaluation. If you don't use parallelism, adapt your batch size to fit.
  *You can expect results to vary slightly for different batch sizes because of padding.*
 
  The tasks and few shots parameters are:
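
The batch-size note in this change boils down to simple arithmetic: 8 GPUs with `--batch_size=1` give a global batch size of 8. The sketch below is only an illustration of that arithmetic, not part of the commit. It assumes that "adapt your batch size to fit" means keeping the number of parallel workers times `--batch_size` equal to 8. The helper names (`per_device_batch_size`, `build_command`) and the example model, revision, task, and path values are hypothetical placeholders; only the flags themselves come from the command in the diff.

```python
import shlex
from typing import List

# From the note in the diff: 8 GPUs * --batch_size=1 = global batch size of 8.
GLOBAL_BATCH_SIZE = 8


def per_device_batch_size(num_workers: int,
                          global_batch_size: int = GLOBAL_BATCH_SIZE) -> int:
    """Return the --batch_size that keeps the global batch size constant.

    Assumption (not stated in the source): the global batch size is
    num_workers * per-worker batch size.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if global_batch_size % num_workers != 0:
        raise ValueError("global batch size must be divisible by num_workers")
    return global_batch_size // num_workers


def build_command(model: str, revision: str, tasks: str, num_fewshot: int,
                  output_path: str, num_workers: int = 8) -> List[str]:
    """Assemble the harness command from the diff as an argument list."""
    batch_size = per_device_batch_size(num_workers)
    model_args = f"pretrained={model},use_accelerate=True,revision={revision}"
    return [
        "python", "main.py",
        "--model=hf-causal-experimental",
        f"--model_args={model_args}",
        f"--tasks={tasks}",
        f"--num_fewshot={num_fewshot}",
        f"--batch_size={batch_size}",
        f"--output_path={output_path}",
    ]


if __name__ == "__main__":
    # Hypothetical values: a single worker, so --batch_size becomes 8.
    cmd = build_command("my-org/my-model", "main", "my_task", 5,
                        "results.json", num_workers=1)
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if it is later passed to `subprocess.run`. As the diff itself notes, results can still vary slightly across batch sizes because of padding.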