xinchen9 leaderboard-pr-bot committed on
Commit 069e7a7
1 Parent(s): e4da730

Adding Evaluation Results (#1)


- Adding Evaluation Results (c9fe3573d1494a9caa9f3edc1d6b43818506ed9d)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +109 -1
README.md CHANGED
@@ -1,5 +1,100 @@
  ---
  license: apache-2.0
+ model-index:
+ - name: llama3-b8-ft-dis
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 15.46
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=xinchen9/llama3-b8-ft-dis
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 24.73
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=xinchen9/llama3-b8-ft-dis
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 3.17
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=xinchen9/llama3-b8-ft-dis
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 8.39
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=xinchen9/llama3-b8-ft-dis
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 6.41
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=xinchen9/llama3-b8-ft-dis
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 24.93
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=xinchen9/llama3-b8-ft-dis
+       name: Open LLM Leaderboard
  ---
  ### 1. Model Details
  Introducing xinchen9/llama3-b8-ft, an advanced language model comprising 8 billion parameters. It has been fine-tuned based on
@@ -23,4 +118,17 @@ model.generation_config = GenerationConfig.from_pretrained(model_name)
  model.generation_config.pad_token_id = model.generation_config.eos_token_id
  ```
  ### 3 Disclaimer
- The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+ The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_xinchen9__llama3-b8-ft-dis)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |13.85|
+ |IFEval (0-Shot)    |15.46|
+ |BBH (3-Shot)       |24.73|
+ |MATH Lvl 5 (4-Shot)| 3.17|
+ |GPQA (0-shot)      | 8.39|
+ |MuSR (0-shot)      | 6.41|
+ |MMLU-PRO (5-shot)  |24.93|
+
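The `Avg.` row in the results table appears to be the unweighted arithmetic mean of the six benchmark scores recorded in the `model-index` metadata: (15.46 + 24.73 + 3.17 + 8.39 + 6.41 + 24.93) / 6 = 83.09 / 6 ≈ 13.85. The minimal Python sketch below checks that reading. It assumes the leaderboard weights all six benchmarks equally (this commit does not state how `Avg.` is computed). The `model_index` literal and the helpers `scores_by_benchmark` and `unweighted_average` are hypothetical: the metadata is transcribed by hand into a dict that keeps only the fields used here, rather than parsed from the README front matter.

```python
# Hypothetical hand transcription of the model-index added in this commit,
# keeping only dataset names and metric values (not parsed from the README).
model_index = [
    {
        "name": "llama3-b8-ft-dis",
        "results": [
            {"dataset": {"name": "IFEval (0-Shot)"}, "metrics": [{"value": 15.46}]},
            {"dataset": {"name": "BBH (3-Shot)"}, "metrics": [{"value": 24.73}]},
            {"dataset": {"name": "MATH Lvl 5 (4-Shot)"}, "metrics": [{"value": 3.17}]},
            {"dataset": {"name": "GPQA (0-shot)"}, "metrics": [{"value": 8.39}]},
            {"dataset": {"name": "MuSR (0-shot)"}, "metrics": [{"value": 6.41}]},
            {"dataset": {"name": "MMLU-PRO (5-shot)"}, "metrics": [{"value": 24.93}]},
        ],
    }
]


def scores_by_benchmark(index):
    """Map each dataset name to the value of its first listed metric."""
    scores = {}
    for model in index:
        for result in model["results"]:
            scores[result["dataset"]["name"]] = result["metrics"][0]["value"]
    return scores


def unweighted_average(scores):
    """Equal-weight mean of all benchmark scores, rounded to two decimals.

    Equal weighting is an assumption about how the leaderboard's Avg. is computed.
    """
    return round(sum(scores.values()) / len(scores), 2)


scores = scores_by_benchmark(model_index)
print(unweighted_average(scores))  # prints 13.85
```

Because the recomputed value matches the `Avg.` row to two decimals, the equal-weight reading is consistent with the numbers in this commit, though it does not confirm the leaderboard's actual procedure.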