leaderboard-pr-bot committed
Commit: 017de26
Parent(s): e91ad0a

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1):
  1. README.md (+109, −1)
README.md CHANGED

```diff
@@ -2,10 +2,118 @@
 license: apache-2.0
 base_model:
 - Dans-DiscountModels/mistral-7b-v0.3-ChatML
+model-index:
+- name: Mistral-7b-v0.3-Test-E0.7
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 51.24
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/Mistral-7b-v0.3-Test-E0.7
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 26.82
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/Mistral-7b-v0.3-Test-E0.7
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 3.25
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/Mistral-7b-v0.3-Test-E0.7
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 6.15
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/Mistral-7b-v0.3-Test-E0.7
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 8.03
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/Mistral-7b-v0.3-Test-E0.7
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 19.38
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/Mistral-7b-v0.3-Test-E0.7
+      name: Open LLM Leaderboard
 ---
 This model is an early release of an upcoming model for testing purposes. The format is ChatML. If you use this model let me know how it goes.
 
 ### Training details:
 - 1x RTX 4080
 - Rank 64 RSLoRA
-- 70 Hours runtime
+- 70 Hours runtime
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Dans-DiscountModels__Mistral-7b-v0.3-Test-E0.7)
+
+| Metric             |Value|
+|--------------------|----:|
+|Avg.                |19.14|
+|IFEval (0-Shot)     |51.24|
+|BBH (3-Shot)        |26.82|
+|MATH Lvl 5 (4-Shot) | 3.25|
+|GPQA (0-shot)       | 6.15|
+|MuSR (0-shot)       | 8.03|
+|MMLU-PRO (5-shot)   |19.38|
+
```
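As a quick sanity check (not part of the PR itself), the leaderboard's "Avg." row is the plain mean of the six benchmark scores; the values below are copied from the table in the diff:

```python
# Benchmark scores copied from the leaderboard table in the diff above.
scores = {
    "IFEval (0-Shot)": 51.24,
    "BBH (3-Shot)": 26.82,
    "MATH Lvl 5 (4-Shot)": 3.25,
    "GPQA (0-shot)": 6.15,
    "MuSR (0-shot)": 8.03,
    "MMLU-PRO (5-shot)": 19.38,
}

# "Avg." is the unweighted mean of the six scores; the table reports 19.14.
avg = sum(scores.values()) / len(scores)
print(f"Avg. = {avg:.2f}")
```

This agrees with the 19.14 in the table to within rounding.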
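The model card notes the prompt format is ChatML. As a minimal sketch of what that format looks like on the wire (the helper name `build_chatml_prompt` is illustrative, not part of the model's tooling; with `transformers` you would normally let `tokenizer.apply_chat_template` render this for you):

```python
def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string.

    Illustrative helper only: each message becomes an
    <|im_start|>role ... <|im_end|> block, and a trailing
    <|im_start|>assistant line cues the model to respond.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

In practice, prefer the tokenizer's built-in chat template so the special tokens exactly match what the model was trained on.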