Fixed tokenizer.json so that it matches Llama-3.1-8B-Instruct's tokenizer.json

#5
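The title says the repository's tokenizer.json now matches the one shipped with Llama-3.1-8B-Instruct. A minimal sketch for checking that claim locally, assuming both files have already been downloaded; the function names and any paths passed to them are hypothetical and not part of this PR:

```python
import json
from pathlib import Path

# Sentinel so a missing key is never confused with an explicit null value.
_MISSING = object()


def load_json(path):
    """Parse a JSON file into Python objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def differing_keys(a, b):
    """Return the sorted top-level keys whose values differ between two dicts."""
    return sorted(
        key
        for key in a.keys() | b.keys()
        if a.get(key, _MISSING) != b.get(key, _MISSING)
    )


def tokenizers_match(path_a, path_b):
    """True when both files parse to identical JSON (whitespace and key order ignored)."""
    return load_json(path_a) == load_json(path_b)
```

Calling `tokenizers_match` with the two local file paths gives a yes/no answer; if it returns `False`, `differing_keys(load_json(path_a), load_json(path_b))` narrows down which top-level sections disagree. Comparing parsed JSON rather than raw bytes avoids false mismatches caused by formatting alone.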
Files changed (1)
  1. README.md +16 -110
README.md CHANGED
@@ -1,106 +1,10 @@
1
  ---
2
- language:
3
- - en
4
  license: llama3
5
- library_name: transformers
6
- base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
7
  datasets:
8
  - arcee-ai/EvolKit-20k
9
- model-index:
10
- - name: Llama-3.1-SuperNova-Lite
11
- results:
12
- - task:
13
- type: text-generation
14
- name: Text Generation
15
- dataset:
16
- name: IFEval (0-Shot)
17
- type: HuggingFaceH4/ifeval
18
- args:
19
- num_few_shot: 0
20
- metrics:
21
- - type: inst_level_strict_acc and prompt_level_strict_acc
22
- value: 80.17
23
- name: strict accuracy
24
- source:
25
- url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-3.1-SuperNova-Lite
26
- name: Open LLM Leaderboard
27
- - task:
28
- type: text-generation
29
- name: Text Generation
30
- dataset:
31
- name: BBH (3-Shot)
32
- type: BBH
33
- args:
34
- num_few_shot: 3
35
- metrics:
36
- - type: acc_norm
37
- value: 31.57
38
- name: normalized accuracy
39
- source:
40
- url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-3.1-SuperNova-Lite
41
- name: Open LLM Leaderboard
42
- - task:
43
- type: text-generation
44
- name: Text Generation
45
- dataset:
46
- name: MATH Lvl 5 (4-Shot)
47
- type: hendrycks/competition_math
48
- args:
49
- num_few_shot: 4
50
- metrics:
51
- - type: exact_match
52
- value: 15.48
53
- name: exact match
54
- source:
55
- url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-3.1-SuperNova-Lite
56
- name: Open LLM Leaderboard
57
- - task:
58
- type: text-generation
59
- name: Text Generation
60
- dataset:
61
- name: GPQA (0-shot)
62
- type: Idavidrein/gpqa
63
- args:
64
- num_few_shot: 0
65
- metrics:
66
- - type: acc_norm
67
- value: 7.49
68
- name: acc_norm
69
- source:
70
- url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-3.1-SuperNova-Lite
71
- name: Open LLM Leaderboard
72
- - task:
73
- type: text-generation
74
- name: Text Generation
75
- dataset:
76
- name: MuSR (0-shot)
77
- type: TAUR-Lab/MuSR
78
- args:
79
- num_few_shot: 0
80
- metrics:
81
- - type: acc_norm
82
- value: 11.67
83
- name: acc_norm
84
- source:
85
- url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-3.1-SuperNova-Lite
86
- name: Open LLM Leaderboard
87
- - task:
88
- type: text-generation
89
- name: Text Generation
90
- dataset:
91
- name: MMLU-PRO (5-shot)
92
- type: TIGER-Lab/MMLU-Pro
93
- config: main
94
- split: test
95
- args:
96
- num_few_shot: 5
97
- metrics:
98
- - type: acc
99
- value: 31.97
100
- name: accuracy
101
- source:
102
- url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-3.1-SuperNova-Lite
103
- name: Open LLM Leaderboard
104
  ---
105
  <div align="center">
106
  <img src="https://i.ibb.co/r072p7j/eopi-ZVu-SQ0-G-Cav78-Byq-Tg.png" alt="Llama-3.1-SuperNova-Lite" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 100%; height: auto;">
@@ -114,16 +18,18 @@ The model was trained using a state-of-the-art distillation pipeline and an inst
114
 
115
  Llama-3.1-SuperNova-Lite excels in both benchmark performance and real-world applications, providing the power of large-scale models in a more compact, efficient form ideal for organizations seeking high performance with reduced resource requirements.
116
 
117
- # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
118
- Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_arcee-ai__Llama-3.1-SuperNova-Lite)
119
 
120
- | Metric |Value|
121
- |-------------------|----:|
122
- |Avg. |29.73|
123
- |IFEval (0-Shot) |80.17|
124
- |BBH (3-Shot) |31.57|
125
- |MATH Lvl 5 (4-Shot)|15.48|
126
- |GPQA (0-shot) | 7.49|
127
- |MuSR (0-shot) |11.67|
128
- |MMLU-PRO (5-shot) |31.97|
129
 
1
  ---
2
  license: llama3
3
  datasets:
4
  - arcee-ai/EvolKit-20k
5
+ language:
6
+ - en
7
+ base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
8
  ---
9
  <div align="center">
10
  <img src="https://i.ibb.co/r072p7j/eopi-ZVu-SQ0-G-Cav78-Byq-Tg.png" alt="Llama-3.1-SuperNova-Lite" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 100%; height: auto;">
 
18
 
19
  Llama-3.1-SuperNova-Lite excels in both benchmark performance and real-world applications, providing the power of large-scale models in a more compact, efficient form ideal for organizations seeking high performance with reduced resource requirements.
20
 
21
+ # Evaluations
22
+ We will be submitting this model to the OpenLLM Leaderboard for a more conclusive benchmark - but here are our internal benchmarks using the main branch of lm evaluation harness:
23
+
24
+ | Benchmark | SuperNova-Lite | Llama-3.1-8b-Instruct |
25
+ |-------------|----------------|----------------------|
26
+ | IF_Eval | 81.1 | 77.4 |
27
+ | MMLU Pro | 38.7 | 37.7 |
28
+ | TruthfulQA | 64.4 | 55.0 |
29
+ | BBH | 51.1 | 50.6 |
30
+ | GPQA | 31.2 | 29.02 |
31
 
32
+ The script used for evaluation can be found inside this repository under /eval.sh, or click [here](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite/blob/main/eval.sh)
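To make the comparison in the internal benchmark table easier to read, the per-benchmark gap between the two models can be computed directly. This is only an arithmetic sketch over the numbers in that table; it is not the contents of eval.sh, and the names `SCORES` and `deltas` are illustrative:

```python
# Scores copied from the internal benchmark table:
# (SuperNova-Lite, Llama-3.1-8b-Instruct)
SCORES = {
    "IF_Eval": (81.1, 77.4),
    "MMLU Pro": (38.7, 37.7),
    "TruthfulQA": (64.4, 55.0),
    "BBH": (51.1, 50.6),
    "GPQA": (31.2, 29.02),
}


def deltas(scores):
    """Return SuperNova-Lite minus Llama-3.1-8b-Instruct per benchmark, rounded to 2 places."""
    return {name: round(ours - base, 2) for name, (ours, base) in scores.items()}


for name, gap in deltas(SCORES).items():
    print(f"{name:<12} {gap:+.2f}")
```

On these figures the gap is positive on every listed benchmark, largest on TruthfulQA and smallest on BBH.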
33
 
34
+ # note
35
+ This readme will be edited regularly on September 10, 2024 (the day of release). After the final readme is in place we will remove this note.