sethuiyer committed on
Commit
e383397
1 Parent(s): 1d4f46e

Update README.md

Files changed (1)
  1. README.md +12 -12
README.md CHANGED
@@ -42,7 +42,7 @@ model-index:
  num_few_shot: 3
  metrics:
  - type: acc_norm
- value: 54.33
+ value: 35.48
  name: normalized accuracy
  source:
  url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
@@ -57,7 +57,7 @@ model-index:
  num_few_shot: 4
  metrics:
  - type: exact_match
- value: 18.32
+ value: 15.86
  name: exact match
  source:
  url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
@@ -72,7 +72,7 @@ model-index:
  num_few_shot: 0
  metrics:
  - type: acc_norm
- value: 32.72
+ value: 10.29
  name: acc_norm
  source:
  url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
@@ -87,7 +87,7 @@ model-index:
  num_few_shot: 0
  metrics:
  - type: acc_norm
- value: 43.25
+ value: 13.47
  name: acc_norm
  source:
  url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
@@ -104,7 +104,7 @@ model-index:
  num_few_shot: 5
  metrics:
  - type: acc
- value: 41.66
+ value: 35.17
  name: accuracy
  source:
  url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
@@ -131,13 +131,13 @@ This combination optimizes Anvita for superior reasoning, dynamic conversations,
  ## Evaluation Results
  | **Metric** | **Value** |
  |-------------------------|--------------:|
- | **Avg.** | **44.3** |
- | **IFEval (0-Shot)** | 64.33 |
- | **BBH (3-Shot)** | 54.33 |
- | **MATH Level 5 (4-Shot)**| 18.32 |
- | **GPQA (0-Shot)** | 32.72 |
- | **MuSR (0-Shot)** | 43.25 |
- | **MMLU-PRO (5-Shot)** | 41.66 |
+ | **Avg.** | **29.18** |
+ | **IFEval (0-Shot)** | 64.8 |
+ | **BBH (3-Shot)** | 35.48 |
+ | **MATH Level 5 (4-Shot)**| 15.86 |
+ | **GPQA (0-Shot)** | 10.29 |
+ | **MuSR (0-Shot)** | 13.47 |
+ | **MMLU-PRO (5-Shot)** | 35.17 |
 
  Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/sethuiyer__Qwen2.5-7B-Anvita-details).
  Personal Benchmarks - check [PERSONAL_BENCHMARK.md](./PERSONAL_BENCHMARK.md)
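
The updated **Avg.** row can be sanity-checked against the six benchmark rows added in this commit. The sketch below assumes the average is a plain unweighted mean of those six values rounded to two decimals; the diff itself does not say how the leaderboard aggregates scores, and the names `scores` and `mean_score` are illustrative only.

```python
# Scores copied from the "+" rows of the Evaluation Results table in this commit.
scores = {
    "IFEval (0-Shot)": 64.8,
    "BBH (3-Shot)": 35.48,
    "MATH Level 5 (4-Shot)": 15.86,
    "GPQA (0-Shot)": 10.29,
    "MuSR (0-Shot)": 13.47,
    "MMLU-PRO (5-Shot)": 35.17,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals (assumed aggregation)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Avg. = {average}")
```

Under that assumption the result agrees with the 29.18 reported in the new table.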