sethuiyer/Qwen2.5-7B-Anvita

@@ -42,7 +42,7 @@ model-index:
         num_few_shot: 3
     metrics:
     - type: acc_norm
-      value: 54.33
       name: normalized accuracy
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
@@ -57,7 +57,7 @@ model-index:
         num_few_shot: 4
     metrics:
     - type: exact_match
-      value: 18.32
       name: exact match
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
@@ -72,7 +72,7 @@ model-index:
         num_few_shot: 0
     metrics:
     - type: acc_norm
-      value: 32.72
       name: acc_norm
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
@@ -87,7 +87,7 @@ model-index:
         num_few_shot: 0
     metrics:
     - type: acc_norm
-      value: 43.25
       name: acc_norm
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
@@ -104,7 +104,7 @@ model-index:
         num_few_shot: 5
     metrics:
     - type: acc
-      value: 41.66
       name: accuracy
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
@@ -131,13 +131,13 @@ This combination optimizes Anvita for superior reasoning, dynamic conversations,
 ## Evaluation Results
 | **Metric**              | **Value** |
 |-------------------------|--------------:|
-| **Avg.**                | **44.3**      |
-| **IFEval (0-Shot)**      | 64.33         |
-| **BBH (3-Shot)**         | 54.33         |
-| **MATH Level 5 (4-Shot)**| 18.32         |
-| **GPQA (0-Shot)**        | 32.72         |
-| **MuSR (0-Shot)**        | 43.25         |
-| **MMLU-PRO (5-Shot)**    | 41.66         |
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/sethuiyer__Qwen2.5-7B-Anvita-details).
 Personal Benchmarks - check [PERSONAL_BENCHMARK.md](./PERSONAL_BENCHMARK.md)

         num_few_shot: 3
     metrics:
     - type: acc_norm
+      value: 35.48
       name: normalized accuracy
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
         num_few_shot: 4
     metrics:
     - type: exact_match
+      value: 15.86
       name: exact match
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
         num_few_shot: 0
     metrics:
     - type: acc_norm
+      value: 10.29
       name: acc_norm
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
         num_few_shot: 0
     metrics:
     - type: acc_norm
+      value: 13.47
       name: acc_norm
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
         num_few_shot: 5
     metrics:
     - type: acc
+      value: 35.17
       name: accuracy
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
 ## Evaluation Results
 | **Metric**              | **Value** |
 |-------------------------|--------------:|
+| **Avg.**                | **29.18**     |
+| **IFEval (0-Shot)**      | 64.8          |
+| **BBH (3-Shot)**         | 35.48         |
+| **MATH Level 5 (4-Shot)**| 15.86         |
+| **GPQA (0-Shot)**        | 10.29         |
+| **MuSR (0-Shot)**        | 13.47         |
+| **MMLU-PRO (5-Shot)**    | 35.17         |
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/sethuiyer__Qwen2.5-7B-Anvita-details).
 Personal Benchmarks - check [PERSONAL_BENCHMARK.md](./PERSONAL_BENCHMARK.md)
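
As a quick sanity check on the updated table, the **Avg.** row can be reproduced from the six per-benchmark values. The sketch below assumes the average is a plain unweighted mean rounded to two decimals; that is an assumption, not something the diff states. The `scores` dictionary and the `average_score` helper are hypothetical names used only for illustration.

```python
# Per-benchmark values copied from the updated Evaluation Results table.
scores = {
    "IFEval (0-Shot)": 64.8,
    "BBH (3-Shot)": 35.48,
    "MATH Level 5 (4-Shot)": 15.86,
    "GPQA (0-Shot)": 10.29,
    "MuSR (0-Shot)": 13.47,
    "MMLU-PRO (5-Shot)": 35.17,
}


def average_score(values):
    """Unweighted mean rounded to two decimals (assumed averaging rule)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


avg = average_score(scores.values())
print(f"Avg. = {avg}")
```

Under that assumption the result agrees with the 29.18 shown in the new **Avg.** row.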