Crystalcareai committed on
Commit ddced7f
Parent(s): 1e41e74

Update README.md

README.md CHANGED
@@ -75,6 +75,11 @@ Despite its compact size, Arcee Spark offers deep reasoning capabilities, making
 <div style="display: flex; justify-content: center; margin: 20px 0;">
 <img src="https://i.ibb.co/BLX8GmZ/Screenshot-2024-06-23-at-10-43-50-PM.png" alt="Additional Benchmark Results" style="border-radius: 10px; max-width: 90%; height: auto; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);">
 </div>
+
+<div style="display: flex; justify-content: center; margin: 20px 0;">
+<img src="https://i.postimg.cc/Vs7v0Vbn/Screenshot-2024-06-24-at-1-10-58-AM.png" alt="Bigbenchhard Results" style="border-radius: 10px; max-width: 90%; height: auto; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);">
+</div>
+
 ### MT-Bench
 
 ```markdown
@@ -144,6 +149,32 @@ AGI-eval average: 51.11
 
 Gpt4al Average: 69.37
 
+## Big Bench Hard
+
+| Task |Version| Metric |Value | |Stderr|
+|------------------------------------------------|------:|---------------------|-----:|---|-----:|
+|bigbench_causal_judgement | 0|multiple_choice_grade|0.6053|± |0.0356|
+|bigbench_date_understanding | 0|multiple_choice_grade|0.6450|± |0.0249|
+|bigbench_disambiguation_qa | 0|multiple_choice_grade|0.5233|± |0.0312|
+|bigbench_geometric_shapes | 0|multiple_choice_grade|0.2006|± |0.0212|
+| | |exact_str_match |0.0000|± |0.0000|
+|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|0.2840|± |0.0202|
+|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|0.2429|± |0.0162|
+|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|0.4367|± |0.0287|
+|bigbench_movie_recommendation | 0|multiple_choice_grade|0.4720|± |0.0223|
+|bigbench_navigate | 0|multiple_choice_grade|0.4980|± |0.0158|
+|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|0.5600|± |0.0111|
+|bigbench_ruin_names | 0|multiple_choice_grade|0.4375|± |0.0235|
+|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|0.2685|± |0.0140|
+|bigbench_snarks | 0|multiple_choice_grade|0.7348|± |0.0329|
+|bigbench_sports_understanding | 0|multiple_choice_grade|0.6978|± |0.0146|
+|bigbench_temporal_sequences | 0|multiple_choice_grade|0.4060|± |0.0155|
+|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2072|± |0.0115|
+|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1406|± |0.0083|
+|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4367|± |0.0287|
+
+Big Bench average: 45.78
+
 ## License
 
 Arcee Spark is released under the Apache 2.0 license.
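The README does not say how "Big Bench average" is aggregated. The following is a minimal sketch that assumes a plain unweighted mean of the 18 multiple_choice_grade values in the Big Bench Hard table added by this commit, with the exact_str_match row excluded. BBH_MC_GRADES and unweighted_average_pct are hypothetical names introduced only for illustration.

```python
# Sketch: recompute an unweighted Big Bench Hard average from the
# multiple_choice_grade column of the table added in this commit.
# ASSUMPTION: a plain mean over these 18 rows; the README does not
# document how "Big Bench average" is actually aggregated.
from statistics import mean

# Values copied from the table (multiple_choice_grade rows only;
# the geometric_shapes exact_str_match row is excluded).
BBH_MC_GRADES: dict[str, float] = {
    "bigbench_causal_judgement": 0.6053,
    "bigbench_date_understanding": 0.6450,
    "bigbench_disambiguation_qa": 0.5233,
    "bigbench_geometric_shapes": 0.2006,
    "bigbench_logical_deduction_five_objects": 0.2840,
    "bigbench_logical_deduction_seven_objects": 0.2429,
    "bigbench_logical_deduction_three_objects": 0.4367,
    "bigbench_movie_recommendation": 0.4720,
    "bigbench_navigate": 0.4980,
    "bigbench_reasoning_about_colored_objects": 0.5600,
    "bigbench_ruin_names": 0.4375,
    "bigbench_salient_translation_error_detection": 0.2685,
    "bigbench_snarks": 0.7348,
    "bigbench_sports_understanding": 0.6978,
    "bigbench_temporal_sequences": 0.4060,
    "bigbench_tracking_shuffled_objects_five_objects": 0.2072,
    "bigbench_tracking_shuffled_objects_seven_objects": 0.1406,
    "bigbench_tracking_shuffled_objects_three_objects": 0.4367,
}


def unweighted_average_pct(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the scores, scaled to a percentage."""
    return 100 * mean(scores.values())


if __name__ == "__main__":
    print(f"{unweighted_average_pct(BBH_MC_GRADES):.2f}")  # prints 43.32
```

Under this assumption the mean comes out to about 43.32, not the reported 45.78. The published figure therefore presumably uses a different task subset or weighting.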