Nathan Habib committed on
Commit bfab6ae • 1 Parent(s): c19cedb
add plots
- dist/index.html +5 -3
dist/index.html
CHANGED
@@ -274,7 +274,7 @@
 <div class="main-plot-container">
 <figure><img src="assets/images/ranking_top10_bottom10.png"/></figure>
 <div id="ranking">
-<iframe src="rankings_change.html" title="description", height="
+<iframe src="rankings_change.html" title="description", height="800" width="100%", style="border:none;"></iframe>
 </div>
 </div>

@@ -283,9 +283,11 @@

 <p>For example, our different evaluations results are not all correlated with one another, which is expected.</p>

-<div class="
+<div class="main-plot-container">
 <figure><img src="assets/images/v2_correlation_heatmap.png"/></figure>
-<div id="heatmap"
+<div id="heatmap">
+<iframe src="correlation_heatmap.html" title="description", height="800" width="100%", style="border:none;"></iframe>
+</div>
 </div>

 <p>MMLU-Pro, BBH and ARC-challenge are well correlated together. It is known that these 3 are well correlated with human preference (as they tend to align with human judgment on LMSys’s chatbot arena).</p>