Clémentine committed on
Commit
6fa8358
1 Parent(s): ae4f37a
Files changed (1)
  1. src/index.html +1 -6
src/index.html CHANGED
@@ -49,11 +49,6 @@
 </d-front-matter>
 <d-title>
 <h1 class="l-page" style="text-align: center;">Open-LLM performances are plateauing, let’s make it steep again</h1>
-<div id="title-plot" class="l-body l-screen">
-<figure>
-<img src="assets/images/banner.png" alt="Banner">
-</figure>
-</div>
 </d-title>
 <d-byline></d-byline>
 <d-article>
@@ -216,7 +211,7 @@
 
 <h3>What do the rankings look like?</h3>
 
-<p>Taking a look at the top 10 models on the previous version of the Open LLM Leaderboard and comparing them with this updated version, some models appear to have a relatively stable ranking (in bold below): Qwen-2-72B instruct, Meta’s Llama3-70B (both instruct and base versions), 01-ai’s Yi-1.5-34B (chat version), Cohere’s Command R+ model, and lastly Smaug-72B from AbacusAI.</p>
+<p>Taking a look at the top 10 models on the previous version of the Open LLM Leaderboard and comparing them with this updated version, some models appear to have a relatively stable ranking (in bold below): Qwen-2-72B instruct, Meta’s Llama3-70B instruct, 01-ai’s Yi-1.5-34B chat, Cohere’s Command R+ model, and lastly Smaug-72B from AbacusAI.</p>
 <p>We’ve been particularly impressed by Qwen2-72B-Instruct, a step above the other models (notably thanks to its performance in math, long-range reasoning, and knowledge).</p>
 <p>The current second-best model, Llama-3-70B-Instruct, interestingly loses 15 points to its pretrained counterpart on GPQA, which raises the question of whether the particularly extensive instruction fine-tuning done by the Meta team on this model affected some expert/graduate-level knowledge.</p>
 <p>Also very interesting is the fact that a new challenger climbed the ranks to reach 3rd place despite its smaller size. With only 13B parameters, Microsoft’s Phi-3-medium-4K-instruct shows performance equivalent to models 2 to 4 times its size. It would be very interesting to have more information on the training procedure for Phi, or an independent reproduction by an external team with open training recipes/datasets.</p>
 