viethoangtranduong committed on
Commit
e6e8d18
1 Parent(s): ccbadf0

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -44,8 +44,8 @@ to learn more about "Programmatically scale human preferences and alignment in G
 
 
  #### Result:
- - This model scored **30.2** on [Alpaca-Eval 2.0](https://tatsu-lab.github.io/alpaca_eval/) - ranked #4 and the highest for an open source base model at the time of publication.
- - Utilizing the model with PairRM, which involved generating 16 responses and submitting the highest-scoring one by PairRM, we scored **34.86** - ranked #2.
+ - This model scored **30.2** on [Alpaca-Eval 2.0](https://tatsu-lab.github.io/alpaca_eval/) - ranked 3rd and the highest for an open source base model at the time of publication.
+ - Utilizing the model with PairRM, which involved generating 16 responses and submitting the highest-scoring one by PairRM, we scored **34.86** - ranked 2nd.
  The best model on the leaderboard is "gpt-4-turbo".
 
  We acknowledge that Alpaca-Eval 2.0 is not the full reflection of LLMs' performances.