viethoangtranduong
committed on
Commit e6e8d18 • 1 Parent(s): ccbadf0
Update README.md
README.md
CHANGED
@@ -44,8 +44,8 @@ to learn more about "Programmatically scale human preferences and alignment in G
 
 
 #### Result:
-- This model scored **30.2** on [Alpaca-Eval 2.0](https://tatsu-lab.github.io/alpaca_eval/) - ranked
-- Utilizing the model with PairRM, which involved generating 16 responses and submitting the highest-scoring one by PairRM, we scored **34.86** - ranked
+- This model scored **30.2** on [Alpaca-Eval 2.0](https://tatsu-lab.github.io/alpaca_eval/) - ranked 3rd and the highest for an open source base model at the time of publication.
+- Utilizing the model with PairRM, which involved generating 16 responses and submitting the highest-scoring one by PairRM, we scored **34.86** - ranked 2nd.
 The best model on the leaderboard is "gpt-4-turbo".
 
 We acknowledge that Alpaca-Eval 2.0 is not the full reflection of LLMs' performances.