Update README.md
README.md CHANGED
@@ -60,25 +60,22 @@ Here are the evaluation results for DCLM-1B models on various tasks (using [llm-
 | DCLM-1B | 45.2 | 28.1 | 47.5 |
 | DCLM-1B-IT| 47.1 | 33.6 | 51.4 |
 
-
-
-Moreover, we present our evaluation results on Length-Controlled Alpaca-Eval 2.0 to measure our instruction-following capabilities.
+Moreover, we present our evaluation results on Length-Controlled Alpaca-Eval 2.0 to measure our instruction-following capabilities. We report results
+from the leaderboard for non-DCLM models. We compare to state-of-the-art small models, and also include a few larger model sizes for comparison.
 
 | Model | AlpacaEval2.0 LC Win-rate (%) |
 |------------------------------------|------------------------------:|
-
-| DCLM-IT-1B | **8.6** |
-| DCLM-IT-7B | 16.6 |
-| **Reported from the leaderboard** | |
-| Gemma-Instruct-7B | 10.4 |
-| Nous-Hermes-13B | 9.7 |
-| DaVinci001 | 9.0 |
-| LLaMA-2-Chat-13B | 8.4 |
-| Alpaca-7B | 5.9 |
+| Qwen1.5 1.8B Chat | 2.6 |
 | Gemma-Instruct-2B | 5.4 |
 | Phi-2 SFT | 5.9 |
-
-
+| DCLM-IT-1B | **8.6** |
+| **Larger model sizes** | |
+| Alpaca-7B | 5.9 |
+| LLaMA-2-Chat-13B | 8.4 |
+| DaVinci001 | 9.0 |
+| Nous-Hermes-13B | 9.7 |
+| Gemma-Instruct-7B | 10.4 |
+| DCLM-IT-7B | 16.6 |
 
 ## Example Code
 
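The README's own Example Code section follows the table but is outside this hunk. For orientation, a minimal sketch of prompting an instruction-tuned DCLM checkpoint with Hugging Face `transformers`; the repo id `apple/DCLM-1B-IT`, the plain-text prompt format, and the `trust_remote_code=True` flag are illustrative assumptions, not taken from this diff.

```python
# Hedged sketch: prompting an instruction-tuned DCLM checkpoint with transformers.
# The repo id below is an assumption for illustration; check the model card for the
# published id, the expected chat/prompt template, and whether remote code is needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/DCLM-1B-IT"  # assumed repo id, not confirmed by this diff
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "List three creative uses for a paperclip."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```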