Update README.md
README.md CHANGED
@@ -60,25 +60,22 @@ Here are the evaluation results for DCLM-1B models on various tasks (using [llm-
 | DCLM-1B | 45.2 | 28.1 | 47.5 |
 | DCLM-1B-IT| 47.1 | 33.6 | 51.4 |
 
-
-
-Moreover, we present our evaluation results on Length-Controlled Alpaca-Eval 2.0 to measure our instruction-following capabilities.
+Moreover, we present our evaluation results on Length-Controlled Alpaca-Eval 2.0 to measure our instruction-following capabilities. We report results
+from the leaderboard for non-DCLM models. We compare to state-of-the-art small models, and also include a few larger model sizes for comparison.
 
 | Model | AlpacaEval2.0 LC Win-rate (%) |
 |------------------------------------|------------------------------:|
-
-| DCLM-IT-1B | **8.6** |
-| DCLM-IT-7B | 16.6 |
-| **Reported from the leaderboard** | |
-| Gemma-Instruct-7B | 10.4 |
-| Nous-Hermes-13B | 9.7 |
-| DaVinci001 | 9.0 |
-| LLaMA-2-Chat-13B | 8.4 |
-| Alpaca-7B | 5.9 |
+| Qwen1.5 1.8B Chat | 2.6 |
 | Gemma-Instruct-2B | 5.4 |
 | Phi-2 SFT | 5.9 |
-
-
+| DCLM-IT-1B | **8.6** |
+| **Larger model sizes** | |
+| Alpaca-7B | 5.9 |
+| LLaMA-2-Chat-13B | 8.4 |
+| DaVinci001 | 9.0 |
+| Nous-Hermes-13B | 9.7 |
+| Gemma-Instruct-7B | 10.4 |
+| DCLM-IT-7B | 16.6 |
 
 ## Example Code
 
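The README's own Example Code section follows the table but is outside this hunk. For orientation, a minimal sketch of prompting an instruction-tuned DCLM checkpoint with Hugging Face `transformers`; the repo id `apple/DCLM-1B-IT`, the plain-text prompt format, and the `trust_remote_code=True` flag are illustrative assumptions, not taken from this diff.

```python
# Hedged sketch: prompting an instruction-tuned DCLM checkpoint with transformers.
# The repo id below is an assumption for illustration; check the model card for the
# published id, the expected chat/prompt template, and whether remote code is needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/DCLM-1B-IT"  # assumed repo id, not confirmed by this diff
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "List three creative uses for a paperclip."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```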