HuggingFaceTB/SmolLM-135M-Instruct

loubnabnl HF staff committed on Jul 16

Commit 399e30b • 1 Parent(s): 52379bc

Update README.md

Files changed (1)

README.md +3 -1

@@ -6,7 +6,7 @@ language:
 ---
-# SmolLM
+# SmolLM-Instruct
 <center>
     <img src="https://huggingface.co/datasets/HuggingFaceTB/images/resolve/main/banner_smol.png" alt="SmolLM" width="1100" height="600">
@@ -27,6 +27,8 @@ SmolLM is a series of state-of-the-art small language models available in three
 To build SmolLM-Instruct, we instruction tuned the models using publicly available permissive instruction datasets. We trained all three models for one epoch on the permissive subset of the WebInstructSub dataset, combined with StarCoder2-Self-OSS-Instruct. Following this, we performed DPO (Direct Preference Optimization) for one epoch: using HelpSteer for the 135M and 1.7B models, and argilla/dpo-mix-7k for the 360M model. We followed the training parameters from the Zephyr-Gemma recipe in the alignment handbook, but adjusted the SFT (Supervised Fine-Tuning) learning rate to 3e-4.
 [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+This is the SmolLM-135M-Instruct.
 ### Generation
 ```bash
 pip install transformers