Update README.md
Browse files
README.md
CHANGED
@@ -39,7 +39,7 @@ To build SmolLM-Instruct, we finetune the base models on publicly available data
|
|
39 |
|v0.1| Initial release of SmolLM-Instruct. We finetune on the permissive subset of the [WebInstructSub](https://huggingface.co/datasets/TIGER-Lab/WebInstructSub) dataset, combined with [StarCoder2-Self-OSS-Instruct](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k). Then, we perform DPO (Direct Preference Optimization) for one epoch on [HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer) for the 135M and 1.7B models, and [argilla/dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k) for the 360M model.|
|
40 |
|v0.2| We changed the finetuning mix to datasets more suitable for smol models. We train on a new dataset of 2k simple everyday conversations we generated by llama3.1-70B [everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k/), [Magpie-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered), [StarCoder2-Self-OSS-Instruct](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k), and a small subset of [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)|
|
41 |
|
42 |
-
v0.2 models are better at staying on topic and responding appropriately to standard prompts, such as greetings and questions about their role as AI assistants.
|
43 |
|
44 |
## Usage
|
45 |
|
|
|
39 |
|v0.1| Initial release of SmolLM-Instruct. We finetune on the permissive subset of the [WebInstructSub](https://huggingface.co/datasets/TIGER-Lab/WebInstructSub) dataset, combined with [StarCoder2-Self-OSS-Instruct](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k). Then, we perform DPO (Direct Preference Optimization) for one epoch on [HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer) for the 135M and 1.7B models, and [argilla/dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k) for the 360M model.|
|
40 |
|v0.2| We changed the finetuning mix to datasets more suitable for smol models. We train on a new dataset of 2k simple everyday conversations we generated by llama3.1-70B [everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k/), [Magpie-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered), [StarCoder2-Self-OSS-Instruct](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k), and a small subset of [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)|
|
41 |
|
42 |
+
v0.2 models are better at staying on topic and responding appropriately to standard prompts, such as greetings and questions about their role as AI assistants. SmolLM-360M-Instruct (v0.2) has a 63.3% win rate over SmolLM-360M-Instruct (v0.1) on AlpacaEval. You can find the details [here](https://huggingface.co/datasets/HuggingFaceTB/alpaca_eval_details/).
|
43 |
|
44 |
## Usage
|
45 |
|