loubnabnl (HF staff) committed
Commit ce9dcde
Parent: 6aada0a

add eval score

Files changed (1)
README.md  +2 -0
README.md CHANGED
@@ -39,6 +39,8 @@ To build SmolLM-Instruct, we finetune the base models on publicly available data
 |v0.1| Initial release of SmolLM-Instruct. We finetune on the permissive subset of the WebInstructSub dataset, combined with StarCoder2-Self-OSS-Instruct. Then, we perform DPO (Direct Preference Optimization) for one epoch on HelpSteer for the 135M and 1.7B models, and on argilla/dpo-mix-7k for the 360M model.|
 |v0.2| We changed the finetuning mix to datasets more suitable for smol models. We train on a new dataset of 2k simple everyday conversations generated with llama3.1-70B, [everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k/), along with [Magpie-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered), [self-oss-instruct-sc2-exec-filter-50k](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k), and a small subset of [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5).|
 
+We've noticed that the v0.2 models are better at staying on topic and responding appropriately to standard prompts, such as greetings and questions about their role as AI assistants. Additionally, SmolLM-360M-Instruct (v0.2) has a 63.3% win rate over SmolLM-360M-Instruct (v0.1) on AlpacaEval. You can find the details [here](https://huggingface.co/datasets/HuggingFaceTB/alpaca_eval_details/).
+
 ## Usage
 
 ### Local Applications
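
The v0.1 entry in the hunk above mentions a one-epoch DPO pass after SFT. As a rough illustration of what that step looks like, here is a schematic sketch using trl's `DPOTrainer`. This is not the authors' actual script: the `helpsteer-pairs` dataset name is hypothetical (HelpSteer ships per-response ratings, not ready-made chosen/rejected pairs), the `beta` value is an assumption, and the constructor shown follows older trl (~0.7.x) releases; newer releases take a `trl.DPOConfig` instead.

```python
# Schematic sketch of the DPO step described in the v0.1 changelog entry.
# NOT the authors' script: "my-org/helpsteer-pairs" is a hypothetical dataset
# of (prompt, chosen, rejected) preference pairs, beta is a guess, and this
# constructor follows the older trl (~0.7.x) API rather than trl.DPOConfig.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "HuggingFaceTB/SmolLM-135M-Instruct"  # SFT checkpoint to align
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns.
pairs = load_dataset("my-org/helpsteer-pairs", split="train")

trainer = DPOTrainer(
    model,
    ref_model=None,  # trl clones the policy as the frozen reference model
    beta=0.1,        # DPO temperature; the value actually used is not stated
    args=TrainingArguments(output_dir="smollm-dpo", num_train_epochs=1),
    train_dataset=pairs,
    tokenizer=tokenizer,
)
trainer.train()  # one epoch, matching the changelog description
```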
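
The per-prompt judgments behind the 63.3% win rate claim live in the linked `alpaca_eval_details` dataset. A minimal sketch for pulling it down and inspecting it, assuming the default config and a `train` split (both are guesses about how the dataset is laid out):

```python
# Sketch: inspect the AlpacaEval comparison details linked in the new README line.
# The default config and the "train" split are assumptions; adjust if the
# dataset exposes different configs or splits.
from datasets import load_dataset

details = load_dataset("HuggingFaceTB/alpaca_eval_details", split="train")
print(details.features)  # which columns hold the head-to-head judgments
print(details[0])        # one v0.1-vs-v0.2 comparison record
```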