---
license: apache-2.0
datasets:
- allenai/dolma
- allenai/tulu-v2-sft-mixture-olmo-4096
- argilla/ultrafeedback-binarized-preferences-cleaned
language:
- en
---
|
# OLMo-1B-0724 Instruct
|
|
|
This is a version of [OLMo-1B-0724-hf](https://huggingface.co/allenai/OLMo-1B-0724-hf) that has undergone SFT and DPO training.

See [the SFT model card for details on SFT training](https://huggingface.co/hamishivi/OLMo-1B-0724-SFT-hf).
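A minimal sketch of loading and querying the model with `transformers`. The `<|user|>`/`<|assistant|>` prompt format below is the Tulu-style chat format commonly used by open-instruct-trained models; this is an assumption, so confirm it against the tokenizer's own chat template before relying on it.

```python
MODEL_ID = "hamishivi/OLMo-1B-0724-Instruct-hf"


def format_prompt(user_message: str) -> str:
    """Tulu-style chat format (assumption: verify against the tokenizer's
    chat template for this checkpoint)."""
    return f"<|user|>\n{user_message}\n<|assistant|>\n"


def generate(user_message: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a response from the instruct model."""
    # Imported here so format_prompt works even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(format_prompt(user_message), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Strip the prompt tokens, keep only the newly generated continuation.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

For example, `print(generate("What is the capital of France?"))` runs greedy decoding for up to 64 new tokens.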
|
|
|
This model is initialised from [OLMo-1B-0724-SFT-hf](https://huggingface.co/hamishivi/OLMo-1B-0724-SFT-hf) and then DPO-trained on a cleaned UltraFeedback dataset for 3 epochs with a batch size of 32, a beta of 0.1, and linear warmup for the first 10% of training followed by a linear cooldown.
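The DPO stage optimises the standard pairwise preference loss. The sketch below is an illustrative, self-contained implementation (not the actual training code, which lives in open-instruct): `beta=0.1` matches the value quoted above, and the log-probability arguments stand in for the summed token log-likelihoods of the chosen and rejected responses under the policy and the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair:
    -log sigmoid(beta * ((pi_w - pi_l) - (ref_w - ref_l)))."""
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (pi_logratio - ref_logratio)
    # -log(sigmoid(x)) computed stably as softplus(-x); for very negative
    # logits, exp would overflow, so fall back to the linear asymptote.
    return math.log1p(math.exp(-logits)) if logits > -30 else -logits


# At parity with the reference model the loss is log(2); when the policy
# prefers the chosen response more than the reference does, it drops below.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

The loss pushes the policy's chosen-over-rejected log-ratio above the reference model's, with beta controlling how strongly deviations from the reference are rewarded.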
|
|
|
Evals are as follows:
|
|
|
| Metric | [OLMo-1B-0724-hf](https://huggingface.co/allenai/OLMo-1B-0724-hf) | [OLMo-1B-0724-SFT-hf](https://huggingface.co/hamishivi/OLMo-1B-0724-SFT-hf) | **[OLMo-1B-0724-Instruct-hf](https://huggingface.co/hamishivi/OLMo-1B-0724-Instruct-hf) (this model!)** |
|---------------------------|-----------------|---------------------|-------------------------|
| MMLU 0-shot | 25.0 | 36.0 | **36.7** |
| GSM8k CoT 8-shot | 7.0 | **12.5** | **12.5** |
| BBH CoT 3-shot | 22.5 | 27.2 | **30.6** |
| HumanEval P@10 | 16.0 | 21.2 | **22.0** |
| AlpacaEval 1 | - | 41.5 | **50.9** |
| AlpacaEval 2 LC | - | **2.7** | 2.5 |
| Toxigen % Toxic (lower is better) | 80.3 | 59.7 | **14.1** |
| TruthfulQA %Info+True | 23.0 | 40.9 | **42.2** |
| IFEval Loose Acc | 20.5 | **26.1** | 24.2 |
| XSTest F1 | 67.6 | **81.9** | 79.8 |
| **Average of above metrics** | 25.2 | 33.0 | **38.7** |
|
|
|
|
|
Model training and evaluation were performed using [Open-instruct](https://github.com/allenai/open-instruct), so check that out for more details on evaluation.