	---
	base_model: teknium/OpenHermes-2.5-Mistral-7B
	license: apache-2.0
	datasets:
	- teknium/openhermes
	- argilla/ultrafeedback-binarized-preferences
	- Intel/orca_dpo_pairs
	language:
	- en
	library_name: transformers
	pipeline_tag: text-generation
	---

	# DPOpenHermes 7B

	## OpenHermes x Notus x Neural

	This is an RL fine-tune of [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B), trained with Direct Preference Optimization (DPO) on the [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) and [argilla/ultrafeedback-binarized-preferences](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences) preference datasets.
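
For readers unfamiliar with DPO, here is a minimal sketch of the per-pair loss in plain Python. It is illustrative only and is not the training code used for this model: the function name `dpo_loss` and the value `beta=0.1` are assumptions, and the inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    beta=0.1 is an assumed illustrative value, not taken from this model card.
    """
    # Implicit rewards: scaled log-ratio of policy to reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # computed in a form that avoids overflow for large |margin|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy favours the chosen response more than the reference does: lower loss.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model, so no separate reward model is needed.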

	DPOpenHermes was trained using QLoRA. The adapter is also provided in this model repo.

	# Training Details

	[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

	DPOpenHermes was trained on a single H100 80GB GPU hosted on RunPod for ~10 hours, covering 0.6 epochs of the dataset.

	https://wandb.ai/oaaic/openhermes-dpo/reports/DPOpenHermes--Vmlldzo2MTQ3NDg2

	# Benchmarks

	## AGIEval

	```
	|             Task             |Version| Metric |Value |   |Stderr|
	|------------------------------|------:|--------|-----:|---|-----:|
	|agieval_aqua_rat              |      0|acc     |0.2480|±  |0.0272|
	|                              |       |acc_norm|0.2520|±  |0.0273|
	|agieval_logiqa_en             |      0|acc     |0.3810|±  |0.0190|
	|                              |       |acc_norm|0.3856|±  |0.0191|
	|agieval_lsat_ar               |      0|acc     |0.2348|±  |0.0280|
	|                              |       |acc_norm|0.2304|±  |0.0278|
	|agieval_lsat_lr               |      0|acc     |0.5118|±  |0.0222|
	|                              |       |acc_norm|0.5196|±  |0.0221|
	|agieval_lsat_rc               |      0|acc     |0.5948|±  |0.0300|
	|                              |       |acc_norm|0.5688|±  |0.0303|
	|agieval_sat_en                |      0|acc     |0.7427|±  |0.0305|
	|                              |       |acc_norm|0.7427|±  |0.0305|
	|agieval_sat_en_without_passage|      0|acc     |0.4563|±  |0.0348|
	|                              |       |acc_norm|0.4515|±  |0.0348|
	|agieval_sat_math              |      0|acc     |0.3818|±  |0.0328|
	|                              |       |acc_norm|0.3682|±  |0.0326|
	```

	Average: 0.4399
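
The AGIEval average above appears consistent with the unweighted mean of the eight `acc_norm` values in the table. That is an inference from the numbers, not something the card states. A short sketch that reproduces it (the dictionary name `agieval_acc_norm` is illustrative):

```python
# acc_norm values copied from the AGIEval table above.
agieval_acc_norm = {
    "agieval_aqua_rat": 0.2520,
    "agieval_logiqa_en": 0.3856,
    "agieval_lsat_ar": 0.2304,
    "agieval_lsat_lr": 0.5196,
    "agieval_lsat_rc": 0.5688,
    "agieval_sat_en": 0.7427,
    "agieval_sat_en_without_passage": 0.4515,
    "agieval_sat_math": 0.3682,
}

# Unweighted mean across tasks (assumed aggregation rule).
agieval_average = sum(agieval_acc_norm.values()) / len(agieval_acc_norm)
print(f"{agieval_average:.5f}")
```

The result agrees with the reported 0.4399 to within rounding.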

	## GPT4All

	```
	|    Task     |Version| Metric |Value |   |Stderr|
	|-------------|------:|--------|-----:|---|-----:|
	|arc_challenge|      0|acc     |0.5930|±  |0.0144|
	|             |       |acc_norm|0.6323|±  |0.0141|
	|arc_easy     |      0|acc     |0.8443|±  |0.0074|
	|             |       |acc_norm|0.8295|±  |0.0077|
	|boolq        |      1|acc     |0.8599|±  |0.0061|
	|hellaswag    |      0|acc     |0.6548|±  |0.0047|
	|             |       |acc_norm|0.8365|±  |0.0037|
	|openbookqa   |      0|acc     |0.3520|±  |0.0214|
	|             |       |acc_norm|0.4640|±  |0.0223|
	|piqa         |      0|acc     |0.8210|±  |0.0089|
	|             |       |acc_norm|0.8335|±  |0.0087|
	|winogrande   |      0|acc     |0.7466|±  |0.0122|
	```

	Average: 0.7431
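
The GPT4All average above is close to what you get by taking `acc_norm` for each task where it is reported and falling back to `acc` otherwise (boolq and winogrande report only `acc`). This aggregation rule is an assumption inferred from the numbers, and the names `gpt4all_scores` and `headline_score` are illustrative:

```python
# Scores copied from the GPT4All table above.
gpt4all_scores = {
    "arc_challenge": {"acc": 0.5930, "acc_norm": 0.6323},
    "arc_easy": {"acc": 0.8443, "acc_norm": 0.8295},
    "boolq": {"acc": 0.8599},
    "hellaswag": {"acc": 0.6548, "acc_norm": 0.8365},
    "openbookqa": {"acc": 0.3520, "acc_norm": 0.4640},
    "piqa": {"acc": 0.8210, "acc_norm": 0.8335},
    "winogrande": {"acc": 0.7466},
}


def headline_score(metrics):
    """Prefer acc_norm when the task reports it, otherwise use acc."""
    return metrics["acc_norm"] if "acc_norm" in metrics else metrics["acc"]


# Unweighted mean of one headline score per task (assumed aggregation rule).
gpt4all_average = sum(headline_score(m) for m in gpt4all_scores.values()) / len(gpt4all_scores)
print(f"{gpt4all_average:.5f}")
```

The computed mean lands within 0.0001 of the reported 0.7431. The small gap may come from truncation or from averaging unrounded scores.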

	## TruthfulQA

	```
	hf-causal-experimental (pretrained=openaccess-ai-collective/dpopenhermes-alpha-v1,dtype=bfloat16,trust_remote_code=True,use_accelerate=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
	|    Task     |Version|Metric|Value |   |Stderr|
	|-------------|------:|------|-----:|---|-----:|
	|truthfulqa_mc|      1|mc1   |0.4186|±  |0.0173|
	|             |       |mc2   |0.5847|±  |0.0153|
	```
