	---
	license: unknown
	tags:
	- merge
	model-index:
	- name: Everyone-LLM-7b-Base
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 66.38
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rombodawg/Everyone-LLM-7b-Base
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 86.02
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rombodawg/Everyone-LLM-7b-Base
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 64.94
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rombodawg/Everyone-LLM-7b-Base
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 57.89
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rombodawg/Everyone-LLM-7b-Base
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 80.43
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rombodawg/Everyone-LLM-7b-Base
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 65.58
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rombodawg/Everyone-LLM-7b-Base
	      name: Open LLM Leaderboard
	---
	# Everyone-LLM-7b-Base


	![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/642cc1c253e76b4c2286c58e/ECrHQnZnv8UM9GUCQtlWW.jpeg)

	The EveryoneLLM series of models is made by the community, for the community.

	This is the first version of Everyone-LLM, a model that merges most of the powerful fine-tuned LLMs made by the community into a single, knowledgeable LLM with a broad range of abilities.


	Prompt template: Alpaca
	```
	Below is an instruction that describes a task. Write a response that appropriately completes the request.
	### Instruction:
	{prompt}
	### Response:
	```
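As a minimal sketch of how the template above might be filled in from Python: the names `ALPACA_TEMPLATE` and `build_prompt` are hypothetical helpers introduced here for illustration, not part of the model or any library.

```python
# Alpaca-style template, copied from the prompt format shown above.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "{prompt}\n"
    "### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Insert the user's instruction into the Alpaca template (hypothetical helper)."""
    return ALPACA_TEMPLATE.format(prompt=instruction.strip())


if __name__ == "__main__":
    print(build_prompt("Write a SQL query that counts the rows in a table named users."))
```

The resulting string would then be passed to whatever tokenizer and generation call you already use; the model's completion is expected to follow the `### Response:` line.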

	The following models were used in this merge:

	- https://huggingface.co/cognitivecomputations/dolphin-2.6-mistral-7b-dpo

	- https://huggingface.co/jondurbin/bagel-dpo-7b-v0.4

	- https://huggingface.co/Locutusque/Hercules-2.0-Mistral-7B


	- https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca

	- https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B

	- https://huggingface.co/NousResearch/Nous-Capybara-7B-V1.9

	- https://huggingface.co/Intel/neural-chat-7b-v3-3

	- https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2

	- https://huggingface.co/senseable/WestLake-7B-v2

	- https://huggingface.co/defog/sqlcoder-7b

	- https://huggingface.co/meta-math/MetaMath-Mistral-7B

	- https://huggingface.co/nextai-team/apollo-v1-7b

	- https://huggingface.co/WizardLM/WizardMath-7B-V1.1

	- https://huggingface.co/openchat/openchat-3.5-0106

	- https://huggingface.co/mistralai/Mistral-7B-v0.1

	Thank you to the creators of the AI models above; they deserve full credit for the EveryoneLLM series of models. Without their hard work, we wouldn't be able to achieve the great success we have in the open-source community. 💗

	You can find the write-up on merging models here:

	https://docs.google.com/document/d/1_vOftBnrk9NRk5h10UqrfJ5CDih9KBKL61yvrZtVWPE/edit?usp=sharing


	# Open LLM Leaderboard Scores
	```
	| Model                              | Average | ARC     | HellaSwag | MMLU    | TruthfulQA | Winogrande | GSM8K   |
	|------------------------------------|---------|---------|-----------|---------|------------|------------|---------|
	| rombodawg/Everyone-LLM-7b-Base     | 70.21   | 66.38   | 86.02     | 64.94   | 57.89      | 80.43      | 65.58   |
	```
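The Average column above appears to be the unweighted mean of the six benchmark scores. A quick check, assuming that is how the figure was computed:

```python
# Benchmark scores copied from the leaderboard table above.
scores = {
    "ARC": 66.38,
    "HellaSwag": 86.02,
    "MMLU": 64.94,
    "TruthfulQA": 57.89,
    "Winogrande": 80.43,
    "GSM8K": 65.58,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 70.21
```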

	The config for the merge can be found below:

	```yaml
	models:
	  - model: cognitivecomputations_dolphin-2.6-mistral-7b-dpo
	    parameters:
	      weight: 1
	  - model: jondurbin_bagel-dpo-7b-v0.4
	    parameters:
	      weight: 1
	  - model: Locutusque_Hercules-2.0-Mistral-7B
	    parameters:
	      weight: 1
	  - model: Open-Orca_Mistral-7B-OpenOrca
	    parameters:
	      weight: 1
	  - model: teknium_OpenHermes-2.5-Mistral-7B
	    parameters:
	      weight: 1
	  - model: NousResearch_Nous-Capybara-7B-V1.9
	    parameters:
	      weight: 1
	  - model: Intel_neural-chat-7b-v3-3
	    parameters:
	      weight: 1
	  - model: mistralai_Mistral-7B-Instruct-v0.2
	    parameters:
	      weight: 1
	  - model: senseable_WestLake-7B-v2
	    parameters:
	      weight: 1
	  - model: defog_sqlcoder-7b
	    parameters:
	      weight: 1
	  - model: meta-math_MetaMath-Mistral-7B
	    parameters:
	      weight: 1
	  - model: nextai-team_apollo-v1-7b
	    parameters:
	      weight: 1
	  - model: WizardLM_WizardMath-7B-V1.1
	    parameters:
	      weight: 1
	  - model: openchat_openchat-3.5-0106
	    parameters:
	      weight: 1
	merge_method: task_arithmetic
	base_model: mistralai_Mistral-7B-v0.1
	parameters:
	  normalize: true
	  int8_mask: true
	dtype: float16
	```
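To give a rough idea of what `merge_method: task_arithmetic` does with this config, here is a toy sketch in plain Python. It is not the merge tool's implementation: it assumes that each fine-tuned model contributes a weighted delta from the base model and that `normalize: true` divides the summed deltas by the total weight. The function name `task_arithmetic_merge` and the list-based "tensors" are hypothetical, and `int8_mask` and `dtype` are not modeled.

```python
from typing import Dict, List

Tensor = List[float]  # toy stand-in for a real weight tensor


def task_arithmetic_merge(
    base: Dict[str, Tensor],
    models: List[Dict[str, Tensor]],
    weights: List[float],
    normalize: bool = True,
) -> Dict[str, Tensor]:
    """Add the weighted deltas (model - base) of every model onto the base weights."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total_weight = sum(weights)
    merged: Dict[str, Tensor] = {}
    for name, base_values in base.items():
        # Accumulate the weighted task vectors for this parameter.
        delta = [0.0] * len(base_values)
        for model, weight in zip(models, weights):
            for i, (tuned, original) in enumerate(zip(model[name], base_values)):
                delta[i] += weight * (tuned - original)
        if normalize:
            # Assumption: normalization divides by the sum of the weights.
            delta = [d / total_weight for d in delta]
        merged[name] = [b + d for b, d in zip(base_values, delta)]
    return merged


# Tiny example with two "fine-tuned models" and equal weights, as in the config.
base = {"w": [1.0, 2.0]}
tuned_models = [{"w": [2.0, 2.0]}, {"w": [1.0, 4.0]}]
merged = task_arithmetic_merge(base, tuned_models, weights=[1, 1])
print(merged)  # {'w': [1.5, 3.0]}
```

Under these assumptions, giving every model `weight: 1` with normalization enabled makes the result equal to the plain average of the fine-tuned models, because the base weights cancel out of the sum.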

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_rombodawg__Everyone-LLM-7b-Base)

	|             Metric              |Value|
	|---------------------------------|----:|
	|Avg.                             |70.21|
	|AI2 Reasoning Challenge (25-Shot)|66.38|
	|HellaSwag (10-Shot)              |86.02|
	|MMLU (5-Shot)                    |64.94|
	|TruthfulQA (0-shot)              |57.89|
	|Winogrande (5-shot)              |80.43|
	|GSM8k (5-shot)                   |65.58|