---
library_name: transformers
datasets:
- argilla/distilabel-capybara-dpo-7k-binarized
---

# CapyLake-7B-v2-laser

This model is a fine-tune of [cognitivecomputations/WestLake-7B-v2-Laser](https://huggingface.co/cognitivecomputations/WestLake-7B-v2-laser) on [argilla/distilabel-capybara-dpo-7k-binarized](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized).

<div align="center">

![image/webp](https://cdn-uploads.huggingface.co/production/uploads/6455cc8d679315e4ef16fbec/kx2uwS_kZ-rTAJiusSrAW.webp)

[<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-dark.png" alt="Built with Distilabel" width="200" height="32"/>](https://github.com/argilla-io/distilabel)

</div>

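Since the preference data is public, the pairs this model was tuned on can be inspected directly. Below is a minimal sketch using the `datasets` library; the `chosen`/`rejected` columns are the usual layout for a binarized DPO dataset and are an assumption here, so the snippet checks for them before printing:

```python
from datasets import load_dataset

# Pull the ~7k binarized preference pairs used for this fine-tune
ds = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized", split="train")
print(ds)  # Shows the actual column names and example count

# Assumed DPO-style columns; only printed if they actually exist
example = ds[0]
for column in ("chosen", "rejected"):
    if column in example:
        print(column, "->", str(example[column])[:200])
```
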
## Process

+ Realigned the chat template to ChatML (see the sketch after this list)
+ Completed 1 epoch
+ 5e-05 learning rate
+ Training time was about 2 hours on a single H100
+ Cost was ~$8

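Because the template is ChatML, prompts should be rendered through the tokenizer rather than assembled by hand. Here is a minimal sketch of what the rendered prompt looks like; the exact special tokens come from the tokenizer config shipped with the model:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("macadeliccc/CapyLake-7B-v2-laser")

messages = [{"role": "user", "content": "Summarize ChatML in one line."}]

# Render the prompt as a string (no tokenization) and append the
# assistant header that cues the model to start its reply
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # Expect <|im_start|> / <|im_end|> ChatML markers
```
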
## Code Example

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/CapyLake-7B-v2-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Half precision keeps the 7B model manageable
    device_map="auto",           # Places the model on GPU if one is available
)

# Format the request with the model's ChatML chat template
messages = [
    {"role": "user", "content": "Create an idea for a TV show and write a short pilot script"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Adding hyperparameters to the generation call
outputs = model.generate(
    input_ids,
    max_new_tokens=4096,     # Controls the maximum length of the new tokens created
    do_sample=True,          # Enables sampling so the settings below take effect
    temperature=0.7,         # Adjust for creativity (lower is less random)
    top_k=50,                # Keeps the top k tokens for sampling
    top_p=0.95,              # Uses nucleus sampling with this cumulative probability
    no_repeat_ngram_size=2,  # Prevents repeating n-grams to ensure diversity
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

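Note that `do_sample=True` is what lets `temperature`, `top_k`, and `top_p` take effect; without it, `generate` falls back to greedy decoding and ignores those settings. Loading with `device_map="auto"` assumes the `accelerate` package is installed; drop it and the `torch_dtype` argument to load on CPU instead.
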
## Other Capy Models

SOLAR-10.7B-Capy-v1.0 is also on the way. There could be more depending on performance!

## Evaluations

| Model                                                                          |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|--------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[CapyLake-7B-v2-laser](https://huggingface.co/macadeliccc/CapyLake-7B-v2-laser)| 44.34| 77.77| 68.47| 47.92| 59.62|

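To sanity-check these numbers, an evaluation can be scripted against `lm-evaluation-harness`. The sketch below is an assumption rather than the exact setup used for this table: task names and scoring details vary across harness versions, and the task list is simply the GPT4All group read off the results below.

```python
import lm_eval

# Hypothetical re-run of the GPT4All group on this model;
# task names and availability depend on the installed harness version
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=macadeliccc/CapyLake-7B-v2-laser,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])
```
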
### AGIEval

| Task                          |Version| Metric |Value|   |Stderr|
|-------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat               |      0|acc     |28.35|±  |  2.83|
|                               |       |acc_norm|25.98|±  |  2.76|
|agieval_logiqa_en              |      0|acc     |38.86|±  |  1.91|
|                               |       |acc_norm|39.02|±  |  1.91|
|agieval_lsat_ar                |      0|acc     |25.22|±  |  2.87|
|                               |       |acc_norm|24.35|±  |  2.84|
|agieval_lsat_lr                |      0|acc     |50.39|±  |  2.22|
|                               |       |acc_norm|51.57|±  |  2.22|
|agieval_lsat_rc                |      0|acc     |65.06|±  |  2.91|
|                               |       |acc_norm|63.94|±  |  2.93|
|agieval_sat_en                 |      0|acc     |78.64|±  |  2.86|
|                               |       |acc_norm|78.64|±  |  2.86|
|agieval_sat_en_without_passage |      0|acc     |40.78|±  |  3.43|
|                               |       |acc_norm|40.78|±  |  3.43|
|agieval_sat_math               |      0|acc     |33.64|±  |  3.19|
|                               |       |acc_norm|30.45|±  |  3.11|

Average: 44.34%

### GPT4All

| Task         |Version| Metric |Value|   |Stderr|
|--------------|------:|--------|----:|---|-----:|
|arc_challenge |      0|acc     |66.89|±  |  1.38|
|              |       |acc_norm|67.49|±  |  1.37|
|arc_easy      |      0|acc     |86.70|±  |  0.70|
|              |       |acc_norm|81.90|±  |  0.79|
|boolq         |      1|acc     |88.10|±  |  0.57|
|hellaswag     |      0|acc     |71.45|±  |  0.45|
|              |       |acc_norm|87.78|±  |  0.33|
|openbookqa    |      0|acc     |39.80|±  |  2.19|
|              |       |acc_norm|49.80|±  |  2.24|
|piqa          |      0|acc     |82.86|±  |  0.88|
|              |       |acc_norm|84.87|±  |  0.84|
|winogrande    |      0|acc     |84.45|±  |  1.02|

Average: 77.77%

### TruthfulQA

| Task        |Version|Metric|Value|   |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc|      1|mc1   |53.98|±  |  1.74|
|             |       |mc2   |68.47|±  |  1.53|

Average: 68.47%

### Bigbench

| Task                                           |Version|        Metric       |Value|   |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|59.47|±  |  3.57|
|bigbench_date_understanding                     |      0|multiple_choice_grade|64.50|±  |  2.49|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|44.96|±  |  3.10|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|22.84|±  |  2.22|
|                                                |       |exact_str_match      | 2.79|±  |  0.87|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|30.80|±  |  2.07|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|21.57|±  |  1.56|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|56.67|±  |  2.87|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|51.60|±  |  2.24|
|bigbench_navigate                               |      0|multiple_choice_grade|51.00|±  |  1.58|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|70.35|±  |  1.02|
|bigbench_ruin_names                             |      0|multiple_choice_grade|51.79|±  |  2.36|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|35.97|±  |  1.52|
|bigbench_snarks                                 |      0|multiple_choice_grade|79.01|±  |  3.04|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|75.66|±  |  1.37|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|47.90|±  |  1.58|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|23.84|±  |  1.21|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|18.00|±  |  0.92|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|56.67|±  |  2.87|

Average: 47.92%

Average score: 59.62%

Elapsed time: 01:57:56