macadeliccc's picture
Update README.md
0fc374c verified
metadata
library_name: transformers
datasets:
  - argilla/distilabel-capybara-dpo-7k-binarized

CapyLake-7B-v2-laser

This model is a finetune of cognitivecomputations/WestLake-7B-v2-Laser on argilla/distilabel-capybara-dpo-7k-binarized

image/webp

Built with Distilabel

Process

  • Realigned the chat template to ChatML
  • Completed 1 Epoch
  • 5e-05 learning rate
  • Training time was about 2 hours on 1 H100
  • Cost was ~$8

Code Example

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/CapyLake-7B-v2-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "Create an idea for a TV show and write a short pilot script"
inputs = tokenizer(text, return_tensors="pt")

# Adding hyperparameters to the generation call
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,  # Controls the maximum length of the new tokens created
    temperature=0.7,  # Adjust for creativity (lower is less random)
    top_k=50,  # Keeps the top k tokens for sampling
    top_p=0.95,  # Uses nucleus sampling with this cumulative probability
    num_return_sequences=1,  # Number of sequences to generate
    no_repeat_ngram_size=2,  # Prevents repeating n-grams to ensure diversity
    early_stopping=True  # Stops generation when all sequences reach the EOS token
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Other Capy Models

SOLAR-10.7B-Capy-v1.0 is also on the way. There could be more depending on performance!

Evaluations

Model AGIEval GPT4All TruthfulQA Bigbench Average
CapyLake-7B-v2-laser 44.34 77.77 68.47 47.92 59.62

AGIEval

Task Version Metric Value Stderr
agieval_aqua_rat 0 acc 28.35 ± 2.83
acc_norm 25.98 ± 2.76
agieval_logiqa_en 0 acc 38.86 ± 1.91
acc_norm 39.02 ± 1.91
agieval_lsat_ar 0 acc 25.22 ± 2.87
acc_norm 24.35 ± 2.84
agieval_lsat_lr 0 acc 50.39 ± 2.22
acc_norm 51.57 ± 2.22
agieval_lsat_rc 0 acc 65.06 ± 2.91
acc_norm 63.94 ± 2.93
agieval_sat_en 0 acc 78.64 ± 2.86
acc_norm 78.64 ± 2.86
agieval_sat_en_without_passage 0 acc 40.78 ± 3.43
acc_norm 40.78 ± 3.43
agieval_sat_math 0 acc 33.64 ± 3.19
acc_norm 30.45 ± 3.11

Average: 44.34%

GPT4All

Task Version Metric Value Stderr
arc_challenge 0 acc 66.89 ± 1.38
acc_norm 67.49 ± 1.37
arc_easy 0 acc 86.70 ± 0.70
acc_norm 81.90 ± 0.79
boolq 1 acc 88.10 ± 0.57
hellaswag 0 acc 71.45 ± 0.45
acc_norm 87.78 ± 0.33
openbookqa 0 acc 39.80 ± 2.19
acc_norm 49.80 ± 2.24
piqa 0 acc 82.86 ± 0.88
acc_norm 84.87 ± 0.84
winogrande 0 acc 84.45 ± 1.02

Average: 77.77%

TruthfulQA

Task Version Metric Value Stderr
truthfulqa_mc 1 mc1 53.98 ± 1.74
mc2 68.47 ± 1.53

Average: 68.47%

Bigbench

Task Version Metric Value Stderr
bigbench_causal_judgement 0 multiple_choice_grade 59.47 ± 3.57
bigbench_date_understanding 0 multiple_choice_grade 64.50 ± 2.49
bigbench_disambiguation_qa 0 multiple_choice_grade 44.96 ± 3.10
bigbench_geometric_shapes 0 multiple_choice_grade 22.84 ± 2.22
exact_str_match 2.79 ± 0.87
bigbench_logical_deduction_five_objects 0 multiple_choice_grade 30.80 ± 2.07
bigbench_logical_deduction_seven_objects 0 multiple_choice_grade 21.57 ± 1.56
bigbench_logical_deduction_three_objects 0 multiple_choice_grade 56.67 ± 2.87
bigbench_movie_recommendation 0 multiple_choice_grade 51.60 ± 2.24
bigbench_navigate 0 multiple_choice_grade 51.00 ± 1.58
bigbench_reasoning_about_colored_objects 0 multiple_choice_grade 70.35 ± 1.02
bigbench_ruin_names 0 multiple_choice_grade 51.79 ± 2.36
bigbench_salient_translation_error_detection 0 multiple_choice_grade 35.97 ± 1.52
bigbench_snarks 0 multiple_choice_grade 79.01 ± 3.04
bigbench_sports_understanding 0 multiple_choice_grade 75.66 ± 1.37
bigbench_temporal_sequences 0 multiple_choice_grade 47.90 ± 1.58
bigbench_tracking_shuffled_objects_five_objects 0 multiple_choice_grade 23.84 ± 1.21
bigbench_tracking_shuffled_objects_seven_objects 0 multiple_choice_grade 18.00 ± 0.92
bigbench_tracking_shuffled_objects_three_objects 0 multiple_choice_grade 56.67 ± 2.87

Average: 47.92%

Average score: 59.62%

Elapsed time: 01:57:56