license: cc-by-4.0

Piccolo-2x7b

In loving memory of my dog Klaus (Piccolo)

~ Piccolo (Italian): the little one ~

piccolo.png

GGUF

Quants are available here

Code Example

Inference and Evaluation colab available here

from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_response(prompt):
    """
    Generate a response from the model based on the input prompt.
    Args:
    prompt (str): Prompt for the model.

    Returns:
    str: The generated response from the model.
    """
    # Move the tokenized inputs to the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response

model_id = "macadeliccc/piccolo-2x7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# 4-bit loading requires the bitsandbytes package to be installed.
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

prompt = "What is the best way to train Cane Corsos?"

print("Response:")
print(generate_response(prompt), "\n")

The model can produce quality code and handle math and logical-reasoning prompts. Try it with any questions you like.
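For a decoder-only model, `model.generate` returns the prompt tokens followed by the newly generated tokens, so the decoded response from `generate_response` starts with the prompt itself. The sketch below shows one way to keep only the completion by slicing off the prompt length. The helper name `completion_only` is hypothetical and not part of the original example, and plain lists stand in for token ids so the snippet runs without loading the model.

```python
def completion_only(output_ids, prompt_length):
    """Return only the newly generated token ids from one output row.

    Hypothetical helper: assumes the row starts with the prompt tokens,
    which is how causal-LM generation lays out its output.
    """
    return output_ids[prompt_length:]


# Toy illustration: plain lists stand in for token ids.
prompt_ids = [1, 5023, 338]
generated = prompt_ids + [278, 1900, 2]

new_ids = completion_only(generated, len(prompt_ids))
print(new_ids)
```

Inside `generate_response`, the same slicing could be applied before decoding, for example `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`.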

Evaluations

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_easy | Yaml | none | 0 | acc | 0.8552 | ± 0.0072 |
| | | none | 0 | acc_norm | 0.8237 | ± 0.0078 |
| boolq | Yaml | none | 0 | acc | 0.8749 | ± 0.0058 |
| hellaswag | Yaml | none | 0 | acc | 0.6734 | ± 0.0047 |
| | | none | 0 | acc_norm | 0.8489 | ± 0.0036 |
| openbookqa | Yaml | none | 0 | acc | 0.3640 | ± 0.0215 |
| | | none | 0 | acc_norm | 0.4780 | ± 0.0224 |
| piqa | Yaml | none | 0 | acc | 0.8330 | ± 0.0087 |
| | | none | 0 | acc_norm | 0.8368 | ± 0.0086 |
| winogrande | Yaml | none | 0 | acc | 0.7703 | ± 0.0118 |

Model Evaluation Summary

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| piccolo-math-2x7b | 43.89% | 74.98% | 63.96% | 44.99% | 56.96% |

AGIEval

Tasks and Results

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 24.41 | ± 2.70 |
| | | acc_norm | 24.80 | ± 2.72 |
| agieval_logiqa_en | 0 | acc | 35.79 | ± 1.88 |
| | | acc_norm | 36.71 | ± 1.89 |
| agieval_lsat_ar | 0 | acc | 23.48 | ± 2.80 |
| | | acc_norm | 23.91 | ± 2.82 |
| agieval_lsat_lr | 0 | acc | 49.22 | ± 2.22 |
| | | acc_norm | 50.00 | ± 2.22 |
| agieval_lsat_rc | 0 | acc | 63.94 | ± 2.93 |
| | | acc_norm | 64.31 | ± 2.93 |
| agieval_sat_en | 0 | acc | 77.18 | ± 2.93 |
| | | acc_norm | 76.70 | ± 2.95 |
| agieval_sat_en_without_passage | 0 | acc | 45.15 | ± 3.48 |
| | | acc_norm | 44.66 | ± 3.47 |
| agieval_sat_math | 0 | acc | 33.64 | ± 3.19 |
| | | acc_norm | 30.00 | ± 3.10 |

Average: 43.89%

GPT4All

Tasks and Results

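| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| arc_challenge | 0 | acc | 61.86 | ± 1.42 |
| | | acc_norm | 62.88 | ± 1.41 |
| arc_easy | 0 | acc | 84.34 | ± 0.75 |
| | | acc_norm | 80.47 | ± 0.81 |
| boolq | 1 | acc | 86.88 | ± 0.59 |
| hellaswag | 0 | acc | 68.56 | ± 0.46 |
| | | acc_norm | 85.16 | ± 0.35 |
| openbookqa | 0 | acc | 37.00 | ± 2.16 |
| | | acc_norm | 47.80 | ± 2.24 |
| piqa | 0 | acc | 82.21 | ± 0.89 |
| | | acc_norm | 83.68 | ± 0.86 |
| winogrande | 0 | acc | 77.98 | ± 1.16 |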

Average: 74.98%
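The reported GPT4All average is consistent with taking `acc_norm` for each task where it is listed and falling back to `acc` otherwise (boolq and winogrande), then averaging the seven scores. The sketch below reproduces that arithmetic from the values in the table above. The aggregation rule is an inference from the numbers, not something the card states, and the function name `suite_average` is hypothetical.

```python
# Scores copied from the GPT4All table; tasks without acc_norm list only acc.
gpt4all = {
    "arc_challenge": {"acc": 61.86, "acc_norm": 62.88},
    "arc_easy": {"acc": 84.34, "acc_norm": 80.47},
    "boolq": {"acc": 86.88},
    "hellaswag": {"acc": 68.56, "acc_norm": 85.16},
    "openbookqa": {"acc": 37.00, "acc_norm": 47.80},
    "piqa": {"acc": 82.21, "acc_norm": 83.68},
    "winogrande": {"acc": 77.98},
}


def suite_average(results):
    """Mean of acc_norm per task, falling back to acc (assumed rule)."""
    scores = [metrics.get("acc_norm", metrics["acc"]) for metrics in results.values()]
    return sum(scores) / len(scores)


average = suite_average(gpt4all)
print(f"{average:.2f}")
```

The same rule applied to the AGIEval table (all tasks list `acc_norm`) also matches its reported 43.89%.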

TruthfulQA

Tasks and Results

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 47.37 | ± 1.75 |
| | | mc2 | 63.96 | ± 1.57 |

Average: 63.96%

Bigbench

Tasks and Results

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 55.26 | ± 3.62 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 63.14 | ± 2.51 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 42.64 | ± 3.08 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 22.84 | ± 2.22 |
| | | exact_str_match | 3.34 | ± 0.95 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 36.60 | ± 2.16 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 25.57 | ± 1.65 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 56.00 | ± 2.87 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 42.40 | ± 2.21 |
| bigbench_navigate | 0 | multiple_choice_grade | 54.70 | ± 1.57 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 62.90 | ± 1.08 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 53.35 | ± 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 24.35 | ± 1.36 |
| bigbench_snarks | 0 | multiple_choice_grade | 62.43 | ± 3.61 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 70.28 | ± 1.46 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 41.30 | ± 1.56 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 22.32 | ± 1.18 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.77 | ± 0.91 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 56.00 | ± 2.87 |

Overall Average Score

Average score: 56.96%
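The overall score appears to be the unweighted mean of the four suite averages reported in the summary table. A minimal sketch of that arithmetic, assuming equal weighting (the card does not state the weighting explicitly):

```python
# Suite averages copied from the Model Evaluation Summary table.
suite_scores = {
    "AGIEval": 43.89,
    "GPT4All": 74.98,
    "TruthfulQA": 63.96,
    "Bigbench": 44.99,
}

# Unweighted mean across suites (assumed aggregation rule).
overall = sum(suite_scores.values()) / len(suite_scores)
print(f"{overall:.3f}")
```

The result is roughly 56.955, which agrees with the reported 56.96% once rounded to two decimals.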