metadata

license: cc-by-4.0

Piccolo-2x7b

In loving memory of my dog Klaus (Piccolo)

~ Piccolo (Italian): the little one ~

GGUF

Quants are available here

Code Example

An inference and evaluation Colab notebook is available here

from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_response(prompt):
    """
    Generate a response from the model based on the input prompt.
    Args:
        prompt (str): Prompt for the model.

    Returns:
    str: The generated response from the model.
    """
    # Move the tokenized inputs to the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response

model_id = "macadeliccc/piccolo-2x7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# 4-bit loading requires the bitsandbytes package and a CUDA-capable GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

prompt = "What is the best way to train Cane Corsos?"

print("Response:")
print(generate_response(prompt), "\n")

The model handles code, math, and logical reasoning prompts well. Try it on whatever questions you like.

Evaluations

Tasks	Version	Filter	Metric	Value		Stderr
arc_easy	Yaml	none	acc	0.8552	±	0.0072
		none	acc_norm	0.8237	±	0.0078
boolq	Yaml	none	acc	0.8749	±	0.0058
hellaswag	Yaml	none	acc	0.6734	±	0.0047
		none	acc_norm	0.8489	±	0.0036
openbookqa	Yaml	none	acc	0.3640	±	0.0215
		none	acc_norm	0.4780	±	0.0224
piqa	Yaml	none	acc	0.8330	±	0.0087
		none	acc_norm	0.8368	±	0.0086
winogrande	Yaml	none	acc	0.7703	±	0.0118
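The Stderr column can be turned into an approximate confidence interval, which helps when comparing these scores against another model. The sketch below assumes a normal approximation with a 95% multiplier of 1.96; that choice and the helper name `confidence_interval` are illustrative and not part of the evaluation harness. The values are copied from three rows of the table above.

```python
# (acc, stderr) pairs copied from the table above.
results = {
    "arc_easy": (0.8552, 0.0072),
    "boolq": (0.8749, 0.0058),
    "winogrande": (0.7703, 0.0118),
}

Z_95 = 1.96  # assumed normal-approximation multiplier for a 95% interval


def confidence_interval(value, stderr, z=Z_95):
    """Return the (low, high) bounds of value ± z * stderr."""
    margin = z * stderr
    return value - margin, value + margin


for task, (acc, stderr) in results.items():
    low, high = confidence_interval(acc, stderr)
    print(f"{task}: {acc:.4f} (approx. 95% CI {low:.4f} to {high:.4f})")
```

Two models whose intervals overlap heavily on a task are hard to separate on that task alone.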

Model Evaluation Summary

Model	AGIEval	GPT4All	TruthfulQA	Bigbench	Average
piccolo-math-2x7b	43.89%	74.98%	63.96%	44.99%	56.96%

AGIEval

Tasks and Results

Task	Version	Metric	Value	Stderr
agieval_aqua_rat	0	acc	24.41	± 2.70
		acc_norm	24.80	± 2.72
agieval_logiqa_en	0	acc	35.79	± 1.88
		acc_norm	36.71	± 1.89
agieval_lsat_ar	0	acc	23.48	± 2.80
		acc_norm	23.91	± 2.82
agieval_lsat_lr	0	acc	49.22	± 2.22
		acc_norm	50.00	± 2.22
agieval_lsat_rc	0	acc	63.94	± 2.93
		acc_norm	64.31	± 2.93
agieval_sat_en	0	acc	77.18	± 2.93
		acc_norm	76.70	± 2.95
agieval_sat_en_without_passage	0	acc	45.15	± 3.48
		acc_norm	44.66	± 3.47
agieval_sat_math	0	acc	33.64	± 3.19
		acc_norm	30.00	± 3.10

Average: 43.89%

GPT4All

Tasks and Results

Task	Version	Metric	Value	Stderr
arc_challenge	0	acc	61.86	± 1.42
		acc_norm	62.88	± 1.41
arc_easy	0	acc	84.34	± 0.75
		acc_norm	80.47	± 0.81
boolq	1	acc	86.88	± 0.59
hellaswag	0	acc	68.56	± 0.46
		acc_norm	85.16	± 0.35
openbookqa	0	acc	37.00	± 2.16
		acc_norm	47.80	± 2.24
piqa	0	acc	82.21	± 0.89
		acc_norm	83.68	± 0.86
winogrande	0	acc	77.98	± 1.16

Average: 74.98%
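The reported average is consistent with taking acc_norm for each task where it is listed and falling back to acc otherwise, then averaging across tasks. The sketch below reproduces the figure under that assumption; the selection rule is inferred from the numbers, not taken from the evaluation script, and the helper name `headline_score` is hypothetical.

```python
# Scores copied from the GPT4All table above (percentages).
gpt4all = {
    "arc_challenge": {"acc": 61.86, "acc_norm": 62.88},
    "arc_easy": {"acc": 84.34, "acc_norm": 80.47},
    "boolq": {"acc": 86.88},
    "hellaswag": {"acc": 68.56, "acc_norm": 85.16},
    "openbookqa": {"acc": 37.00, "acc_norm": 47.80},
    "piqa": {"acc": 82.21, "acc_norm": 83.68},
    "winogrande": {"acc": 77.98},
}


def headline_score(metrics):
    """Prefer acc_norm when the task reports it, otherwise use acc (assumed rule)."""
    return metrics.get("acc_norm", metrics["acc"])


scores = [headline_score(m) for m in gpt4all.values()]
average = sum(scores) / len(scores)
print(f"GPT4All average: {average:.2f}%")
```

The same rule applied to the acc_norm rows of the AGIEval table gives that suite's 43.89%.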

TruthfulQA

Tasks and Results

Task	Version	Metric	Value	Stderr
truthfulqa_mc	1	mc1	47.37	± 1.75
		mc2	63.96	± 1.57

Average: 63.96%

Bigbench

Tasks and Results

Task	Version	Metric	Value	Stderr
bigbench_causal_judgement	0	multiple_choice_grade	55.26	± 3.62
bigbench_date_understanding	0	multiple_choice_grade	63.14	± 2.51
bigbench_disambiguation_qa	0	multiple_choice_grade	42.64	± 3.08
bigbench_geometric_shapes	0	multiple_choice_grade	22.84	± 2.22
		exact_str_match	3.34	± 0.95
bigbench_logical_deduction_five_objects	0	multiple_choice_grade	36.60	± 2.16
bigbench_logical_deduction_seven_objects	0	multiple_choice_grade	25.57	± 1.65
bigbench_logical_deduction_three_objects	0	multiple_choice_grade	56.00	± 2.87
bigbench_movie_recommendation	0	multiple_choice_grade	42.40	± 2.21
bigbench_navigate	0	multiple_choice_grade	54.70	± 1.57
bigbench_reasoning_about_colored_objects	0	multiple_choice_grade	62.90	± 1.08
bigbench_ruin_names	0	multiple_choice_grade	53.35	± 2.36
bigbench_salient_translation_error_detection	0	multiple_choice_grade	24.35	± 1.36
bigbench_snarks	0	multiple_choice_grade	62.43	± 3.61
bigbench_sports_understanding	0	multiple_choice_grade	70.28	± 1.46
bigbench_temporal_sequences	0	multiple_choice_grade	41.30	± 1.56
bigbench_tracking_shuffled_objects_five_objects	0	multiple_choice_grade	22.32	± 1.18
bigbench_tracking_shuffled_objects_seven_objects	0	multiple_choice_grade	17.77	± 0.91
bigbench_tracking_shuffled_objects_three_objects	0	multiple_choice_grade	56.00	± 2.87

Average: 44.99%

Overall Average Score

Average score: 56.96%
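The overall score matches the mean of the four suite averages from the summary table, each already rounded to two decimals, with the result rounded half up. The sketch below checks that arithmetic with `decimal` so that the midpoint value 56.955 is not disturbed by binary floating point. The rounding mode is an assumption inferred from the numbers.

```python
from decimal import Decimal, ROUND_HALF_UP

# Suite averages as reported in the Model Evaluation Summary table.
suite_averages = {
    "AGIEval": Decimal("43.89"),
    "GPT4All": Decimal("74.98"),
    "TruthfulQA": Decimal("63.96"),
    "Bigbench": Decimal("44.99"),
}

# Exact mean of the four rounded suite averages.
mean = sum(suite_averages.values()) / Decimal(len(suite_averages))

# Round to two decimals, rounding midpoints up (assumed rounding mode).
overall = mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(f"Overall average: {overall}%")
```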