Model Card for Alpaca Dragon 72B V1

Fine-tune of Smaug 72B v0.1 on an Alpaca-format dataset I had on hand. The data covers planning and reasoning, which I use to help a model break down a set of asks into a logical plan. For some odd reason it bumps MMLU and Winogrande. I would have expected ARC to go up rather than those two, but this is often more of an art form than a science. All thanks to Abacus.AI for sharing their work.

I used the same dataset to train Strix Rufipes 70B, one of my owl series models, which has worked well for planning out development tasks and other technical work.

LICENSE

Note that the license points back to the Smaug base license, since this is only a fine-tune of their model. Respect and abide by their conditions. Again, many thanks to Abacus for making their work open; use that as inspiration to keep your own work open and to respect their license agreements. License Link

How to Get Started with the Model

Use the code below to get started with the model.

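# Load the model directly from the Hugging Face Hub
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ibivibiv/alpaca-dragon-72b-v1")
# A 72B model is large: device_map="auto" (needs the accelerate package) spreads it over the available devices
model = AutoModelForCausalLM.from_pretrained("ibivibiv/alpaca-dragon-72b-v1", device_map="auto", torch_dtype="auto")

# Alpaca-style prompt: an instruction followed by an empty response slot
prompt = "### Instruction: Create a plan for developing the game of snake in python using pygame.\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate up to 200 new tokens and decode them back to text
outputs = model.generate(**inputs, max_new_tokens=200)
text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(text)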
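
If you want to reuse the prompt layout from the example above for other instructions, a small helper along these lines may be handy. This is only a sketch: the function name `build_prompt` is made up for illustration, and it simply reproduces the `### Instruction:` / `### Response:` layout shown in the example rather than any official template.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the same layout as the example prompt above.

    build_prompt is a hypothetical helper name, not part of any library.
    """
    return f"### Instruction: {instruction.strip()}\n### Response:\n"


prompt = build_prompt(
    "Create a plan for developing the game of snake in python using pygame."
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the hard-coded prompt.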

Evaluation

Test Name	Accuracy (%)
All	77.31
arc:challenge	70.82
hellaswag	69.84
hendrycksTest-abstract_algebra	42.00
hendrycksTest-anatomy	71.85
hendrycksTest-astronomy	86.84
hendrycksTest-business_ethics	82.00
hendrycksTest-clinical_knowledge	84.53
hendrycksTest-college_biology	93.06
hendrycksTest-college_chemistry	54.00
hendrycksTest-college_computer_science	65.00
hendrycksTest-college_mathematics	52.00
hendrycksTest-college_medicine	75.14
hendrycksTest-college_physics	55.88
hendrycksTest-computer_security	82.00
hendrycksTest-conceptual_physics	80.43
hendrycksTest-econometrics	60.53
hendrycksTest-electrical_engineering	79.31
hendrycksTest-elementary_mathematics	70.37
hendrycksTest-formal_logic	58.73
hendrycksTest-global_facts	54.00
hendrycksTest-high_school_biology	88.39
hendrycksTest-high_school_chemistry	66.01
hendrycksTest-high_school_computer_science	82.00
hendrycksTest-high_school_european_history	84.24
hendrycksTest-high_school_geography	94.44
hendrycksTest-high_school_government_and_politics	98.96
hendrycksTest-high_school_macroeconomics	82.05
hendrycksTest-high_school_mathematics	45.93
hendrycksTest-high_school_microeconomics	86.13
hendrycksTest-high_school_physics	54.97
hendrycksTest-high_school_psychology	92.84
hendrycksTest-high_school_statistics	68.98
hendrycksTest-high_school_us_history	91.67
hendrycksTest-high_school_world_history	89.87
hendrycksTest-human_aging	78.03
hendrycksTest-human_sexuality	89.31
hendrycksTest-international_law	90.91
hendrycksTest-jurisprudence	87.96
hendrycksTest-logical_fallacies	84.05
hendrycksTest-machine_learning	58.93
hendrycksTest-management	87.38
hendrycksTest-marketing	95.30
hendrycksTest-medical_genetics	86.00
hendrycksTest-miscellaneous	92.21
hendrycksTest-moral_disputes	83.53
hendrycksTest-moral_scenarios	69.72
hendrycksTest-nutrition	85.62
hendrycksTest-philosophy	83.60
hendrycksTest-prehistory	87.04
hendrycksTest-professional_accounting	65.96
hendrycksTest-professional_law	60.69
hendrycksTest-professional_medicine	82.72
hendrycksTest-professional_psychology	81.86
hendrycksTest-public_relations	75.45
hendrycksTest-security_studies	82.04
hendrycksTest-sociology	88.56
hendrycksTest-us_foreign_policy	94.00
hendrycksTest-virology	57.23
hendrycksTest-world_religions	89.47
truthfulqa:mc	72.6
winogrande	86.03
gsm8k	77.63

Environmental Impact

Hardware Type: [A100s, more than I wanted to use since it's all on my own $$$]
Hours used: [8]
Cloud Provider: [runpod.io]
Compute Region: [US]
Carbon Emitted: [?]
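
The carbon figure is left unknown above. For anyone who wants a rough number for their own run, a back-of-the-envelope estimate can be sketched as energy used times grid carbon intensity. Everything in this sketch other than the 8 hours is an assumption: the function name, the GPU count, the per-GPU power draw, the PUE overhead factor, and the carbon intensity are placeholders to replace with real values, not facts about this training run.

```python
def estimate_co2_kg(hours: float, num_gpus: int, gpu_watts: float,
                    pue: float, kg_co2_per_kwh: float) -> float:
    """Rough CO2 estimate: energy (kWh) times grid carbon intensity.

    Hypothetical helper for illustration; all inputs must come from the user.
    """
    # Energy drawn by the GPUs, scaled by the data center overhead factor (PUE)
    energy_kwh = num_gpus * gpu_watts / 1000.0 * hours * pue
    return energy_kwh * kg_co2_per_kwh


# Placeholder inputs only: the GPU count, wattage, PUE, and intensity are NOT known for this run
estimate = estimate_co2_kg(hours=8, num_gpus=1, gpu_watts=400.0,
                           pue=1.0, kg_co2_per_kwh=0.5)
print(f"{estimate:.2f} kg CO2 (placeholder inputs)")
```

The result scales linearly with each input, so the estimate is only as good as the hardware and grid numbers fed into it.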

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	79.30
AI2 Reasoning Challenge (25-Shot)	73.89
HellaSwag (10-Shot)	88.16
MMLU (5-Shot)	77.40
TruthfulQA (0-shot)	72.69
Winogrande (5-shot)	86.03
GSM8k (5-shot)	77.63
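
As a quick sanity check, the Avg. row appears to be the plain mean of the six benchmark scores in the table. The snippet below just redoes that arithmetic; the dictionary keys are shortened labels chosen for illustration.

```python
# Scores copied from the leaderboard table above
scores = {
    "ARC (25-shot)": 73.89,
    "HellaSwag (10-shot)": 88.16,
    "MMLU (5-shot)": 77.40,
    "TruthfulQA (0-shot)": 72.69,
    "Winogrande (5-shot)": 86.03,
    "GSM8k (5-shot)": 77.63,
}

# Unweighted mean across the six benchmarks
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 79.30
```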
