---
base_model: upstage/SOLAR-10.7B-Instruct-v1.0
tags:
- alignment-handbook
- generated_from_trainer
- UNA
- single-turn
model-index:
- name: UNA-SOLAR-10.7B-Instruct-v1.0
  results: []
license: cc-by-nc-nd-4.0
language:
- en
library_name: transformers
---

# UNA: Uniform Neural Alignment

SFT Further:
- Schedule: Linear
- Learning rate: 2e-5

Merges:
- Fan in: `0:2`
- Fan out: `-4:`
- Intermediary layers: `1/1/1/0/1/1/0/1/0/1/1/0/1/1/0`, using the On/Off pattern as a form of regularisation.

## Quants

* [ggml-model-q5_k_m.gguf](https://huggingface.co/fblgit/UNA-SOLAR-10.7B-Instruct-v1.0/resolve/main/ggml-model-q5_k_m.gguf?download=true)
* [ggml-model-q6_k.gguf](https://huggingface.co/fblgit/UNA-SOLAR-10.7B-Instruct-v1.0/resolve/main/ggml-model-q6_k.gguf?download=true)

## Libraries

- Transformers 4.35.0-UNA
- PyTorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1

## Evals

`mt-bench`:

```
Mode: single
Input file: data/mt_bench/model_judgment/gpt-4_single.jsonl

########## First turn ##########
                                    score
model                         turn
gpt-4                         1     8.95625
claude-v1                     1     8.15000
gpt-3.5-turbo                 1     8.07500
LUNA-SOLARkrautLM-Instruct    1     7.93750
UNA-SOLAR-10.7B-Instruct-v1.0 1     7.80625
vicuna-33b-v1.3               1     7.45625
wizardlm-30b                  1     7.13125
tulu-30b                      1     7.01875
vicuna-13b-v1.3               1     6.81250
guanaco-65b                   1     6.78125
nous-hermes-13b               1     6.43125
alpaca-13b                    1     4.97500
rwkv-4-raven-14b              1     4.74375
llama-13b                     1     3.26250

########## Second turn ##########
                                    score
model                         turn
gpt-4                         2     9.025000
gpt-3.5-turbo                 2     7.812500
claude-v1                     2     7.650000
UNA-SOLAR-10.7B-Instruct-v1.0 2     7.237500
LUNA-SOLARkrautLM-Instruct    2     6.987500
wizardlm-30b                  2     6.887500
vicuna-33b-v1.3               2     6.787500
guanaco-65b                   2     6.037500
vicuna-13b-v1.3               2     5.962500
tulu-30b                      2     5.850000
nous-hermes-13b               2     4.664557
alpaca-13b                    2     4.087500
rwkv-4-raven-14b              2     3.225000
llama-13b                     2     1.950000

########## Average ##########
                                 score
model
gpt-4                         8.990625
gpt-3.5-turbo                 7.943750
claude-instant-v1             7.905660
claude-v1                     7.900000
UNA-SOLAR-10.7B-Instruct-v1.0 7.521875
LUNA-SOLARkrautLM-Instruct    7.462500
vicuna-33b-v1.3               7.121875
wizardlm-30b                  7.009375
Llama-2-70b-chat              6.856250
Llama-2-13b-chat              6.650000
guanaco-33b                   6.528125
tulu-30b                      6.434375
guanaco-65b                   6.409375
oasst-sft-7-llama-30b         6.409375
palm-2-chat-bison-001         6.400000
mpt-30b-chat                  6.393750
vicuna-13b-v1.3               6.387500
wizardlm-13b                  6.353125
Llama-2-7b-chat               6.268750
vicuna-7b-v1.3                5.996875
baize-v2-13b                  5.750000
nous-hermes-13b               5.553459
mpt-7b-chat                   5.459119
gpt4all-13b-snoozy            5.452830
koala-13b                     5.350000
mpt-30b-instruct              5.218750
falcon-40b-instruct           5.168750
h2ogpt-oasst-open-llama-13b   4.625000
alpaca-13b                    4.531250
chatglm-6b                    4.500000
oasst-sft-4-pythia-12b        4.318750
rwkv-4-raven-14b              3.984375
dolly-v2-12b                  3.275000
fastchat-t5-3b                3.040625
stablelm-tuned-alpha-7b       2.753125
llama-13b                     2.606250
```

LM-Evaluation Harness, `big-refactor` branch:

```
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 25, batch_size: auto (32)
|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge|Yaml   |none  |    25|acc     |0.6954|±  |0.0134|
|             |       |none  |    25|acc_norm|0.7167|±  |0.0132|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|  Filter  |n-shot|  Metric   |Value|   |Stderr|
|-----|-------|----------|-----:|-----------|----:|---|-----:|
|gsm8k|Yaml   |get-answer|     5|exact_match|0.671|±  |0.0129|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (64)
|    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
|--------------|-------|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2|Yaml   |none  |     0|acc   |0.7297|±  |0.0149|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 10, batch_size: auto (32)
|  Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
|---------|-------|------|-----:|--------|-----:|---|-----:|
|hellaswag|Yaml   |none  |    10|acc     |0.7091|±  |0.0045|
|         |       |none  |    10|acc_norm|0.8821|±  |0.0032|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (32)
|    Tasks     |Version|Filter|n-shot|  Metric  |Value |   |Stderr|
|--------------|-------|------|-----:|----------|-----:|---|-----:|
|boolq         |Yaml   |none  |     0|acc       |0.8807|±  |0.0057|
|lambada_openai|Yaml   |none  |     0|perplexity|3.2452|±  |0.0778|
|              |       |none  |     0|acc       |0.7207|±  |0.0063|
|piqa          |Yaml   |none  |     0|acc       |0.8020|±  |0.0093|
|              |       |none  |     0|acc_norm  |0.8009|±  |0.0093|
|sciq          |Yaml   |none  |     0|acc       |0.9730|±  |0.0051|
|              |       |none  |     0|acc_norm  |0.9630|±  |0.0060|
|winogrande    |Yaml   |none  |     0|acc       |0.7577|±  |0.0120|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (64)
| Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
|--------|-------|------|-----:|--------|-----:|---|-----:|
|mathqa  |Yaml   |none  |     0|acc     |0.3474|±  |0.0087|
|        |       |none  |     0|acc_norm|0.3568|±  |0.0088|
|pubmedqa|Yaml   |none  |     0|acc     |0.5400|±  |0.0223|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto
|                        Tasks                         |Version|Filter|n-shot|  Metric   |Value |   |Stderr|
|------------------------------------------------------|-------|------|-----:|-----------|-----:|---|-----:|
|bbh_fewshot                                           |N/A    |none  |     0|exact_match|0.4660|±  |0.1771|
| - bbh_fewshot_boolean_expressions                    |Yaml   |none  |     0|exact_match|0.8160|±  |0.0246|
| - bbh_fewshot_causal_judgement                       |Yaml   |none  |     0|exact_match|0.4973|±  |0.0367|
| - bbh_fewshot_date_understanding                     |Yaml   |none  |     0|exact_match|0.4840|±  |0.0317|
| - bbh_fewshot_disambiguation_qa                      |Yaml   |none  |     0|exact_match|0.6520|±  |0.0302|
| - bbh_fewshot_dyck_languages                         |Yaml   |none  |     0|exact_match|0.2040|±  |0.0255|
| - bbh_fewshot_formal_fallacies                       |Yaml   |none  |     0|exact_match|0.5280|±  |0.0316|
| - bbh_fewshot_geometric_shapes                       |Yaml   |none  |     0|exact_match|0.3360|±  |0.0299|
| - bbh_fewshot_hyperbaton                             |Yaml   |none  |     0|exact_match|0.5520|±  |0.0315|
| - bbh_fewshot_logical_deduction_five_objects         |Yaml   |none  |     0|exact_match|0.4520|±  |0.0315|
| - bbh_fewshot_logical_deduction_seven_objects        |Yaml   |none  |     0|exact_match|0.3920|±  |0.0309|
| - bbh_fewshot_logical_deduction_three_objects        |Yaml   |none  |     0|exact_match|0.6200|±  |0.0308|
| - bbh_fewshot_movie_recommendation                   |Yaml   |none  |     0|exact_match|0.6640|±  |0.0299|
| - bbh_fewshot_multistep_arithmetic_two               |Yaml   |none  |     0|exact_match|0.0080|±  |0.0056|
| - bbh_fewshot_navigate                               |Yaml   |none  |     0|exact_match|0.6280|±  |0.0306|
| - bbh_fewshot_object_counting                        |Yaml   |none  |     0|exact_match|0.3960|±  |0.0310|
| - bbh_fewshot_penguins_in_a_table                    |Yaml   |none  |     0|exact_match|0.4726|±  |0.0415|
| - bbh_fewshot_reasoning_about_colored_objects        |Yaml   |none  |     0|exact_match|0.5320|±  |0.0316|
| - bbh_fewshot_ruin_names                             |Yaml   |none  |     0|exact_match|0.5680|±  |0.0314|
| - bbh_fewshot_salient_translation_error_detection    |Yaml   |none  |     0|exact_match|0.5480|±  |0.0315|
| - bbh_fewshot_snarks                                 |Yaml   |none  |     0|exact_match|0.5169|±  |0.0376|
| - bbh_fewshot_sports_understanding                   |Yaml   |none  |     0|exact_match|0.8320|±  |0.0237|
| - bbh_fewshot_temporal_sequences                     |Yaml   |none  |     0|exact_match|0.5520|±  |0.0315|
| - bbh_fewshot_tracking_shuffled_objects_five_objects |Yaml   |none  |     0|exact_match|0.1480|±  |0.0225|
| - bbh_fewshot_tracking_shuffled_objects_seven_objects|Yaml   |none  |     0|exact_match|0.1720|±  |0.0239|
| - bbh_fewshot_tracking_shuffled_objects_three_objects|Yaml   |none  |     0|exact_match|0.2760|±  |0.0283|
| - bbh_fewshot_web_of_lies                            |Yaml   |none  |     0|exact_match|0.4760|±  |0.0316|
| - bbh_fewshot_word_sorting                           |Yaml   |none  |     0|exact_match|0.2840|±  |0.0286|

|  Groups   |Version|Filter|n-shot|  Metric   |Value|   |Stderr|
|-----------|-------|------|-----:|-----------|----:|---|-----:|
|bbh_fewshot|N/A    |none  |     0|exact_match|0.466|±  |0.1771|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto (16)
|                 Tasks                 |Version|Filter|n-shot|Metric|Value |   |Stderr|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu                                   |N/A    |none  |     0|acc   |0.6513|±  |0.1221|
| - humanities                          |N/A    |none  |     5|acc   |0.6077|±  |0.1185|
|  - formal_logic                       |Yaml   |none  |     5|acc   |0.4444|±  |0.0444|
|  - high_school_european_history       |Yaml   |none  |     5|acc   |0.8121|±  |0.0305|
|  - high_school_us_history             |Yaml   |none  |     5|acc   |0.8431|±  |0.0255|
|  - high_school_world_history          |Yaml   |none  |     5|acc   |0.8523|±  |0.0231|
|  - international_law                  |Yaml   |none  |     5|acc   |0.7851|±  |0.0375|
|  - jurisprudence                      |Yaml   |none  |     5|acc   |0.7870|±  |0.0396|
|  - logical_fallacies                  |Yaml   |none  |     5|acc   |0.7546|±  |0.0338|
|  - moral_disputes                     |Yaml   |none  |     5|acc   |0.7370|±  |0.0237|
|  - moral_scenarios                    |Yaml   |none  |     5|acc   |0.4101|±  |0.0164|
|  - philosophy                         |Yaml   |none  |     5|acc   |0.7170|±  |0.0256|
|  - prehistory                         |Yaml   |none  |     5|acc   |0.7840|±  |0.0229|
|  - professional_law                   |Yaml   |none  |     5|acc   |0.4941|±  |0.0128|
|  - world_religions                    |Yaml   |none  |     5|acc   |0.7895|±  |0.0313|
| - other                               |N/A    |none  |     5|acc   |0.7116|±  |0.0939|
|  - business_ethics                    |Yaml   |none  |     5|acc   |0.7600|±  |0.0429|
|  - clinical_knowledge                 |Yaml   |none  |     5|acc   |0.6792|±  |0.0287|
|  - college_medicine                   |Yaml   |none  |     5|acc   |0.6590|±  |0.0361|
|  - global_facts                       |Yaml   |none  |     5|acc   |0.3400|±  |0.0476|
|  - human_aging                        |Yaml   |none  |     5|acc   |0.6816|±  |0.0313|
|  - management                         |Yaml   |none  |     5|acc   |0.8350|±  |0.0368|
|  - marketing                          |Yaml   |none  |     5|acc   |0.8547|±  |0.0231|
|  - medical_genetics                   |Yaml   |none  |     5|acc   |0.7000|±  |0.0461|
|  - miscellaneous                      |Yaml   |none  |     5|acc   |0.8020|±  |0.0142|
|  - nutrition                          |Yaml   |none  |     5|acc   |0.7418|±  |0.0251|
|  - professional_accounting            |Yaml   |none  |     5|acc   |0.5071|±  |0.0298|
|  - professional_medicine              |Yaml   |none  |     5|acc   |0.7500|±  |0.0263|
|  - virology                           |Yaml   |none  |     5|acc   |0.5843|±  |0.0384|
| - social_sciences                     |N/A    |none  |     5|acc   |0.7537|±  |0.0681|
|  - econometrics                       |Yaml   |none  |     5|acc   |0.5000|±  |0.0470|
|  - high_school_geography              |Yaml   |none  |     5|acc   |0.8586|±  |0.0248|
|  - high_school_government_and_politics|Yaml   |none  |     5|acc   |0.9016|±  |0.0215|
|  - high_school_macroeconomics         |Yaml   |none  |     5|acc   |0.6615|±  |0.0240|
|  - high_school_microeconomics         |Yaml   |none  |     5|acc   |0.7311|±  |0.0288|
|  - high_school_psychology             |Yaml   |none  |     5|acc   |0.8404|±  |0.0157|
|  - human_sexuality                    |Yaml   |none  |     5|acc   |0.7328|±  |0.0388|
|  - professional_psychology            |Yaml   |none  |     5|acc   |0.6814|±  |0.0189|
|  - public_relations                   |Yaml   |none  |     5|acc   |0.6909|±  |0.0443|
|  - security_studies                   |Yaml   |none  |     5|acc   |0.7469|±  |0.0278|
|  - sociology                          |Yaml   |none  |     5|acc   |0.8308|±  |0.0265|
|  - us_foreign_policy                  |Yaml   |none  |     5|acc   |0.8900|±  |0.0314|
| - stem                                |N/A    |none  |     5|acc   |0.5569|±  |0.1380|
|  - abstract_algebra                   |Yaml   |none  |     5|acc   |0.4100|±  |0.0494|
|  - anatomy                            |Yaml   |none  |     5|acc   |0.6222|±  |0.0419|
|  - astronomy                          |Yaml   |none  |     5|acc   |0.7368|±  |0.0358|
|  - college_biology                    |Yaml   |none  |     5|acc   |0.8056|±  |0.0331|
|  - college_chemistry                  |Yaml   |none  |     5|acc   |0.4700|±  |0.0502|
|  - college_computer_science           |Yaml   |none  |     5|acc   |0.5100|±  |0.0502|
|  - college_mathematics                |Yaml   |none  |     5|acc   |0.2800|±  |0.0451|
|  - college_physics                    |Yaml   |none  |     5|acc   |0.3431|±  |0.0472|
|  - computer_security                  |Yaml   |none  |     5|acc   |0.7400|±  |0.0441|
|  - conceptual_physics                 |Yaml   |none  |     5|acc   |0.6340|±  |0.0315|
|  - electrical_engineering             |Yaml   |none  |     5|acc   |0.6000|±  |0.0408|
|  - elementary_mathematics             |Yaml   |none  |     5|acc   |0.4815|±  |0.0257|
|  - high_school_biology                |Yaml   |none  |     5|acc   |0.8032|±  |0.0226|
|  - high_school_chemistry              |Yaml   |none  |     5|acc   |0.4877|±  |0.0352|
|  - high_school_computer_science       |Yaml   |none  |     5|acc   |0.7200|±  |0.0451|
|  - high_school_mathematics            |Yaml   |none  |     5|acc   |0.3815|±  |0.0296|
|  - high_school_physics                |Yaml   |none  |     5|acc   |0.3576|±  |0.0391|
|  - high_school_statistics             |Yaml   |none  |     5|acc   |0.5602|±  |0.0339|
|  - machine_learning                   |Yaml   |none  |     5|acc   |0.4643|±  |0.0473|

|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu              |N/A    |none  |     0|acc   |0.6513|±  |0.1221|
| - humanities     |N/A    |none  |     5|acc   |0.6077|±  |0.1185|
| - other          |N/A    |none  |     5|acc   |0.7116|±  |0.0939|
| - social_sciences|N/A    |none  |     5|acc   |0.7537|±  |0.0681|
| - stem           |N/A    |none  |     5|acc   |0.5569|±  |0.1380|
```

## Citations

Thanks to [Upstage.AI](https://huggingface.co/upstage) for its awesome base model; this is merely a UNA of it. UNA can only refine what is already in there :)

If you find UNA-SOLAR useful, cite and support the authors.
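As a sanity check, each entry in the mt-bench "Average" table is simply the mean of that model's first- and second-turn judge scores. A minimal sketch in plain Python, using two rows taken from the tables above (the helper name `mtbench_average` is ours, not part of any benchmark tooling):

```python
# Per-turn mt-bench judge scores copied from the tables above.
turn_scores = {
    "UNA-SOLAR-10.7B-Instruct-v1.0": (7.80625, 7.2375),
    "gpt-4": (8.95625, 9.025),
}

def mtbench_average(turn1: float, turn2: float) -> float:
    """Mean of the two per-turn GPT-4 judge scores."""
    return (turn1 + turn2) / 2

for model, (t1, t2) in turn_scores.items():
    # Reproduces the "Average" column: 7.521875 and 8.990625 respectively.
    print(f"{model}: {mtbench_average(t1, t2):.6f}")
```

Running this recovers the averages reported above (7.521875 for UNA-SOLAR, 8.990625 for gpt-4), confirming the three tables are consistent.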