
tangled-llama-a-128k-base-v0.1


A pretrained language model based on the Llama architecture, with about 62.9M parameters. It was trained on 10.6B (10,630,121,844) tokens drawn from more than 31.3M (31,383,840) dataset rows.

This model is not intended for direct use; it is a base checkpoint for continued pretraining and finetuning on downstream tasks. While it supports a context length of up to 128K (131,072) tokens, it was pretrained on sequences of 2K (2,048) tokens.

The objective is to retain a lean cognitive and reasoning core while eliminating redundant memorized knowledge from the model.
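As a starting point for such downstream work, the sketch below shows one way to load the checkpoint for continued pretraining or finetuning. It is a minimal example that assumes the repository tangledgroup/tangled-llama-a-128k-base-v0.1 is available in Hugging Face transformers format; adapt it if you work from the litgpt checkpoint instead.

```python
# Minimal sketch, assuming the checkpoint is published in Hugging Face
# transformers format under this repo id; adapt if you use the litgpt
# checkpoint in out/pretrain/final/ instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tangledgroup/tangled-llama-a-128k-base-v0.1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Sanity check: roughly 62.9M parameters.
print(f"{sum(p.numel() for p in model.parameters()):,}")

# This is a base model meant for continued pretraining / finetuning, so plug it
# into a training loop (transformers Trainer, TRL, litgpt finetune, ...) rather
# than using it for generation as-is. Although the config allows up to 131,072
# positions, pretraining used 2,048-token sequences, so start with short
# sequences unless you explicitly extend the context during finetuning.
```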

[Training curves: loss / val_loss, val_ppl, epoch, and learning_rate over the pretraining run]

Pretrain Evaluation

Evaluated with lm-evaluation-harness via litgpt evaluate.
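The reported numbers come from the litgpt evaluate runs listed below, which wrap lm-evaluation-harness. For reference, a roughly equivalent direct call through the harness's Python API might look like the sketch below; this is an assumption-laden convenience, not the exact command used for these results.

```python
# Rough equivalent of the first `litgpt evaluate` run below, using
# lm-evaluation-harness directly. Assumes a transformers-format checkpoint;
# the numbers in this card were produced via `litgpt evaluate`.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tangledgroup/tangled-llama-a-128k-base-v0.1,dtype=bfloat16",
    tasks=["hellaswag", "gsm8k", "truthfulqa_mc2", "mmlu", "winogrande", "arc_challenge"],
    batch_size=4,
)
print(results["results"])
```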

litgpt evaluate --tasks 'hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge' --out_dir 'evaluate-quick/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.2176 | ± 0.0121 |
| | | none | 0 | acc_norm | 0.2560 | ± 0.0128 |
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.0190 | ± 0.0038 |
| | | strict-match | 5 | exact_match | 0.0000 | ± 0.0000 |
| hellaswag | 1 | none | 0 | acc | 0.2618 | ± 0.0044 |
| | | none | 0 | acc_norm | 0.2592 | ± 0.0044 |
| mmlu | 2 | none | | acc | 0.2464 | ± 0.0036 |
| - humanities | 2 | none | | acc | 0.2485 | ± 0.0063 |
| - formal_logic | 1 | none | 0 | acc | 0.3175 | ± 0.0416 |
| - high_school_european_history | 1 | none | 0 | acc | 0.2364 | ± 0.0332 |
| - high_school_us_history | 1 | none | 0 | acc | 0.2402 | ± 0.0300 |
| - high_school_world_history | 1 | none | 0 | acc | 0.2785 | ± 0.0292 |
| - international_law | 1 | none | 0 | acc | 0.2314 | ± 0.0385 |
| - jurisprudence | 1 | none | 0 | acc | 0.2407 | ± 0.0413 |
| - logical_fallacies | 1 | none | 0 | acc | 0.2086 | ± 0.0319 |
| - moral_disputes | 1 | none | 0 | acc | 0.2081 | ± 0.0219 |
| - moral_scenarios | 1 | none | 0 | acc | 0.2693 | ± 0.0148 |
| - philosophy | 1 | none | 0 | acc | 0.1961 | ± 0.0226 |
| - prehistory | 1 | none | 0 | acc | 0.2284 | ± 0.0234 |
| - professional_law | 1 | none | 0 | acc | 0.2529 | ± 0.0111 |
| - world_religions | 1 | none | 0 | acc | 0.2982 | ± 0.0351 |
| - other | 2 | none | | acc | 0.2536 | ± 0.0078 |
| - business_ethics | 1 | none | 0 | acc | 0.2700 | ± 0.0446 |
| - clinical_knowledge | 1 | none | 0 | acc | 0.2264 | ± 0.0258 |
| - college_medicine | 1 | none | 0 | acc | 0.2312 | ± 0.0321 |
| - global_facts | 1 | none | 0 | acc | 0.1500 | ± 0.0359 |
| - human_aging | 1 | none | 0 | acc | 0.2242 | ± 0.0280 |
| - management | 1 | none | 0 | acc | 0.1942 | ± 0.0392 |
| - marketing | 1 | none | 0 | acc | 0.3034 | ± 0.0301 |
| - medical_genetics | 1 | none | 0 | acc | 0.2200 | ± 0.0416 |
| - miscellaneous | 1 | none | 0 | acc | 0.2401 | ± 0.0153 |
| - nutrition | 1 | none | 0 | acc | 0.2255 | ± 0.0239 |
| - professional_accounting | 1 | none | 0 | acc | 0.2730 | ± 0.0266 |
| - professional_medicine | 1 | none | 0 | acc | 0.4081 | ± 0.0299 |
| - virology | 1 | none | 0 | acc | 0.2289 | ± 0.0327 |
| - social sciences | 2 | none | | acc | 0.2535 | ± 0.0079 |
| - econometrics | 1 | none | 0 | acc | 0.2368 | ± 0.0400 |
| - high_school_geography | 1 | none | 0 | acc | 0.2323 | ± 0.0301 |
| - high_school_government_and_politics | 1 | none | 0 | acc | 0.2539 | ± 0.0314 |
| - high_school_macroeconomics | 1 | none | 0 | acc | 0.2436 | ± 0.0218 |
| - high_school_microeconomics | 1 | none | 0 | acc | 0.2311 | ± 0.0274 |
| - high_school_psychology | 1 | none | 0 | acc | 0.2550 | ± 0.0187 |
| - human_sexuality | 1 | none | 0 | acc | 0.2824 | ± 0.0395 |
| - professional_psychology | 1 | none | 0 | acc | 0.2484 | ± 0.0175 |
| - public_relations | 1 | none | 0 | acc | 0.2727 | ± 0.0427 |
| - security_studies | 1 | none | 0 | acc | 0.2939 | ± 0.0292 |
| - sociology | 1 | none | 0 | acc | 0.2488 | ± 0.0306 |
| - us_foreign_policy | 1 | none | 0 | acc | 0.2800 | ± 0.0451 |
| - stem | 2 | none | | acc | 0.2293 | ± 0.0075 |
| - abstract_algebra | 1 | none | 0 | acc | 0.2200 | ± 0.0416 |
| - anatomy | 1 | none | 0 | acc | 0.2519 | ± 0.0375 |
| - astronomy | 1 | none | 0 | acc | 0.2697 | ± 0.0361 |
| - college_biology | 1 | none | 0 | acc | 0.2500 | ± 0.0362 |
| - college_chemistry | 1 | none | 0 | acc | 0.2400 | ± 0.0429 |
| - college_computer_science | 1 | none | 0 | acc | 0.2800 | ± 0.0451 |
| - college_mathematics | 1 | none | 0 | acc | 0.2000 | ± 0.0402 |
| - college_physics | 1 | none | 0 | acc | 0.2647 | ± 0.0439 |
| - computer_security | 1 | none | 0 | acc | 0.1900 | ± 0.0394 |
| - conceptual_physics | 1 | none | 0 | acc | 0.2340 | ± 0.0277 |
| - electrical_engineering | 1 | none | 0 | acc | 0.2414 | ± 0.0357 |
| - elementary_mathematics | 1 | none | 0 | acc | 0.1931 | ± 0.0203 |
| - high_school_biology | 1 | none | 0 | acc | 0.2323 | ± 0.0240 |
| - high_school_chemistry | 1 | none | 0 | acc | 0.2266 | ± 0.0295 |
| - high_school_computer_science | 1 | none | 0 | acc | 0.2400 | ± 0.0429 |
| - high_school_mathematics | 1 | none | 0 | acc | 0.2037 | ± 0.0246 |
| - high_school_physics | 1 | none | 0 | acc | 0.2185 | ± 0.0337 |
| - high_school_statistics | 1 | none | 0 | acc | 0.1898 | ± 0.0267 |
| - machine_learning | 1 | none | 0 | acc | 0.3393 | ± 0.0449 |
| truthfulqa_mc2 | 2 | none | 0 | acc | 0.5061 | ± 0.0167 |
| winogrande | 1 | none | 0 | acc | 0.4933 | ± 0.0141 |

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | 0.2464 | ± 0.0036 |
| - humanities | 2 | none | | acc | 0.2485 | ± 0.0063 |
| - other | 2 | none | | acc | 0.2536 | ± 0.0078 |
| - social sciences | 2 | none | | acc | 0.2535 | ± 0.0079 |
| - stem | 2 | none | | acc | 0.2293 | ± 0.0075 |
litgpt evaluate --tasks 'leaderboard' --out_dir 'evaluate-leaderboard/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard | N/A | | | | | |
| - leaderboard_bbh | N/A | | | | | |
| - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm | 0.4600 | ± 0.0316 |
| - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm | 0.5134 | ± 0.0366 |
| - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm | 0.1360 | ± 0.0217 |
| - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm | 0.2960 | ± 0.0289 |
| - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm | 0.4760 | ± 0.0316 |
| - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm | 0.0800 | ± 0.0172 |
| - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm | 0.5120 | ± 0.0317 |
| - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm | 0.1760 | ± 0.0241 |
| - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm | 0.1320 | ± 0.0215 |
| - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm | 0.3160 | ± 0.0295 |
| - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm | 0.2480 | ± 0.0274 |
| - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm | 0.4200 | ± 0.0313 |
| - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm | 0.0360 | ± 0.0118 |
| - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm | 0.1986 | ± 0.0331 |
| - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm | 0.0520 | ± 0.0141 |
| - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm | 0.2760 | ± 0.0283 |
| - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm | 0.1400 | ± 0.0220 |
| - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm | 0.4326 | ± 0.0372 |
| - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm | 0.4600 | ± 0.0316 |
| - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm | 0.2680 | ± 0.0281 |
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm | 0.2040 | ± 0.0255 |
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm | 0.1640 | ± 0.0235 |
| - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm | 0.3840 | ± 0.0308 |
| - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm | 0.4880 | ± 0.0317 |
| - leaderboard_gpqa | N/A | | | | | |
| - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | 0.2778 | ± 0.0319 |
| - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | 0.2766 | ± 0.0192 |
| - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | 0.2031 | ± 0.0190 |
| - leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | 0.1811 | ± N/A |
| | | none | 0 | inst_level_strict_acc | 0.1715 | ± N/A |
| | | none | 0 | prompt_level_loose_acc | 0.1091 | ± 0.0134 |
| | | none | 0 | prompt_level_strict_acc | 0.1035 | ± 0.0131 |
| - leaderboard_math_hard | N/A | | | | | |
| - leaderboard_math_algebra_hard | 1 | none | 4 | exact_match | 0.0000 | ± 0 |
| - leaderboard_math_counting_and_prob_hard | 1 | none | 4 | exact_match | 0.0000 | ± 0 |
| - leaderboard_math_geometry_hard | 1 | none | 4 | exact_match | 0.0000 | ± 0 |
| - leaderboard_math_intermediate_algebra_hard | 1 | none | 4 | exact_match | 0.0000 | ± 0 |
| - leaderboard_math_num_theory_hard | 1 | none | 4 | exact_match | 0.0000 | ± 0 |
| - leaderboard_math_prealgebra_hard | 1 | none | 4 | exact_match | 0.0000 | ± 0 |
| - leaderboard_math_precalculus_hard | 1 | none | 4 | exact_match | 0.0000 | ± 0 |
| - leaderboard_mmlu_pro | 0.1 | none | 5 | acc | 0.1169 | ± 0.0029 |
| - leaderboard_musr | N/A | | | | | |
| - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm | 0.5080 | ± 0.0317 |
| - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm | 0.3008 | ± 0.0287 |
| - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm | 0.3760 | ± 0.0307 |
litgpt evaluate --tasks 'gsm8k,mathqa' --out_dir 'evaluate-math/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.0190 | ± 0.0038 |
| | | strict-match | 5 | exact_match | 0.0000 | ± 0.0000 |
| mathqa | 1 | none | 0 | acc | 0.2060 | ± 0.0074 |
| | | none | 0 | acc_norm | 0.2057 | ± 0.0074 |
litgpt evaluate --tasks 'mmlu,mmlu_pro' --out_dir 'evaluate-mmlu/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | 0.2459 | ± 0.0036 |
| - humanities | 2 | none | | acc | 0.2480 | ± 0.0063 |
| - formal_logic | 1 | none | 0 | acc | 0.3175 | ± 0.0416 |
| - high_school_european_history | 1 | none | 0 | acc | 0.2424 | ± 0.0335 |
| - high_school_us_history | 1 | none | 0 | acc | 0.2402 | ± 0.0300 |
| - high_school_world_history | 1 | none | 0 | acc | 0.2743 | ± 0.0290 |
| - international_law | 1 | none | 0 | acc | 0.2314 | ± 0.0385 |
| - jurisprudence | 1 | none | 0 | acc | 0.2315 | ± 0.0408 |
| - logical_fallacies | 1 | none | 0 | acc | 0.2209 | ± 0.0326 |
| - moral_disputes | 1 | none | 0 | acc | 0.2081 | ± 0.0219 |
| - moral_scenarios | 1 | none | 0 | acc | 0.2670 | ± 0.0148 |
| - philosophy | 1 | none | 0 | acc | 0.2090 | ± 0.0231 |
| - prehistory | 1 | none | 0 | acc | 0.2160 | ± 0.0229 |
| - professional_law | 1 | none | 0 | acc | 0.2516 | ± 0.0111 |
| - world_religions | 1 | none | 0 | acc | 0.3041 | ± 0.0353 |
| - other | 2 | none | | acc | 0.2549 | ± 0.0078 |
| - business_ethics | 1 | none | 0 | acc | 0.2700 | ± 0.0446 |
| - clinical_knowledge | 1 | none | 0 | acc | 0.2264 | ± 0.0258 |
| - college_medicine | 1 | none | 0 | acc | 0.2428 | ± 0.0327 |
| - global_facts | 1 | none | 0 | acc | 0.1600 | ± 0.0368 |
| - human_aging | 1 | none | 0 | acc | 0.2242 | ± 0.0280 |
| - management | 1 | none | 0 | acc | 0.1845 | ± 0.0384 |
| - marketing | 1 | none | 0 | acc | 0.2949 | ± 0.0299 |
| - medical_genetics | 1 | none | 0 | acc | 0.2200 | ± 0.0416 |
| - miscellaneous | 1 | none | 0 | acc | 0.2478 | ± 0.0154 |
| - nutrition | 1 | none | 0 | acc | 0.2353 | ± 0.0243 |
| - professional_accounting | 1 | none | 0 | acc | 0.2553 | ± 0.0260 |
| - professional_medicine | 1 | none | 0 | acc | 0.4118 | ± 0.0299 |
| - virology | 1 | none | 0 | acc | 0.2229 | ± 0.0324 |
| - social sciences | 2 | none | | acc | 0.2525 | ± 0.0078 |
| - econometrics | 1 | none | 0 | acc | 0.2368 | ± 0.0400 |
| - high_school_geography | 1 | none | 0 | acc | 0.2172 | ± 0.0294 |
| - high_school_government_and_politics | 1 | none | 0 | acc | 0.2539 | ± 0.0314 |
| - high_school_macroeconomics | 1 | none | 0 | acc | 0.2410 | ± 0.0217 |
| - high_school_microeconomics | 1 | none | 0 | acc | 0.2311 | ± 0.0274 |
| - high_school_psychology | 1 | none | 0 | acc | 0.2495 | ± 0.0186 |
| - human_sexuality | 1 | none | 0 | acc | 0.2824 | ± 0.0395 |
| - professional_psychology | 1 | none | 0 | acc | 0.2565 | ± 0.0177 |
| - public_relations | 1 | none | 0 | acc | 0.2636 | ± 0.0422 |
| - security_studies | 1 | none | 0 | acc | 0.2898 | ± 0.0290 |
| - sociology | 1 | none | 0 | acc | 0.2537 | ± 0.0308 |
| - us_foreign_policy | 1 | none | 0 | acc | 0.2800 | ± 0.0451 |
| - stem | 2 | none | | acc | 0.2274 | ± 0.0075 |
| - abstract_algebra | 1 | none | 0 | acc | 0.2200 | ± 0.0416 |
| - anatomy | 1 | none | 0 | acc | 0.2444 | ± 0.0371 |
| - astronomy | 1 | none | 0 | acc | 0.2697 | ± 0.0361 |
| - college_biology | 1 | none | 0 | acc | 0.2500 | ± 0.0362 |
| - college_chemistry | 1 | none | 0 | acc | 0.2100 | ± 0.0409 |
| - college_computer_science | 1 | none | 0 | acc | 0.2800 | ± 0.0451 |
| - college_mathematics | 1 | none | 0 | acc | 0.1900 | ± 0.0394 |
| - college_physics | 1 | none | 0 | acc | 0.2549 | ± 0.0434 |
| - computer_security | 1 | none | 0 | acc | 0.1900 | ± 0.0394 |
| - conceptual_physics | 1 | none | 0 | acc | 0.2298 | ± 0.0275 |
| - electrical_engineering | 1 | none | 0 | acc | 0.2483 | ± 0.0360 |
| - elementary_mathematics | 1 | none | 0 | acc | 0.1931 | ± 0.0203 |
| - high_school_biology | 1 | none | 0 | acc | 0.2258 | ± 0.0238 |
| - high_school_chemistry | 1 | none | 0 | acc | 0.2217 | ± 0.0292 |
| - high_school_computer_science | 1 | none | 0 | acc | 0.2400 | ± 0.0429 |
| - high_school_mathematics | 1 | none | 0 | acc | 0.2074 | ± 0.0247 |
| - high_school_physics | 1 | none | 0 | acc | 0.2185 | ± 0.0337 |
| - high_school_statistics | 1 | none | 0 | acc | 0.1991 | ± 0.0272 |
| - machine_learning | 1 | none | 0 | acc | 0.3393 | ± 0.0449 |
| mmlu_pro | 2 | custom-extract | | exact_match | 0.0000 | ± 0.0000 |
| - biology | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - business | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - chemistry | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - computer_science | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - economics | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - engineering | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - health | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - history | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - law | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - math | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - other | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - philosophy | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - physics | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |
| - psychology | 1 | custom-extract | 5 | exact_match | 0.0000 | ± 0.0000 |

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | 0.2459 | ± 0.0036 |
| - humanities | 2 | none | | acc | 0.2480 | ± 0.0063 |
| - other | 2 | none | | acc | 0.2549 | ± 0.0078 |
| - social sciences | 2 | none | | acc | 0.2525 | ± 0.0078 |
| - stem | 2 | none | | acc | 0.2274 | ± 0.0075 |
| mmlu_pro | 2 | custom-extract | | exact_match | 0.0000 | ± 0.0000 |
litgpt evaluate --tasks 'arc_challenge,boolq,gpqa,hellaswag,openbookqa,piqa,truthfulqa_mc2,winogrande' --out_dir 'evaluate-reasoning/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.2176 | ± 0.0121 |
| | | none | 0 | acc_norm | 0.2560 | ± 0.0128 |
| boolq | 2 | none | 0 | acc | 0.3783 | ± 0.0085 |
| gpqa_diamond_cot_n_shot | 2 | flexible-extract | 0 | exact_match | 0.0051 | ± 0.0051 |
| | | strict-match | 0 | exact_match | 0.0000 | ± 0.0000 |
| gpqa_diamond_cot_zeroshot | 1 | flexible-extract | 0 | exact_match | 0.0051 | ± 0.0051 |
| | | strict-match | 0 | exact_match | 0.0000 | ± 0.0000 |
| gpqa_diamond_generative_n_shot | 2 | flexible-extract | 0 | exact_match | 0.0051 | ± 0.0051 |
| | | strict-match | 0 | exact_match | 0.0000 | ± 0.0000 |
| gpqa_diamond_n_shot | 2 | none | 0 | acc | 0.1970 | ± 0.0283 |
| | | none | 0 | acc_norm | 0.1970 | ± 0.0283 |
| gpqa_diamond_zeroshot | 1 | none | 0 | acc | 0.2727 | ± 0.0317 |
| | | none | 0 | acc_norm | 0.2727 | ± 0.0317 |
| gpqa_extended_cot_n_shot | 2 | flexible-extract | 0 | exact_match | 0.0018 | ± 0.0018 |
| | | strict-match | 0 | exact_match | 0.0000 | ± 0.0000 |
| gpqa_extended_cot_zeroshot | 1 | flexible-extract | 0 | exact_match | 0.0037 | ± 0.0026 |
| | | strict-match | 0 | exact_match | 0.0000 | ± 0.0000 |
| gpqa_extended_generative_n_shot | 2 | flexible-extract | 0 | exact_match | 0.0073 | ± 0.0037 |
| | | strict-match | 0 | exact_match | 0.0000 | ± 0.0000 |
| gpqa_extended_n_shot | 2 | none | 0 | acc | 0.2564 | ± 0.0187 |
| | | none | 0 | acc_norm | 0.2564 | ± 0.0187 |
| gpqa_extended_zeroshot | 1 | none | 0 | acc | 0.2802 | ± 0.0192 |
| | | none | 0 | acc_norm | 0.2802 | ± 0.0192 |
| gpqa_main_cot_n_shot | 2 | flexible-extract | 0 | exact_match | 0.0000 | ± 0.0000 |
| | | strict-match | 0 | exact_match | 0.0000 | ± 0.0000 |
| gpqa_main_cot_zeroshot | 1 | flexible-extract | 0 | exact_match | 0.0000 | ± 0.0000 |
| | | strict-match | 0 | exact_match | 0.0000 | ± 0.0000 |
| gpqa_main_generative_n_shot | 2 | flexible-extract | 0 | exact_match | 0.0089 | ± 0.0044 |
| | | strict-match | 0 | exact_match | 0.0000 | ± 0.0000 |
| gpqa_main_n_shot | 2 | none | 0 | acc | 0.2478 | ± 0.0204 |
| | | none | 0 | acc_norm | 0.2478 | ± 0.0204 |
| gpqa_main_zeroshot | 1 | none | 0 | acc | 0.2143 | ± 0.0194 |
| | | none | 0 | acc_norm | 0.2143 | ± 0.0194 |
| hellaswag | 1 | none | 0 | acc | 0.2618 | ± 0.0044 |
| | | none | 0 | acc_norm | 0.2592 | ± 0.0044 |
| openbookqa | 1 | none | 0 | acc | 0.1340 | ± 0.0152 |
| | | none | 0 | acc_norm | 0.2340 | ± 0.0190 |
| piqa | 1 | none | 0 | acc | 0.5201 | ± 0.0117 |
| | | none | 0 | acc_norm | 0.5076 | ± 0.0117 |
| truthfulqa_mc2 | 2 | none | 0 | acc | 0.5061 | ± 0.0167 |
| winogrande | 1 | none | 0 | acc | 0.4933 | ± 0.0141 |
litgpt evaluate --tasks 'wikitext,qasper' --out_dir 'evaluate-long/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| qasper_bool | 1 | none | 0 | f1 | 0.0000 | ± 0 |
| qasper_freeform | 2 | none | 0 | f1_abstractive | 0.0036 | ± 0.001 |
| wikitext | 2 | none | 0 | bits_per_byte | 3.0634 | N/A |
| | | none | 0 | byte_perplexity | 8.3596 | N/A |
| | | none | 0 | word_perplexity | 85375.3002 | N/A |
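As a quick sanity check on the wikitext numbers, bits_per_byte and byte_perplexity are two views of the same quantity (byte_perplexity = 2 ** bits_per_byte), and the reported values are consistent:

```python
# The reported wikitext metrics satisfy byte_perplexity = 2 ** bits_per_byte.
bits_per_byte = 3.0634
print(2 ** bits_per_byte)  # ~8.36, matching the reported byte_perplexity of 8.3596
```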
