
Model Card for nano-phi-192M-v0.1

This model is a follow-up to kenhktsui/nano-phi-115M-v0.1.
The model is not aligned.

Major differences:

How to use

To use the model, you need transformers version 4.37.2 or later.

pip install "transformers>=4.37.2"

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kenhktsui/nano-phi-192M-v0.1")
pipe("I am a machine learning researcher. I work on", max_new_tokens=50, repetition_penalty=10.0)
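Because the pipeline needs transformers 4.37.2 or later, a script can check the installed version before trying to load the model. The sketch below is one possible way to do this with only the standard library; `parse_release` and `has_min_transformers` are hypothetical helpers, and the parser deliberately ignores pre-release suffixes such as `rc1` or `.dev0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum transformers version stated in this model card.
MIN_TRANSFORMERS = (4, 37, 2)


def parse_release(v: str) -> tuple:
    """Return (major, minor, patch) from a version string.

    Hypothetical helper: pre-release/dev suffixes are ignored, so
    '4.38.0.dev0' -> (4, 38, 0) and '4.36' -> (4, 36, 0).
    """
    parts = []
    for piece in v.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad short versions such as '4.36'
        parts.append(0)
    return tuple(parts)


def has_min_transformers(minimum: tuple = MIN_TRANSFORMERS) -> bool:
    """True if transformers is installed and its release is >= minimum."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= minimum


print("transformers >= 4.37.2 available:", has_min_transformers())
```

Comparing plain integer tuples keeps the check dependency-free; a stricter check could use the `packaging` library instead.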

Model and training details

  • model:
    • hidden_size: 768
    • num_key_value_heads: 8 (grouped query attention)
    • num_attention_heads: 24
    • num_hidden_layers: 6
    • context length: 1024
    • total params: 192M
  • training:
    • global steps: 36,000
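The attention settings above determine how grouped query attention shares key/value heads. A minimal sketch of the arithmetic, assuming the usual convention head_dim = hidden_size / num_attention_heads and a float32 key/value cache for a single sequence (the published weights are F32); `NanoPhiDims` is a hypothetical name, not a class from the model's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NanoPhiDims:
    """Hypothetical container for the architecture values listed above."""

    hidden_size: int = 768
    num_attention_heads: int = 24
    num_key_value_heads: int = 8
    num_hidden_layers: int = 6
    context_length: int = 1024

    @property
    def head_dim(self) -> int:
        # Assumption: head_dim = hidden_size / num_attention_heads.
        return self.hidden_size // self.num_attention_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Grouped query attention: each K/V head serves this many query heads.
        return self.num_attention_heads // self.num_key_value_heads

    def kv_cache_bytes(self, seq_len: int, bytes_per_value: int = 4,
                       kv_heads: Optional[int] = None) -> int:
        """Bytes needed to cache keys and values for one sequence."""
        heads = self.num_key_value_heads if kv_heads is None else kv_heads
        # 2 = one key and one value vector per head, per layer, per token.
        return 2 * self.num_hidden_layers * heads * self.head_dim * seq_len * bytes_per_value


dims = NanoPhiDims()
gqa_bytes = dims.kv_cache_bytes(dims.context_length)
# Same model with one K/V head per query head (plain multi-head attention).
mha_bytes = dims.kv_cache_bytes(dims.context_length, kv_heads=dims.num_attention_heads)

print(dims.head_dim, dims.queries_per_kv_head)  # 32 3
print(gqa_bytes / 2**20, mha_bytes / 2**20)      # 12.0 36.0 (MiB)
```

With 8 key/value heads instead of 24, the cache for a full 1024-token context is three times smaller, which is where grouped query attention saves memory at inference time.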

Open LLM Leaderboard Evaluation Results

Metric               kenhktsui/nano-phi-192M-v0.1  kenhktsui/nano-phi-115M-v0.1  microsoft/phi-2 (Reproduced)
Avg.                 29.24                         28.68                         61.53
ARC (25-shot)        24.15                         21.93                         61.52
HellaSwag (10-shot)  29.99                         27.87                         75.13
MMLU (5-shot)        25.46                         25.30                         58.23
TruthfulQA (0-shot)  44.30                         46.01                         44.46
Winogrande (5-shot)  51.54                         50.99                         74.51
GSM8K (5-shot)       0.0                           0.0                           55.34
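The Avg. row is consistent with an unweighted mean of the six benchmark scores in each column. A quick check with the values copied from the table (the dictionary layout is just for illustration):

```python
# Scores from the table above, in the order:
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K.
SCORES = {
    "kenhktsui/nano-phi-192M-v0.1": [24.15, 29.99, 25.46, 44.30, 51.54, 0.0],
    "kenhktsui/nano-phi-115M-v0.1": [21.93, 27.87, 25.30, 46.01, 50.99, 0.0],
    "microsoft/phi-2 (Reproduced)": [61.52, 75.13, 58.23, 44.46, 74.51, 55.34],
}


def leaderboard_average(scores):
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


for model, scores in SCORES.items():
    print(f"{model}: {leaderboard_average(scores):.2f}")
```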

Details (per-task lm-evaluation-harness results):

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8

Task      Version  acc              acc_norm
arc_easy  0        0.4596 ± 0.0102  0.4070 ± 0.0101

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 25, batch_size: 8

Task           Version  acc              acc_norm
arc_challenge  0        0.1911 ± 0.0115  0.2415 ± 0.0125

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 10, batch_size: 8

Task       Version  acc              acc_norm
hellaswag  0        0.2833 ± 0.0045  0.2999 ± 0.0046

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8

Task           Version  mc1              mc2
truthfulqa_mc  1        0.2583 ± 0.0153  0.4430 ± 0.0152

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8

Task                                               Version  acc              acc_norm
hendrycksTest-abstract_algebra                     1        0.2200 ± 0.0416  0.2200 ± 0.0416
hendrycksTest-anatomy                              1        0.2593 ± 0.0379  0.2593 ± 0.0379
hendrycksTest-astronomy                            1        0.1711 ± 0.0306  0.1711 ± 0.0306
hendrycksTest-business_ethics                      1        0.2400 ± 0.0429  0.2400 ± 0.0429
hendrycksTest-clinical_knowledge                   1        0.2566 ± 0.0269  0.2566 ± 0.0269
hendrycksTest-college_biology                      1        0.2639 ± 0.0369  0.2639 ± 0.0369
hendrycksTest-college_chemistry                    1        0.1800 ± 0.0386  0.1800 ± 0.0386
hendrycksTest-college_computer_science             1        0.3300 ± 0.0473  0.3300 ± 0.0473
hendrycksTest-college_mathematics                  1        0.3000 ± 0.0461  0.3000 ± 0.0461
hendrycksTest-college_medicine                     1        0.2023 ± 0.0306  0.2023 ± 0.0306
hendrycksTest-college_physics                      1        0.2843 ± 0.0449  0.2843 ± 0.0449
hendrycksTest-computer_security                    1        0.2200 ± 0.0416  0.2200 ± 0.0416
hendrycksTest-conceptual_physics                   1        0.2511 ± 0.0283  0.2511 ± 0.0283
hendrycksTest-econometrics                         1        0.2807 ± 0.0423  0.2807 ± 0.0423
hendrycksTest-electrical_engineering               1        0.2897 ± 0.0378  0.2897 ± 0.0378
hendrycksTest-elementary_mathematics               1        0.2804 ± 0.0231  0.2804 ± 0.0231
hendrycksTest-formal_logic                         1        0.2143 ± 0.0367  0.2143 ± 0.0367
hendrycksTest-global_facts                         1        0.1700 ± 0.0378  0.1700 ± 0.0378
hendrycksTest-high_school_biology                  1        0.3226 ± 0.0266  0.3226 ± 0.0266
hendrycksTest-high_school_chemistry                1        0.2759 ± 0.0314  0.2759 ± 0.0314
hendrycksTest-high_school_computer_science         1        0.2700 ± 0.0446  0.2700 ± 0.0446
hendrycksTest-high_school_european_history         1        0.2606 ± 0.0343  0.2606 ± 0.0343
hendrycksTest-high_school_geography                1        0.3081 ± 0.0329  0.3081 ± 0.0329
hendrycksTest-high_school_government_and_politics  1        0.3627 ± 0.0347  0.3627 ± 0.0347
hendrycksTest-high_school_macroeconomics           1        0.2641 ± 0.0224  0.2641 ± 0.0224
hendrycksTest-high_school_mathematics              1        0.2630 ± 0.0268  0.2630 ± 0.0268
hendrycksTest-high_school_microeconomics           1        0.3403 ± 0.0308  0.3403 ± 0.0308
hendrycksTest-high_school_physics                  1        0.3113 ± 0.0378  0.3113 ± 0.0378
hendrycksTest-high_school_psychology               1        0.2716 ± 0.0191  0.2716 ± 0.0191
hendrycksTest-high_school_statistics               1        0.4491 ± 0.0339  0.4491 ± 0.0339
hendrycksTest-high_school_us_history               1        0.2402 ± 0.0300  0.2402 ± 0.0300
hendrycksTest-high_school_world_history            1        0.2363 ± 0.0277  0.2363 ± 0.0277
hendrycksTest-human_aging                          1        0.2197 ± 0.0278  0.2197 ± 0.0278
hendrycksTest-human_sexuality                      1        0.2824 ± 0.0395  0.2824 ± 0.0395
hendrycksTest-international_law                    1        0.2479 ± 0.0394  0.2479 ± 0.0394
hendrycksTest-jurisprudence                        1        0.2037 ± 0.0389  0.2037 ± 0.0389
hendrycksTest-logical_fallacies                    1        0.2393 ± 0.0335  0.2393 ± 0.0335
hendrycksTest-machine_learning                     1        0.1875 ± 0.0370  0.1875 ± 0.0370
hendrycksTest-management                           1        0.2039 ± 0.0399  0.2039 ± 0.0399
hendrycksTest-marketing                            1        0.1795 ± 0.0251  0.1795 ± 0.0251
hendrycksTest-medical_genetics                     1        0.3000 ± 0.0461  0.3000 ± 0.0461
hendrycksTest-miscellaneous                        1        0.2644 ± 0.0158  0.2644 ± 0.0158
hendrycksTest-moral_disputes                       1        0.2225 ± 0.0224  0.2225 ± 0.0224
hendrycksTest-moral_scenarios                      1        0.2726 ± 0.0149  0.2726 ± 0.0149
hendrycksTest-nutrition                            1        0.2353 ± 0.0243  0.2353 ± 0.0243
hendrycksTest-philosophy                           1        0.2283 ± 0.0238  0.2283 ± 0.0238
hendrycksTest-prehistory                           1        0.2099 ± 0.0227  0.2099 ± 0.0227
hendrycksTest-professional_accounting              1        0.2411 ± 0.0255  0.2411 ± 0.0255
hendrycksTest-professional_law                     1        0.2458 ± 0.0110  0.2458 ± 0.0110
hendrycksTest-professional_medicine                1        0.3897 ± 0.0296  0.3897 ± 0.0296
hendrycksTest-professional_psychology              1        0.2141 ± 0.0166  0.2141 ± 0.0166
hendrycksTest-public_relations                     1        0.1818 ± 0.0369  0.1818 ± 0.0369
hendrycksTest-security_studies                     1        0.2490 ± 0.0277  0.2490 ± 0.0277
hendrycksTest-sociology                            1        0.2537 ± 0.0308  0.2537 ± 0.0308
hendrycksTest-us_foreign_policy                    1        0.2900 ± 0.0456  0.2900 ± 0.0456
hendrycksTest-virology                             1        0.1807 ± 0.0300  0.1807 ± 0.0300
hendrycksTest-world_religions                      1        0.1813 ± 0.0295  0.1813 ± 0.0295
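The MMLU (5-shot) score of 25.46 in the summary table matches the unweighted mean of the 57 hendrycksTest-* `acc` values listed above (acc and acc_norm are identical for every subject in this run). A short check, with the accuracies copied in table order:

```python
# acc of the 57 hendrycksTest-* subjects above, in table order.
MMLU_ACC = [
    0.2200, 0.2593, 0.1711, 0.2400, 0.2566, 0.2639, 0.1800, 0.3300,
    0.3000, 0.2023, 0.2843, 0.2200, 0.2511, 0.2807, 0.2897, 0.2804,
    0.2143, 0.1700, 0.3226, 0.2759, 0.2700, 0.2606, 0.3081, 0.3627,
    0.2641, 0.2630, 0.3403, 0.3113, 0.2716, 0.4491, 0.2402, 0.2363,
    0.2197, 0.2824, 0.2479, 0.2037, 0.2393, 0.1875, 0.2039, 0.1795,
    0.3000, 0.2644, 0.2225, 0.2726, 0.2353, 0.2283, 0.2099, 0.2411,
    0.2458, 0.3897, 0.2141, 0.1818, 0.2490, 0.2537, 0.2900, 0.1807,
    0.1813,
]
assert len(MMLU_ACC) == 57  # one entry per MMLU subject

# Unweighted mean over subjects, expressed as a percentage.
mmlu = 100 * sum(MMLU_ACC) / len(MMLU_ACC)
print(f"MMLU (5-shot): {mmlu:.2f}")  # 25.46
```

Because every subject is weighted equally, small subjects with large standard errors (for example the 100-question ones at about ± 0.04) move the average as much as large ones.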

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8

Task Version Metric Value Stderr
winogrande 0 acc 0.5154 ± 0.014
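The ± column is the standard error reported by the harness. One rough way to read it, assuming a normal approximation, is to form an approximate 95% interval (value ± 1.96 × stderr) and compare it with the accuracy of random guessing: 0.5 for two-option Winogrande, 0.25 for four-ending HellaSwag, and roughly 0.25 for ARC-Challenge (an assumption, since most but not all ARC questions have four options). `approx_ci` and `beats_chance` are hypothetical helpers for illustration.

```python
Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def approx_ci(value: float, stderr: float, z: float = Z_95) -> tuple:
    """Normal-approximation confidence interval for an accuracy."""
    return value - z * stderr, value + z * stderr


def beats_chance(value: float, stderr: float, chance: float) -> bool:
    """True if the whole interval lies above the random-guess accuracy."""
    low, _ = approx_ci(value, stderr)
    return low > chance


# (value, stderr, assumed chance level), copied from the tables above.
RESULTS = {
    "arc_challenge acc_norm": (0.2415, 0.0125, 0.25),
    "hellaswag acc_norm": (0.2999, 0.0046, 0.25),
    "winogrande acc": (0.5154, 0.014, 0.5),
}

for name, (value, stderr, chance) in RESULTS.items():
    low, high = approx_ci(value, stderr)
    print(f"{name}: [{low:.4f}, {high:.4f}] above chance: "
          f"{beats_chance(value, stderr, chance)}")
```

Under these assumptions the HellaSwag interval sits above chance, while the ARC-Challenge and Winogrande intervals include it.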

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8

Task Version Metric Value Stderr
gsm8k 0 acc 0 ± 0
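Each column of the summary table corresponds to one metric from these detailed runs, scaled to a percentage: acc_norm for ARC-Challenge and HellaSwag, mc2 for TruthfulQA, acc for Winogrande and GSM8K, and the subject-averaged acc for MMLU. The sketch below rebuilds the nano-phi-192M scores from the reported values; the `DETAILED` and `COLUMN_SOURCES` dictionaries are an illustrative layout, not the harness's output format.

```python
# Selected metrics from the detailed results: (task, metric) -> value.
DETAILED = {
    ("arc_challenge", "acc_norm"): 0.2415,
    ("hellaswag", "acc_norm"): 0.2999,
    ("truthfulqa_mc", "mc2"): 0.4430,
    ("winogrande", "acc"): 0.5154,
    ("gsm8k", "acc"): 0.0,
}
# MMLU value from the summary table (mean over the hendrycksTest-* subjects).
MMLU_MEAN_ACC = 0.2546

# Summary-table column -> (task, metric) it is read from.
COLUMN_SOURCES = {
    "ARC (25-shot)": ("arc_challenge", "acc_norm"),
    "HellaSwag (10-shot)": ("hellaswag", "acc_norm"),
    "TruthfulQA (0-shot)": ("truthfulqa_mc", "mc2"),
    "Winogrande (5-shot)": ("winogrande", "acc"),
    "GSM8K (5-shot)": ("gsm8k", "acc"),
}

row = {col: round(100 * DETAILED[src], 2) for col, src in COLUMN_SOURCES.items()}
row["MMLU (5-shot)"] = round(100 * MMLU_MEAN_ACC, 2)
print(row)
```

Note that ARC and HellaSwag use the length-normalized acc_norm; with plain acc, ARC-Challenge would read 19.11 instead of 24.15.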
Weights: Safetensors, 192M params, tensor type F32