# Model Card for nano-phi-192M-v0.1
This is a continued effort building on kenhktsui/nano-phi-115M-v0.1.
The model is not aligned.
Major differences:
- larger tokenizer vocabulary (see the quick check below)
- addition of HuggingFaceTB/cosmopedia as a training dataset
- training tokens: 19B vs 7B
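The vocabulary-size difference can be verified directly from the two released checkpoints. The snippet below is a small illustrative sketch using the standard transformers tokenizer API; it is not part of the original card:

```python
# Quick check of the tokenizer vocabulary sizes of the two checkpoints named in this card.
from transformers import AutoTokenizer

tok_new = AutoTokenizer.from_pretrained("kenhktsui/nano-phi-192M-v0.1")
tok_old = AutoTokenizer.from_pretrained("kenhktsui/nano-phi-115M-v0.1")

# len(tokenizer) counts the full vocabulary, including any added special tokens.
print("nano-phi-192M vocab:", len(tok_new))
print("nano-phi-115M vocab:", len(tok_old))
```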
## How to use
To use the model, you will need transformers version >= 4.37.2:

```shell
pip install "transformers>=4.37.2"
```

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kenhktsui/nano-phi-192M-v0.1")
pipe("I am a machine learning researcher. I work on", max_new_tokens=50, repetition_penalty=10.0)
```
## Some metrics
- model
  - hidden_size: 768
  - num_key_value_heads: 8 (grouped query attention)
  - num_attention_heads: 24
  - num_hidden_layers: 6
  - context length: 1024
  - total params: 192M
- training
  - global steps: 36,000
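As a sanity check, these architecture values can be read back from the published config. The snippet below is an illustrative sketch that assumes the usual Hugging Face config field names apply to this checkpoint:

```python
# Illustrative sketch: read the architecture hyperparameters back from the hosted config.
# Field names assume the usual Hugging Face conventions; adjust if this checkpoint differs.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "kenhktsui/nano-phi-192M-v0.1"
config = AutoConfig.from_pretrained(model_id)

print(config.hidden_size)          # expected: 768
print(config.num_attention_heads)  # expected: 24
print(config.num_key_value_heads)  # expected: 8 -> each key/value head serves 3 query heads (GQA)
print(config.num_hidden_layers)    # expected: 6

# Total parameter count, reported as ~192M above.
model = AutoModelForCausalLM.from_pretrained(model_id)
print(f"{sum(p.numel() for p in model.parameters()):,}")
```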
## Open LLM Leaderboard Evaluation Results
Metric | kenhktsui/nano-phi-192M-v0.1 | kenhktsui/nano-phi-115M-v0.1 | microsoft/phi-2 (Reproduced) |
---|---|---|---|
Avg. | 29.24 | 28.68 | 61.53 |
ARC (25-shot) | 24.15 | 21.93 | 61.52 |
HellaSwag (10-shot) | 29.99 | 27.87 | 75.13 |
MMLU (5-shot) | 25.46 | 25.30 | 58.23 |
TruthfulQA (0-shot) | 44.30 | 46.01 | 44.46 |
Winogrande (5-shot) | 51.54 | 50.99 | 74.51 |
GSM8K (5-shot) | 0.0 | 0.0 | 55.34 |
### Details
hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| arc_easy | 0 | acc | 0.4596 | ± | 0.0102 |
| | | acc_norm | 0.4070 | ± | 0.0101 |
hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 25, batch_size: 8
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| arc_challenge | 0 | acc | 0.1911 | ± | 0.0115 |
| | | acc_norm | 0.2415 | ± | 0.0125 |
hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 10, batch_size: 8
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| hellaswag | 0 | acc | 0.2833 | ± | 0.0045 |
| | | acc_norm | 0.2999 | ± | 0.0046 |
hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 0.2583 | ± | 0.0153 |
| | | mc2 | 0.4430 | ± | 0.0152 |
hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| hendrycksTest-abstract_algebra | 1 | acc | 0.2200 | ± | 0.0416 |
| | | acc_norm | 0.2200 | ± | 0.0416 |
| hendrycksTest-anatomy | 1 | acc | 0.2593 | ± | 0.0379 |
| | | acc_norm | 0.2593 | ± | 0.0379 |
| hendrycksTest-astronomy | 1 | acc | 0.1711 | ± | 0.0306 |
| | | acc_norm | 0.1711 | ± | 0.0306 |
| hendrycksTest-business_ethics | 1 | acc | 0.2400 | ± | 0.0429 |
| | | acc_norm | 0.2400 | ± | 0.0429 |
| hendrycksTest-clinical_knowledge | 1 | acc | 0.2566 | ± | 0.0269 |
| | | acc_norm | 0.2566 | ± | 0.0269 |
| hendrycksTest-college_biology | 1 | acc | 0.2639 | ± | 0.0369 |
| | | acc_norm | 0.2639 | ± | 0.0369 |
| hendrycksTest-college_chemistry | 1 | acc | 0.1800 | ± | 0.0386 |
| | | acc_norm | 0.1800 | ± | 0.0386 |
| hendrycksTest-college_computer_science | 1 | acc | 0.3300 | ± | 0.0473 |
| | | acc_norm | 0.3300 | ± | 0.0473 |
| hendrycksTest-college_mathematics | 1 | acc | 0.3000 | ± | 0.0461 |
| | | acc_norm | 0.3000 | ± | 0.0461 |
| hendrycksTest-college_medicine | 1 | acc | 0.2023 | ± | 0.0306 |
| | | acc_norm | 0.2023 | ± | 0.0306 |
| hendrycksTest-college_physics | 1 | acc | 0.2843 | ± | 0.0449 |
| | | acc_norm | 0.2843 | ± | 0.0449 |
| hendrycksTest-computer_security | 1 | acc | 0.2200 | ± | 0.0416 |
| | | acc_norm | 0.2200 | ± | 0.0416 |
| hendrycksTest-conceptual_physics | 1 | acc | 0.2511 | ± | 0.0283 |
| | | acc_norm | 0.2511 | ± | 0.0283 |
| hendrycksTest-econometrics | 1 | acc | 0.2807 | ± | 0.0423 |
| | | acc_norm | 0.2807 | ± | 0.0423 |
| hendrycksTest-electrical_engineering | 1 | acc | 0.2897 | ± | 0.0378 |
| | | acc_norm | 0.2897 | ± | 0.0378 |
| hendrycksTest-elementary_mathematics | 1 | acc | 0.2804 | ± | 0.0231 |
| | | acc_norm | 0.2804 | ± | 0.0231 |
| hendrycksTest-formal_logic | 1 | acc | 0.2143 | ± | 0.0367 |
| | | acc_norm | 0.2143 | ± | 0.0367 |
| hendrycksTest-global_facts | 1 | acc | 0.1700 | ± | 0.0378 |
| | | acc_norm | 0.1700 | ± | 0.0378 |
| hendrycksTest-high_school_biology | 1 | acc | 0.3226 | ± | 0.0266 |
| | | acc_norm | 0.3226 | ± | 0.0266 |
| hendrycksTest-high_school_chemistry | 1 | acc | 0.2759 | ± | 0.0314 |
| | | acc_norm | 0.2759 | ± | 0.0314 |
| hendrycksTest-high_school_computer_science | 1 | acc | 0.2700 | ± | 0.0446 |
| | | acc_norm | 0.2700 | ± | 0.0446 |
| hendrycksTest-high_school_european_history | 1 | acc | 0.2606 | ± | 0.0343 |
| | | acc_norm | 0.2606 | ± | 0.0343 |
| hendrycksTest-high_school_geography | 1 | acc | 0.3081 | ± | 0.0329 |
| | | acc_norm | 0.3081 | ± | 0.0329 |
| hendrycksTest-high_school_government_and_politics | 1 | acc | 0.3627 | ± | 0.0347 |
| | | acc_norm | 0.3627 | ± | 0.0347 |
| hendrycksTest-high_school_macroeconomics | 1 | acc | 0.2641 | ± | 0.0224 |
| | | acc_norm | 0.2641 | ± | 0.0224 |
| hendrycksTest-high_school_mathematics | 1 | acc | 0.2630 | ± | 0.0268 |
| | | acc_norm | 0.2630 | ± | 0.0268 |
| hendrycksTest-high_school_microeconomics | 1 | acc | 0.3403 | ± | 0.0308 |
| | | acc_norm | 0.3403 | ± | 0.0308 |
| hendrycksTest-high_school_physics | 1 | acc | 0.3113 | ± | 0.0378 |
| | | acc_norm | 0.3113 | ± | 0.0378 |
| hendrycksTest-high_school_psychology | 1 | acc | 0.2716 | ± | 0.0191 |
| | | acc_norm | 0.2716 | ± | 0.0191 |
| hendrycksTest-high_school_statistics | 1 | acc | 0.4491 | ± | 0.0339 |
| | | acc_norm | 0.4491 | ± | 0.0339 |
| hendrycksTest-high_school_us_history | 1 | acc | 0.2402 | ± | 0.0300 |
| | | acc_norm | 0.2402 | ± | 0.0300 |
| hendrycksTest-high_school_world_history | 1 | acc | 0.2363 | ± | 0.0277 |
| | | acc_norm | 0.2363 | ± | 0.0277 |
| hendrycksTest-human_aging | 1 | acc | 0.2197 | ± | 0.0278 |
| | | acc_norm | 0.2197 | ± | 0.0278 |
| hendrycksTest-human_sexuality | 1 | acc | 0.2824 | ± | 0.0395 |
| | | acc_norm | 0.2824 | ± | 0.0395 |
| hendrycksTest-international_law | 1 | acc | 0.2479 | ± | 0.0394 |
| | | acc_norm | 0.2479 | ± | 0.0394 |
| hendrycksTest-jurisprudence | 1 | acc | 0.2037 | ± | 0.0389 |
| | | acc_norm | 0.2037 | ± | 0.0389 |
| hendrycksTest-logical_fallacies | 1 | acc | 0.2393 | ± | 0.0335 |
| | | acc_norm | 0.2393 | ± | 0.0335 |
| hendrycksTest-machine_learning | 1 | acc | 0.1875 | ± | 0.0370 |
| | | acc_norm | 0.1875 | ± | 0.0370 |
| hendrycksTest-management | 1 | acc | 0.2039 | ± | 0.0399 |
| | | acc_norm | 0.2039 | ± | 0.0399 |
| hendrycksTest-marketing | 1 | acc | 0.1795 | ± | 0.0251 |
| | | acc_norm | 0.1795 | ± | 0.0251 |
| hendrycksTest-medical_genetics | 1 | acc | 0.3000 | ± | 0.0461 |
| | | acc_norm | 0.3000 | ± | 0.0461 |
| hendrycksTest-miscellaneous | 1 | acc | 0.2644 | ± | 0.0158 |
| | | acc_norm | 0.2644 | ± | 0.0158 |
| hendrycksTest-moral_disputes | 1 | acc | 0.2225 | ± | 0.0224 |
| | | acc_norm | 0.2225 | ± | 0.0224 |
| hendrycksTest-moral_scenarios | 1 | acc | 0.2726 | ± | 0.0149 |
| | | acc_norm | 0.2726 | ± | 0.0149 |
| hendrycksTest-nutrition | 1 | acc | 0.2353 | ± | 0.0243 |
| | | acc_norm | 0.2353 | ± | 0.0243 |
| hendrycksTest-philosophy | 1 | acc | 0.2283 | ± | 0.0238 |
| | | acc_norm | 0.2283 | ± | 0.0238 |
| hendrycksTest-prehistory | 1 | acc | 0.2099 | ± | 0.0227 |
| | | acc_norm | 0.2099 | ± | 0.0227 |
| hendrycksTest-professional_accounting | 1 | acc | 0.2411 | ± | 0.0255 |
| | | acc_norm | 0.2411 | ± | 0.0255 |
| hendrycksTest-professional_law | 1 | acc | 0.2458 | ± | 0.0110 |
| | | acc_norm | 0.2458 | ± | 0.0110 |
| hendrycksTest-professional_medicine | 1 | acc | 0.3897 | ± | 0.0296 |
| | | acc_norm | 0.3897 | ± | 0.0296 |
| hendrycksTest-professional_psychology | 1 | acc | 0.2141 | ± | 0.0166 |
| | | acc_norm | 0.2141 | ± | 0.0166 |
| hendrycksTest-public_relations | 1 | acc | 0.1818 | ± | 0.0369 |
| | | acc_norm | 0.1818 | ± | 0.0369 |
| hendrycksTest-security_studies | 1 | acc | 0.2490 | ± | 0.0277 |
| | | acc_norm | 0.2490 | ± | 0.0277 |
| hendrycksTest-sociology | 1 | acc | 0.2537 | ± | 0.0308 |
| | | acc_norm | 0.2537 | ± | 0.0308 |
| hendrycksTest-us_foreign_policy | 1 | acc | 0.2900 | ± | 0.0456 |
| | | acc_norm | 0.2900 | ± | 0.0456 |
| hendrycksTest-virology | 1 | acc | 0.1807 | ± | 0.0300 |
| | | acc_norm | 0.1807 | ± | 0.0300 |
| hendrycksTest-world_religions | 1 | acc | 0.1813 | ± | 0.0295 |
| | | acc_norm | 0.1813 | ± | 0.0295 |
hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| winogrande | 0 | acc | 0.5154 | ± | 0.014 |
hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| gsm8k | 0 | acc | 0 | ± | 0 |
## Evaluation results
Open LLM Leaderboard:
- AI2 Reasoning Challenge (25-shot, test set), normalized accuracy: 24.150
- HellaSwag (10-shot, validation set), normalized accuracy: 29.990
- MMLU (5-shot, test set), accuracy: 25.460
- TruthfulQA (0-shot, validation set), mc2: 44.300
- Winogrande (5-shot, validation set), accuracy: 51.450
- GSM8K (5-shot, test set), accuracy: 0.000