---
language:
  - en
license: mit
library_name: transformers
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    temperature: 0.1
    repetition_penalty: 10
    no_repeat_ngram_size: 4
    eta_cutoff: 0.0006
    renormalize_logits: true
widget:
  - text: My name is El Microondas the Wise, and
    example_title: El Microondas
  - text: Kennesaw State University is a public
    example_title: Kennesaw State University
  - text: >-
      Bungie Studios is an American video game developer. They are most famous
      for developing the award winning Halo series of video games. They also
      made Destiny. The studio was founded
    example_title: Bungie
  - text: The Mona Lisa is a world-renowned painting created by
    example_title: Mona Lisa
  - text: >-
      The Harry Potter series, written by J.K. Rowling, begins with the book
      titled
    example_title: Harry Potter Series
  - text: >-
      Question: I have cities, but no houses. I have mountains, but no trees. I
      have water, but no fish. What am I?

      Answer:
    example_title: Riddle
  - text: The process of photosynthesis involves the conversion of
    example_title: Photosynthesis
  - text: >-
      Jane went to the store to buy some groceries. She picked up apples,
      oranges, and a loaf of bread. When she got home, she realized she forgot
    example_title: Story Continuation
  - text: >-
      Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
      and another train leaves Station B at 10:00 AM and travels at 80 mph, when
      will they meet if the distance between the stations is 300 miles?

      To determine
    example_title: Math Problem
  - text: In the context of computer programming, an algorithm is
    example_title: Algorithm Definition
pipeline_tag: text-generation
model-index:
  - name: nano-phi-192M-v0.1
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 24.15
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 29.99
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 25.46
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 44.3
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 51.45
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 0
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
          name: Open LLM Leaderboard
datasets:
  - kenhktsui/minipile_quality_score_v1
  - kenhktsui/simple_wikipedia_LM_quality_score_v1
  - kenhktsui/refinedweb-3m_quality_score_v1
  - kenhktsui/TM-DATA_quality_score_v1
  - kenhktsui/openwebtext_quality_score_v1
  - HuggingFaceTB/cosmopedia
---

# Model Card for nano-phi-192M-v0.1

This model is a continuation of kenhktsui/nano-phi-115M-v0.1.
The model is not aligned (no instruction tuning or preference alignment), so it produces raw text completions.

Major differences:

## How to use

To use the model, you will need transformers version >= 4.37.2.

```shell
pip install "transformers>=4.37.2"
```

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kenhktsui/nano-phi-192M-v0.1")
print(pipe("I am a machine learning researcher. I work on", max_new_tokens=50, repetition_penalty=10.0))
```
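
The inference parameters declared in the metadata above (low-temperature sampling, a strong repetition penalty, an eta cutoff, and logit renormalization) can also be passed straight to `generate`. A minimal sketch, assuming the checkpoint loads with the standard `AutoTokenizer`/`AutoModelForCausalLM` classes:

```python
# Minimal sketch: generate with the inference parameters from the model card metadata.
# Assumes the standard Auto* loading path works for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kenhktsui/nano-phi-192M-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The Mona Lisa is a world-renowned painting created by"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.1,
    repetition_penalty=10.0,
    no_repeat_ngram_size=4,
    eta_cutoff=0.0006,
    renormalize_logits=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```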

## Some metrics

- model (these values can be read back from the checkpoint config; see the sketch below this list)
  - hidden_size: 768
  - num_key_value_heads: 8 (grouped query attention)
  - num_attention_heads: 24
  - num_hidden_layers: 6
  - context length: 1024
  - total params: 192M
- training
  - global steps: 36,000
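
A sketch of checking the architecture values above against the published checkpoint, assuming the usual transformers attribute names (in particular, the context length is assumed to live in `max_position_embeddings`):

```python
# Sketch: verify the architecture fields listed above against the hub config.
# Attribute names are assumed to follow the standard transformers convention.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "kenhktsui/nano-phi-192M-v0.1"
cfg = AutoConfig.from_pretrained(model_id)
print(cfg.hidden_size)              # expected 768
print(cfg.num_attention_heads)      # expected 24
print(cfg.num_key_value_heads)      # expected 8 -> grouped query attention, 24 / 8 = 3 query heads per KV head
print(cfg.num_hidden_layers)        # expected 6
print(cfg.max_position_embeddings)  # context length, expected 1024

# The ~192M total parameter count can be confirmed by loading the weights:
model = AutoModelForCausalLM.from_pretrained(model_id)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```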

## Open LLM Leaderboard Evaluation Results

| Metric | kenhktsui/nano-phi-192M-v0.1 | kenhktsui/nano-phi-115M-v0.1 | microsoft/phi-2 (Reproduced) |
|---|---:|---:|---:|
| Avg. | 29.24 | 28.68 | 61.53 |
| ARC (25-shot) | 24.15 | 21.93 | 61.52 |
| HellaSwag (10-shot) | 29.99 | 27.87 | 75.13 |
| MMLU (5-shot) | 25.46 | 25.30 | 58.23 |
| TruthfulQA (0-shot) | 44.30 | 46.01 | 44.46 |
| Winogrande (5-shot) | 51.54 | 50.99 | 74.51 |
| GSM8K (5-shot) | 0.0 | 0.0 | 55.34 |
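
The Avg. column is the unweighted mean of the six benchmark scores; for this model, (24.15 + 29.99 + 25.46 + 44.30 + 51.54 + 0.0) / 6 ≈ 29.24.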

### Details
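
The runs below come from the EleutherAI lm-evaluation-harness. A sketch of reproducing one of them (ARC-Challenge, 25-shot), assuming the older pre-0.4 harness API that produced these logs and substituting the hub model id for the local wandb artifact path shown:

```python
# Sketch: rerun one of the evaluations below with the older (pre-0.4)
# EleutherAI lm-evaluation-harness API. The exact keyword names are assumptions
# tied to that harness version; the original logs used a local artifact path
# rather than the hub model id.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args="pretrained=kenhktsui/nano-phi-192M-v0.1,use_accelerate=False,trust_remote_code=True",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```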

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8

| Task | Version | Metric | Value |   | Stderr |
|---|---:|---|---:|---|---:|
| arc_easy | 0 | acc | 0.4596 | ± | 0.0102 |
| | | acc_norm | 0.4070 | ± | 0.0101 |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 25, batch_size: 8

| Task | Version | Metric | Value |   | Stderr |
|---|---:|---|---:|---|---:|
| arc_challenge | 0 | acc | 0.1911 | ± | 0.0115 |
| | | acc_norm | 0.2415 | ± | 0.0125 |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 10, batch_size: 8

| Task | Version | Metric | Value |   | Stderr |
|---|---:|---|---:|---|---:|
| hellaswag | 0 | acc | 0.2833 | ± | 0.0045 |
| | | acc_norm | 0.2999 | ± | 0.0046 |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8

| Task | Version | Metric | Value |   | Stderr |
|---|---:|---|---:|---|---:|
| truthfulqa_mc | 1 | mc1 | 0.2583 | ± | 0.0153 |
| | | mc2 | 0.4430 | ± | 0.0152 |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8

For every MMLU subtask below, acc and acc_norm (and their standard errors) are identical, so each subtask is shown on a single row.

| Task | Version | acc | acc_norm | Stderr |
|---|---:|---:|---:|---:|
| hendrycksTest-abstract_algebra | 1 | 0.2200 | 0.2200 | 0.0416 |
| hendrycksTest-anatomy | 1 | 0.2593 | 0.2593 | 0.0379 |
| hendrycksTest-astronomy | 1 | 0.1711 | 0.1711 | 0.0306 |
| hendrycksTest-business_ethics | 1 | 0.2400 | 0.2400 | 0.0429 |
| hendrycksTest-clinical_knowledge | 1 | 0.2566 | 0.2566 | 0.0269 |
| hendrycksTest-college_biology | 1 | 0.2639 | 0.2639 | 0.0369 |
| hendrycksTest-college_chemistry | 1 | 0.1800 | 0.1800 | 0.0386 |
| hendrycksTest-college_computer_science | 1 | 0.3300 | 0.3300 | 0.0473 |
| hendrycksTest-college_mathematics | 1 | 0.3000 | 0.3000 | 0.0461 |
| hendrycksTest-college_medicine | 1 | 0.2023 | 0.2023 | 0.0306 |
| hendrycksTest-college_physics | 1 | 0.2843 | 0.2843 | 0.0449 |
| hendrycksTest-computer_security | 1 | 0.2200 | 0.2200 | 0.0416 |
| hendrycksTest-conceptual_physics | 1 | 0.2511 | 0.2511 | 0.0283 |
| hendrycksTest-econometrics | 1 | 0.2807 | 0.2807 | 0.0423 |
| hendrycksTest-electrical_engineering | 1 | 0.2897 | 0.2897 | 0.0378 |
| hendrycksTest-elementary_mathematics | 1 | 0.2804 | 0.2804 | 0.0231 |
| hendrycksTest-formal_logic | 1 | 0.2143 | 0.2143 | 0.0367 |
| hendrycksTest-global_facts | 1 | 0.1700 | 0.1700 | 0.0378 |
| hendrycksTest-high_school_biology | 1 | 0.3226 | 0.3226 | 0.0266 |
| hendrycksTest-high_school_chemistry | 1 | 0.2759 | 0.2759 | 0.0314 |
| hendrycksTest-high_school_computer_science | 1 | 0.2700 | 0.2700 | 0.0446 |
| hendrycksTest-high_school_european_history | 1 | 0.2606 | 0.2606 | 0.0343 |
| hendrycksTest-high_school_geography | 1 | 0.3081 | 0.3081 | 0.0329 |
| hendrycksTest-high_school_government_and_politics | 1 | 0.3627 | 0.3627 | 0.0347 |
| hendrycksTest-high_school_macroeconomics | 1 | 0.2641 | 0.2641 | 0.0224 |
| hendrycksTest-high_school_mathematics | 1 | 0.2630 | 0.2630 | 0.0268 |
| hendrycksTest-high_school_microeconomics | 1 | 0.3403 | 0.3403 | 0.0308 |
| hendrycksTest-high_school_physics | 1 | 0.3113 | 0.3113 | 0.0378 |
| hendrycksTest-high_school_psychology | 1 | 0.2716 | 0.2716 | 0.0191 |
| hendrycksTest-high_school_statistics | 1 | 0.4491 | 0.4491 | 0.0339 |
| hendrycksTest-high_school_us_history | 1 | 0.2402 | 0.2402 | 0.0300 |
| hendrycksTest-high_school_world_history | 1 | 0.2363 | 0.2363 | 0.0277 |
| hendrycksTest-human_aging | 1 | 0.2197 | 0.2197 | 0.0278 |
| hendrycksTest-human_sexuality | 1 | 0.2824 | 0.2824 | 0.0395 |
| hendrycksTest-international_law | 1 | 0.2479 | 0.2479 | 0.0394 |
| hendrycksTest-jurisprudence | 1 | 0.2037 | 0.2037 | 0.0389 |
| hendrycksTest-logical_fallacies | 1 | 0.2393 | 0.2393 | 0.0335 |
| hendrycksTest-machine_learning | 1 | 0.1875 | 0.1875 | 0.0370 |
| hendrycksTest-management | 1 | 0.2039 | 0.2039 | 0.0399 |
| hendrycksTest-marketing | 1 | 0.1795 | 0.1795 | 0.0251 |
| hendrycksTest-medical_genetics | 1 | 0.3000 | 0.3000 | 0.0461 |
| hendrycksTest-miscellaneous | 1 | 0.2644 | 0.2644 | 0.0158 |
| hendrycksTest-moral_disputes | 1 | 0.2225 | 0.2225 | 0.0224 |
| hendrycksTest-moral_scenarios | 1 | 0.2726 | 0.2726 | 0.0149 |
| hendrycksTest-nutrition | 1 | 0.2353 | 0.2353 | 0.0243 |
| hendrycksTest-philosophy | 1 | 0.2283 | 0.2283 | 0.0238 |
| hendrycksTest-prehistory | 1 | 0.2099 | 0.2099 | 0.0227 |
| hendrycksTest-professional_accounting | 1 | 0.2411 | 0.2411 | 0.0255 |
| hendrycksTest-professional_law | 1 | 0.2458 | 0.2458 | 0.0110 |
| hendrycksTest-professional_medicine | 1 | 0.3897 | 0.3897 | 0.0296 |
| hendrycksTest-professional_psychology | 1 | 0.2141 | 0.2141 | 0.0166 |
| hendrycksTest-public_relations | 1 | 0.1818 | 0.1818 | 0.0369 |
| hendrycksTest-security_studies | 1 | 0.2490 | 0.2490 | 0.0277 |
| hendrycksTest-sociology | 1 | 0.2537 | 0.2537 | 0.0308 |
| hendrycksTest-us_foreign_policy | 1 | 0.2900 | 0.2900 | 0.0456 |
| hendrycksTest-virology | 1 | 0.1807 | 0.1807 | 0.0300 |
| hendrycksTest-world_religions | 1 | 0.1813 | 0.1813 | 0.0295 |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8

| Task | Version | Metric | Value |   | Stderr |
|---|---:|---|---:|---|---:|
| winogrande | 0 | acc | 0.5154 | ± | 0.014 |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8

| Task | Version | Metric | Value |   | Stderr |
|---|---:|---|---:|---|---:|
| gsm8k | 0 | acc | 0 | ± | 0 |