---
language:
  - en
license: other
tags:
  - axolotl
  - instruct
  - finetune
  - chatml
  - gpt4
  - synthetic data
  - science
  - physics
  - chemistry
  - biology
  - math
  - qwen
  - qwen2
base_model: Qwen/Qwen2-7B
datasets:
  - allenai/ai2_arc
  - camel-ai/physics
  - camel-ai/chemistry
  - camel-ai/biology
  - camel-ai/math
  - metaeval/reclor
  - openbookqa
  - mandyyyyii/scibench
  - derek-thomas/ScienceQA
  - TIGER-Lab/ScienceEval
  - jondurbin/airoboros-3.2
  - LDJnr/Capybara
  - Cot-Alpaca-GPT4-From-OpenHermes-2.5
  - STEM-AI-mtl/Electrical-engineering
  - knowrohit07/saraswati-stem
  - sablo/oasst2_curated
  - lmsys/lmsys-chat-1m
  - TIGER-Lab/MathInstruct
  - bigbio/med_qa
  - meta-math/MetaMathQA-40K
  - piqa
  - scibench
  - sciq
  - Open-Orca/SlimOrca
  - migtissera/Synthia-v1.3
  - allenai/WildChat
  - microsoft/orca-math-word-problems-200k
  - openchat/openchat_sharegpt4_dataset
  - teknium/GPTeacher-General-Instruct
  - m-a-p/CodeFeedback-Filtered-Instruction
  - totally-not-an-llm/EverythingLM-data-V3
  - HuggingFaceH4/no_robots
  - OpenAssistant/oasst_top1_2023-08-25
  - WizardLM/WizardLM_evol_instruct_70k
  - abacusai/SystemChat-1.1
  - H-D-T/Buzz-V1.2
model-index:
  - name: Einstein-v7-Qwen2-7B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 41
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v7-Qwen2-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 32.84
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v7-Qwen2-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 15.18
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v7-Qwen2-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 6.6
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v7-Qwen2-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 14.06
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v7-Qwen2-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 34.4
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v7-Qwen2-7B
          name: Open LLM Leaderboard
---


# 🔬 Einstein-v7-Qwen2-7B

This model is a fully fine-tuned version of Qwen/Qwen2-7B on diverse datasets.

This model was fine-tuned on 8x AMD MI300X GPUs using axolotl.

This model has been trained using compute resources from TensorWave.

See axolotl config

axolotl version: `0.4.0`

```yaml
base_model: Qwen/Qwen2-7B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

chat_template: chatml
datasets:
  - path: data/airoboros_3.2_without_contextual_slimorca_orca_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml

  - path: data/allenai_wild_chat_gpt4_english_toxic_random_half_4k_sharegpt.json
    ds_type: json
    type: sharegpt
    strict: false
    conversation: chatml

  - path: data/buzz_unstacked_chosen_math_removed_filtered.json
    ds_type: json
    type: alpaca
    conversation: chatml

  - path: data/capybara_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml

  - path: data/cot_alpaca_gpt4_extracted_openhermes_2.5_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml

  - path: data/everythinglm-data-v3_sharegpt.json
    ds_type: json
    type: sharegpt
    strict: false
    conversation: chatml

  - path: data/gpt4_data_lmys_1m_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml

  - path: data/gpteacher-instruct-special-alpaca.json
    ds_type: json
    type: gpteacher
    conversation: chatml

  - path: data/merged_all.json
    ds_type: json
    type: alpaca
    conversation: chatml

  - path: data/no_robots_sharegpt.json
    ds_type: json
    type: sharegpt
    strict: false
    conversation: chatml

  - path: data/oasst_top1_from_fusechatmixture_sharegpt.json
    ds_type: json
    type: sharegpt
    strict: false
    conversation: chatml

  - path: data/pippa_bagel_repo_3k_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml

  - path: data/rpguild_quarter_alignment_lab_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml

  - path: data/sharegpt_gpt4_english.json
    ds_type: json
    type: sharegpt
    conversation: chatml

  - path: data/slimorca_dedup_filtered_95k_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml

  - path: data/soda_diaolog_longest_tenth_buzz_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml

  - path: data/synthia-v1.3_sharegpt_12500.json
    ds_type: json
    type: sharegpt
    conversation: chatml

  - path: data/system_conversations_dolphin_sharegpt.json
    ds_type: json
    type: sharegpt
    conversation: chatml
  
dataset_prepared_path: last_run_prepared
val_set_size: 0.002

output_dir: ./Einstein-v7-Qwen2-7B-model

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false

wandb_project: Einstein
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
hub_model_id: Weyaxi/Einstein-v7-Qwen2-7B

gradient_accumulation_steps: 4
micro_batch_size: 6
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00001 # look

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: unsloth
gradient_checkpointing_kwargs:
   use_reentrant: true # look
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:

deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"
tokens:
  - "<|im_start|>"
  - "<|im_end|>"

## 💬 Prompt Template

You can use the ChatML prompt template with this model:

ChatML

```
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}<|im_end|>
```

This prompt template is available as a chat template, which means you can format messages using the `tokenizer.apply_chat_template()` method:

```python
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"}
]
# return_dict=True makes apply_chat_template return input_ids plus
# attention_mask, so the result can be unpacked straight into generate().
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                          return_dict=True, return_tensors="pt")
model.generate(**gen_input)
```
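For an end-to-end run, here is a minimal sketch assuming the `transformers` library (the generation settings are illustrative, not the author's recommendations):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Weyaxi/Einstein-v7-Qwen2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain Newton's second law in one sentence."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

# Decode only the newly generated tokens, skipping the prompt.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```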

## 📊 Datasets used in this model

The datasets used to train this model are listed in the metadata section of the model card.

Please note that certain datasets listed in the metadata were filtered based on various criteria.

The filtered versions produced by this process are available in a separate repository:

Weyaxi/sci-datasets/main

## 🔄 Quantized versions

GGUF @bartowski

ExLlamaV2 @bartowski
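If you use the GGUF files, here is a minimal local-inference sketch with the `llama-cpp-python` package (the filename below is hypothetical; substitute the actual file downloaded from the GGUF repository):

```python
from llama_cpp import Llama

# Hypothetical filename: use the GGUF file you actually downloaded.
llm = Llama(model_path="Einstein-v7-Qwen2-7B-Q4_K_M.gguf", n_ctx=8192)

out = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
])
print(out["choices"][0]["message"]["content"])
```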

## 🎯 Open LLM Leaderboard v2 Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                | 24.01 |
| IFEval (0-Shot)     | 41.00 |
| BBH (3-Shot)        | 32.84 |
| MATH Lvl 5 (4-Shot) | 15.18 |
| GPQA (0-shot)       |  6.60 |
| MuSR (0-shot)       | 14.06 |
| MMLU-PRO (5-shot)   | 34.40 |

## 📚 Some resources, discussions and reviews about this model

🐦 Announcement tweet:

🔍 Reddit post in r/LocalLLaMA:

## 🤖 Additional information about training

This model was fully fine-tuned for 2 epochs.

The total number of training steps was 500.
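As a rough sanity check, these figures line up with the batch settings in the config above, assuming the 8 GPUs ran data-parallel (an assumption, not stated in the card):

```python
# Back-of-the-envelope check of the reported step count.
# Assumes 8 data-parallel GPUs (an assumption) and the config values
# micro_batch_size=6, gradient_accumulation_steps=4, num_epochs=2.
gpus, micro_batch, grad_accum = 8, 6, 4
epochs, total_steps = 2, 500

effective_batch = gpus * micro_batch * grad_accum        # 192 packed sequences per optimizer step
steps_per_epoch = total_steps // epochs                  # 250
sequences_per_epoch = effective_batch * steps_per_epoch  # ~48,000 packed 8k-token sequences per epoch
print(effective_batch, steps_per_epoch, sequences_per_epoch)
```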

Loss graph



## 🤝 Acknowledgments

Thanks to all the dataset authors mentioned in the datasets section.

Thanks to the axolotl team for the training framework used to build this model.

Thanks to the entire open-source AI community.

Built with Axolotl

If you would like to support me:

☕ Buy Me a Coffee
