language:
  - en
license: llama3.1
tags:
  - medit-mesh
base_model:
  - meta-llama/Llama-3.1-8B-Instruct
  - arcee-ai/Llama-3.1-SuperNova-Lite
pipeline_tag: text-generation
model-index:
  - name: Llama-3.1-MedIT-SUN-8B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 78.37
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=meditsolutions/Llama-3.1-MedIT-SUN-8B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 32
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=meditsolutions/Llama-3.1-MedIT-SUN-8B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 20.02
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=meditsolutions/Llama-3.1-MedIT-SUN-8B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 7.83
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=meditsolutions/Llama-3.1-MedIT-SUN-8B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 9.64
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=meditsolutions/Llama-3.1-MedIT-SUN-8B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 32.4
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=meditsolutions/Llama-3.1-MedIT-SUN-8B
          name: Open LLM Leaderboard

Llama-3.1-MedIT-SUN-8B

Model Description

Llama-3.1-MedIT-SUN-8B is an experimental language model that uses model merging to combine the capabilities of two Llama-3.1-8B-based models. This 8B-parameter model is built on the Llama-3.1-8B-Instruct architecture and serves as an exploration of model-fusion methods.

Key Features

  • Base Architecture: Meta's Llama-3.1-8B-Instruct
  • Parameter Count: 8 billion
  • Development: Created by MedIT Solutions
  • Merged Components:
    • arcee-ai/Llama-3.1-SuperNova-Lite
    • meta-llama/Llama-3.1-8B-Instruct
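The "8 billion" figure can be checked from the architecture. The sketch below is a minimal one. It assumes the merged model keeps the published Llama-3.1-8B configuration: hidden size 4096, 32 layers, MLP size 14336, 32 query heads sharing 8 key/value heads, a vocabulary of 128,256, and untied input and output embeddings. It counts only weight matrices and RMSNorm vectors, since Llama layers have no bias terms.

```python
# Assumed: published Llama-3.1-8B config values, unchanged by the merge.
CONFIG = {
    "vocab_size": 128256,
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "tie_word_embeddings": False,
}


def llama_param_count(cfg: dict) -> int:
    """Count weights in a Llama-style decoder (no biases, RMSNorm weights only)."""
    d = cfg["hidden_size"]
    head_dim = d // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim  # grouped-query attention

    attn = d * d + 2 * d * kv_dim + d * d   # q, k, v, o projections
    mlp = 3 * d * cfg["intermediate_size"]  # gate, up, down projections
    norms = 2 * d                           # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms

    embed = cfg["vocab_size"] * d
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * d
    final_norm = d
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


total = llama_param_count(CONFIG)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

This prints `8,030,261,248 parameters (~8.03B)`. That total is where the "8B" in the model name comes from.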

Technical Details

The model was produced with MedIT-mesh, a proprietary model-merging technique. This release is a proof of concept and a testing ground for model-fusion methods.
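MedIT-mesh is proprietary, so the sketch below does not try to reproduce it. It shows plain linear weight averaging, a different and much simpler technique. The only purpose is to illustrate what merging two checkpoints with the same architecture means in the simplest case. Toy Python lists stand in for tensors, and the parameter names and values are hypothetical.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]


def linear_merge(a: StateDict, b: StateDict, alpha: float = 0.5) -> StateDict:
    """Element-wise interpolation: alpha * a + (1 - alpha) * b.

    Both checkpoints must share the same architecture (identical parameter
    names and shapes). This is plain weight averaging, NOT MedIT-mesh.
    """
    if a.keys() != b.keys():
        raise ValueError("checkpoints have different parameter names")
    merged: StateDict = {}
    for name, wa in a.items():
        wb = b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(wa, wb)]
    return merged


# Hypothetical toy "checkpoints"; real ones hold billions of values.
model_a = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
model_b = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}

merged = linear_merge(model_a, model_b)
print(merged)  # {'layer0.weight': [2.0, 3.0], 'layer0.bias': [0.5]}
```

The shared-architecture check is the key precondition. Merging only makes sense when every parameter in one model has a counterpart of the same shape in the other. That holds for the two Llama-3.1-8B-based components listed above.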

Purpose

This model was developed primarily for testing and research, to explore the potential of model merging in language model development. It should be treated as an experimental release rather than a production-ready model.

Usage Notes

Because this is a test model, it is recommended for research and experimentation only. Users should keep its experimental status in mind before adopting it for any application.

Open LLM Leaderboard Evaluation Results

Detailed results are available on the Open LLM Leaderboard.

Metric                 Value
Avg.                   30.04
IFEval (0-Shot)        78.37
BBH (3-Shot)           32.00
MATH Lvl 5 (4-Shot)    20.02
GPQA (0-shot)           7.83
MuSR (0-shot)           9.64
MMLU-PRO (5-shot)      32.40
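The "Avg." row is consistent with the unweighted mean of the six benchmark scores. The short check below recomputes it from the values in the table. The dictionary is simply a local copy of those numbers.

```python
# Benchmark scores copied from the results table.
scores = {
    "IFEval (0-Shot)": 78.37,
    "BBH (3-Shot)": 32.00,
    "MATH Lvl 5 (4-Shot)": 20.02,
    "GPQA (0-shot)": 7.83,
    "MuSR (0-shot)": 9.64,
    "MMLU-PRO (5-shot)": 32.40,
}

# Unweighted mean: 180.26 / 6
avg = sum(scores.values()) / len(scores)
print(f"Avg. {avg:.2f}")  # Avg. 30.04
```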