language:
- en
- fr
- de
- es
- it
- pt
- ru
- zh
- ja
license: other
tags:
- chat
base_model: Qwen/Qwen2-72B-Instruct
datasets:
- Doctor-Shotgun/C2-Stheno
- anthracite-org/kalo-opus-instruct-22k-no-refusal
- anthracite-org/nopm_claude_writing_fixed
license_name: tongyi-qianwen
license_link: https://huggingface.co/anthracite-org/magnum-v2-72b/blob/main/LICENSE
pipeline_tag: text-generation
model-index:
- name: magnum-v2-72b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 75.6
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-72b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 57.85
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-72b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 31.65
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-72b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 18.12
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-72b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 14.18
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-72b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 49.51
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=anthracite-org/magnum-v2-72b
      name: Open LLM Leaderboard
MLX Format and Quantizations for Magnum v2 72b
Quantized to 4 bpw precision and tested using the mlx_lm utility on a 64GiB URAM M1 Max.
Notes on using:
Requires Apple Silicon and is optimized for it. Fast enough for rapid back-and-forth as long as it fits in your URAM.
I tried to serve this with mlx_lm.server as usual, but I got Python string indexing errors no matter what I did. It works fine with LM Studio in OpenAI mode.
I used this with SillyTavern, and it worked well.
See original model for further details.
Larger 8 bpw quants are available at mlx-community.
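For anyone who wants to script against LM Studio's OpenAI mode rather than go through a front end, here is a minimal sketch using only the Python standard library. It assumes LM Studio's local server is running at its default address (`http://localhost:1234/v1`); `MODEL_ID`, `build_chat_request`, and `send_chat` are hypothetical names chosen for illustration, and the model identifier should be replaced with whatever LM Studio shows for your loaded model.

```python
import json
import urllib.request

# Assumptions: LM Studio's local server is on its default address, and
# MODEL_ID is a placeholder for the identifier LM Studio displays.
BASE_URL = "http://localhost:1234/v1"
MODEL_ID = "magnum-v2-72b-mlx"  # hypothetical identifier


def build_chat_request(user_text: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request without sending it."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(user_text: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_text)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the request needs no running server; send_chat() does.
request = build_chat_request("Hi there!")
print(request.full_url)
```

Call `send_chat("Hi there!")` once the server is up. The chat completions endpoint applies the model's chat template on the server side, so no manual ChatML formatting is needed here.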
Original Model card
This is the seventh (Lucky!) in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This model is fine-tuned on top of Qwen-2 72B Instruct.
Prompting
The model has been instruct-tuned with ChatML formatting. A typical input looks like this:
"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""
Credits
- anthracite-org/Stheno-Data-Filtered
- anthracite-org/kalo-opus-instruct-22k-no-refusal
- anthracite-org/nopm_claude_writing_fixed
This model has been a team effort, and the credit goes to all members of Anthracite.
Training
The training was done for 2 epochs. We used 8x AMD Instinct™ MI300X Accelerators for the full-parameter fine-tuning of the model.
We also trained with a weight decay of 0.01 to help further stabilize the loss trajectory and mitigate catastrophic forgetting, and used a peak learning rate of 4e-6 to prevent the 2nd epoch loss from dropping too significantly (a sharp drop is a strong indicator of overfitting).
Sample packing was done at 16k tokens rather than the 8k tokens used in our previous runs.
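To illustrate what sample packing means in general, here is a minimal sketch of a greedy, order-preserving packer. It is not the training code used for this model: the function `pack_samples`, the example lengths, and the reading of "16k" as 16 * 1024 = 16384 tokens are all assumptions made for illustration.

```python
from typing import List

MAX_PACK_TOKENS = 16_384  # assumption: "16k" read as 16 * 1024


def pack_samples(lengths: List[int], capacity: int = MAX_PACK_TOKENS) -> List[List[int]]:
    """Greedily group sample indices so each group's token total fits capacity."""
    packs: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, length in enumerate(lengths):
        if length > capacity:
            raise ValueError(f"sample {index} has {length} tokens, over {capacity}")
        if used + length > capacity:
            # The current pack is full: close it and start a new one.
            packs.append(current)
            current, used = [], 0
        current.append(index)
        used += length
    if current:
        packs.append(current)
    return packs


# Hypothetical token counts for six tokenized samples.
token_lengths = [9000, 7000, 5000, 12000, 4000, 300]
packs = pack_samples(token_lengths)
print(packs)  # [[0, 1], [2], [3, 4, 5]]
```

With a smaller capacity, the 9000- and 12000-token samples in this example would not fit in a pack at all and would have to be truncated or dropped, which is one practical reason to raise the packing length.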
Safety
...
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 41.15 |
IFEval (0-Shot) | 75.60 |
BBH (3-Shot) | 57.85 |
MATH Lvl 5 (4-Shot) | 31.65 |
GPQA (0-shot) | 18.12 |
MuSR (0-shot) | 14.18 |
MMLU-PRO (5-shot) | 49.51 |
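The "Avg." row is consistent with an unweighted mean of the six benchmark scores. A short check, assuming that is how the average was computed:

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 75.60,
    "BBH (3-Shot)": 57.85,
    "MATH Lvl 5 (4-Shot)": 31.65,
    "GPQA (0-shot)": 18.12,
    "MuSR (0-shot)": 14.18,
    "MMLU-PRO (5-shot)": 49.51,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 41.15
```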