---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
quantized_by: anthracite-org
base_model: anthracite-org/magnum-v2-4b
tags:
- chat
---

## This repo contains GGUF quants of the model. If you need the original weights, please find them [here](https://huggingface.co/anthracite-org/magnum-v2-4b).
## The quants were made with the mentioned PR merged.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a46cbfb9c2bdfae75b3a6/9JwXZze4tHRGpc_RzE2AU.png)

This is the eighth in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This model is fine-tuned on top of [IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml](https://huggingface.co/IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml).

## Prompting
The model has been instruct-tuned with ChatML formatting. A typical input would look like this:

```py
"""<|im_start|>system
system prompt<|im_end|>
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""
```

## Support
Upstream support has been merged, so these quants work out of the box now!
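Since the quants now load directly, the ChatML layout from the Prompting section can be assembled with a small helper. This is a minimal sketch, not part of the original card: `build_chatml_prompt` and its `add_generation_prompt` parameter are hypothetical names introduced here, and the newline placement assumes the usual ChatML convention of one newline after the role and one after `<|im_end|>`.

```python
from typing import Dict, List


def build_chatml_prompt(messages: List[Dict[str, str]],
                        add_generation_prompt: bool = True) -> str:
    """Render {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration; not an API of any library.
    """
    parts = []
    for message in messages:
        # Each turn: <|im_start|>ROLE, newline, CONTENT, <|im_end|>, newline
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Same conversation as the example in the Prompting section.
messages = [
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]

prompt = build_chatml_prompt(messages)
print(prompt)
```

If your inference frontend already applies a ChatML chat template, pass it the message list directly instead of a pre-rendered string, otherwise the special tokens may be added twice.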
Old instructions before PR:

To run inference on this model, you'll need to use Aphrodite, vLLM or EXL2/tabbyAPI, as llama.cpp hasn't yet merged the required pull request to fix the llama3.1 rope_freqs issue with custom head dimensions.

However, you can work around this by quantizing the model yourself to create a functional GGUF file. Note that until [this PR](https://github.com/ggerganov/llama.cpp/pull/9141) is merged, the context will be limited to 8k tokens.

To create a working GGUF file, make the following adjustments:

1. Remove the `"rope_scaling": {}` entry from `config.json`
2. Change `"max_position_embeddings"` to `8192` in `config.json`

These modifications should allow you to use the model with llama.cpp, albeit with the mentioned context limitation.
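The two `config.json` adjustments from the old instructions could be scripted roughly as follows. This is a sketch under assumptions: `patch_config_for_8k` is a hypothetical helper name, and the demo writes a stand-in `config.json` with placeholder values to a temporary directory rather than touching a real model folder.

```python
import json
import tempfile
from pathlib import Path


def patch_config_for_8k(config_path) -> dict:
    """Apply the two manual edits from the old instructions to config.json.

    Hypothetical helper for illustration only.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # 1. Remove the "rope_scaling" entry (no error if it is already absent).
    config.pop("rope_scaling", None)
    # 2. Limit the context window to 8k tokens.
    config["max_position_embeddings"] = 8192
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo on a stand-in file; the values below are placeholders, not the
# model's real configuration.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "config.json"
    demo_config = {
        "model_type": "llama",
        "max_position_embeddings": 131072,
        "rope_scaling": {"rope_type": "llama3"},
    }
    demo_path.write_text(json.dumps(demo_config), encoding="utf-8")
    patched = patch_config_for_8k(demo_path)
    reloaded = json.loads(demo_path.read_text(encoding="utf-8"))

print(patched)
```

With upstream support merged, this is only relevant when working with a llama.cpp build that predates the fix.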

## axolotl config
axolotl version: `0.4.1`

```yaml
base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: anthracite-org/Gryphe-3.5-16k-Subset
    type: sharegpt
    conversation: chatml
  - path: Epiculous/Synthstruct-Gens-v1-Filtered-n-Cleaned
    type: sharegpt
    conversation: chatml
  - path: anthracite-org/Stheno-Data-Filtered
    type: sharegpt
    conversation: chatml
  - path: Epiculous/SynthRP-Gens-v1-Filtered-n-Cleaned
    type: sharegpt
    conversation: chatml
  - path: lodrick-the-lafted/NopmWritingStruct
    type: sharegpt
    conversation: chatml
  - path: anthracite-org/kalo-opus-instruct-22k-no-refusal
    type: sharegpt
    conversation: chatml

chat_template: chatml

val_set_size: 0.01
output_dir: ./outputs/out

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 16384
# sequence_len: 32768
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00002
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1

debug:
deepspeed:
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```
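As a rough reading of the batch settings in this config, the effective batch size per optimizer step can be worked out as below. This is a back-of-the-envelope sketch: it assumes plain data parallelism across the 2 GPUs mentioned under Training (the config itself does not state a GPU count, and `num_gpus` is a name introduced here), and it treats every packed sequence as a full `sequence_len` tokens, so the token figure is an upper bound that includes padding.

```python
# Values copied from the axolotl config above.
micro_batch_size = 1
gradient_accumulation_steps = 32
sequence_len = 16384

# Assumption: 2 data-parallel workers, taken from the Training section.
num_gpus = 2

# Sequences contributing to one optimizer step.
sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus

# Upper bound on tokens per optimizer step (sequences are padded/packed to
# sequence_len, so real content tokens may be fewer).
tokens_per_step = sequences_per_step * sequence_len

print(sequences_per_step)  # 64
print(tokens_per_step)     # 1048576
```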

## Credits

- [anthracite-org/Stheno-Data-Filtered](https://huggingface.co/datasets/anthracite-org/Stheno-Data-Filtered)
- [anthracite-org/kalo-opus-instruct-22k-no-refusal](https://huggingface.co/datasets/anthracite-org/kalo-opus-instruct-22k-no-refusal)
- [lodrick-the-lafted/NopmWritingStruct](https://huggingface.co/datasets/lodrick-the-lafted/NopmWritingStruct)
- [NewEden/Gryphe-3.5-16k-Subset](NewEden/Gryphe-3.5-16k-Subset)
- [Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned](https://huggingface.co/datasets/Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned)
- [Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned](https://huggingface.co/datasets/Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned)

This model has been a team effort, and the credit goes to all members of Anthracite.

## Training
The training was done for 2 epochs. We used 2 x [RTX 6000s](https://store.nvidia.com/en-us/nvidia-rtx/products/nvidia-rtx-6000-ada-generation/) GPUs graciously provided by [Kubernetes_Bad](https://huggingface.co/kubernetes-bad) for the full-parameter fine-tuning of the model.

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

## Safety
...