
EXL2 quantisation of NeuralHermes-2.5-Mistral-7B, for use with ExLlamaV2.

Original model by @mlabonne.

Model size: 4.6 GB (roughly a 3x reduction from the FP16 original), 5 bits per weight on average, with 6 bpw on the output head.

Calibration Data: Wikitext (parquet)

Command: `python convert.py -i convert/NeuralHermes-2.5-Mistral-7B -c convert/0000.parquet -o convert/temp2 -cf convert/nh-5bpw -b 5.0 -hb 6`

Layer measurements are provided in `measurement.json` for further quantisation.
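
Because the measurement pass is the slow part of EXL2 conversion, `measurement.json` can be reused to produce other bitrates from the same source model. For example (the `-m` flag is ExLlamaV2's option for supplying an existing measurement file; check the flags accepted by your version of `convert.py`), a 4 bpw variant would look something like:

Command: `python convert.py -i convert/NeuralHermes-2.5-Mistral-7B -c convert/0000.parquet -o convert/temp2 -cf convert/nh-4bpw -m measurement.json -b 4.0 -hb 6`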


NeuralHermes 2.5 - Mistral 7B

NeuralHermes is an OpenHermes-2.5-Mistral-7B model that has been further fine-tuned with Direct Preference Optimization (DPO) using the mlabonne/chatml_dpo_pairs dataset.

It is directly inspired by the RLHF process used by the authors of neural-chat-7b-v3-1 to improve performance. I used the same dataset and reformatted it to apply the ChatML template. I haven't performed a comprehensive evaluation of the model, but it works well and nothing appears to be broken. :)

The code to train this model is available on Google Colab and GitHub. It required an A100 GPU for about an hour.

GGUF versions of this model are available here: mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF.

Usage

You can run this model using LM Studio or any other frontend.

You can also run this model using the following code:

import transformers
from transformers import AutoTokenizer

# Original full-precision model on the Hugging Face Hub
new_model = "mlabonne/NeuralHermes-2.5-Mistral-7B"

# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
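
The snippet above loads the original full-precision weights through transformers. To run the 5 bpw EXL2 weights from this repository directly, you would instead use ExLlamaV2's Python API. The following is a minimal sketch based on ExLlamaV2's bundled inference example; class and method names can shift between ExLlamaV2 versions, and the local path "nh-5bpw" is a placeholder for wherever you downloaded these files:

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at the downloaded EXL2 weights (placeholder path)
config = ExLlamaV2Config()
config.model_dir = "nh-5bpw"
config.prepare()

# Load the model, KV cache and tokenizer
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# Same sampling settings as the transformers example above
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.9

# ChatML prompt, matching what apply_chat_template produces for this model
prompt = (
    "<|im_start|>system\nYou are a helpful assistant chatbot.<|im_end|>\n"
    "<|im_start|>user\nWhat is a Large Language Model?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

print(generator.generate_simple(prompt, settings, num_tokens=200))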

Training hyperparameters

LoRA:

  • r=16,
  • lora_alpha=16,
  • lora_dropout=0.05,
  • bias="none",
  • task_type="CAUSAL_LM",
  • target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']

Training arguments:

  • per_device_train_batch_size=4,
  • gradient_accumulation_steps=4,
  • gradient_checkpointing=True,
  • learning_rate=5e-5,
  • lr_scheduler_type="cosine",
  • max_steps=200,
  • optim="paged_adamw_32bit",
  • warmup_steps=100,

DPOTrainer:

  • beta=0.1,
  • max_prompt_length=1024,
  • max_length=1536,
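
For reference, here is a rough sketch of how the hyperparameters above fit together with peft, transformers and trl. It assumes an older trl release whose DPOTrainer accepts beta/max_prompt_length/max_length directly, and that the dataset has already been reformatted into prompt/chosen/rejected columns as described earlier; it is not the verbatim training script:

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Base model and DPO dataset named in the card
base_model = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# Assumed to be mapped to "prompt"/"chosen"/"rejected" before training
dataset = load_dataset("mlabonne/chatml_dpo_pairs", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj'],
)

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",
    warmup_steps=100,
)

dpo_trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
dpo_trainer.train()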
