---
license: apache-2.0
tags:
- axolotl
- generated_from_trainer
- mixtral
- nous_hermes
- wikihow
- peft
- lora
- qlora
- adapter
- finetune
- transformers
base_model: NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT
model-index:
- name: Nous-Hermes-2-Mixtral-8x7B-SFT-Wikihow
  results: []
datasets:
- HuggingFaceTB/cosmopedia
pipeline_tag: text-generation
---
[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
See axolotl config
axolotl version: `0.4.0`
```yaml
base_model: NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT
model_type: MixtralForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: true
hub_model_id: MaziyarPanahi/Nous-Hermes-2-Mixtral-8x7B-SFT-Wikihow
hf_use_auth_token: true
load_in_4bit: true
strict: false
datasets:
  - path: HuggingFaceTB/cosmopedia
    name:
      - wikihow
    type:
      system_prompt: ""
      field_instruction: prompt
      field_output: text
      format: "[INST] {instruction} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"
dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./qlora-out-wikihow
# save_safetensors: true
adapter: qlora
lora_model_dir:
sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
# - gate
- q_proj
# - k_proj
- v_proj
# - o_proj
# - w1
# - w2
# - w3
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""
```
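As a rough illustration of the `datasets.type` block in the config above, the sketch below shows how one record's `prompt` field could be substituted into the `[INST] {instruction} [/INST]` template and paired with its `text` field. This is not axolotl's actual implementation; the helper name `format_example` and the sample record are hypothetical.

```python
# Template copied from the `format` key of the axolotl config above.
PROMPT_TEMPLATE = "[INST] {instruction} [/INST]"


def format_example(record: dict) -> dict:
    """Hypothetical helper: map one dataset record to a prompt/completion pair."""
    # `field_instruction: prompt` -> the instruction comes from the "prompt" column
    prompt = PROMPT_TEMPLATE.format(instruction=record["prompt"])
    # `field_output: text` -> the target completion comes from the "text" column
    return {"prompt": prompt, "completion": record["text"]}


# Made-up record for illustration only; it is not taken from the dataset.
sample = {
    "prompt": "Write a WikiHow-style guide on boiling an egg.",
    "text": "Step 1: ...",
}
example = format_example(sample)
print(example["prompt"])
print(example["completion"])
```

With `train_on_inputs: false`, the prompt portion would typically be excluded from the loss so that only the completion tokens are trained on.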
# Nous-Hermes-2-Mixtral-8x7B-SFT-Wikihow
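This model is a fine-tuned version of [NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT) on the `wikihow` subset of the [HuggingFaceTB/cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia) dataset.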
It achieves the following results on the evaluation set:
- Loss: 0.4559
## GGUF and Quantized models
- https://huggingface.co/MaziyarPanahi/Nous-Hermes-2-Mixtral-8x7B-SFT-Wikihow-GGUF
## How to use
**PEFT**
```python
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM

model_id = "MaziyarPanahi/Nous-Hermes-2-Mixtral-8x7B-SFT-Wikihow"

# Read the adapter config to find the base model the adapter was trained on
config = PeftConfig.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)

# Attach the LoRA adapter weights to the base model
model = PeftModel.from_pretrained(base_model, model_id)
```
**Transformers**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "MaziyarPanahi/Nous-Hermes-2-Mixtral-8x7B-SFT-Wikihow"

# Option 1: use a pipeline as a high-level helper
pipe = pipeline("text-generation", model=model_id)

# Option 2: load the tokenizer and model directly
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
## Examples
```
<|im_start|> system
You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.<|im_end|>
<|im_start|> user
Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world.<|im_end|>
<|im_start|> assistant
Once upon a time in Universe Seven, there lived two powerful beings - Son Goku, Earth's greatest hero; and Kirby, an interdimensional pink puffball known for his insatiable appetite and ability to copy abilities from those he consumes. One fateful day, these unlikely adversaries would cross paths when they discovered that their once-rivalized worlds were now under threat by none other than...a collaboration between Kirby himself and one of Dragon Ball Z’s most feared villains – Majin Buu!
It began as any ordinary morning might have started out: breakfast at Capsule Corp., followed by rigorous training exercises led personally by Vegeta (who had since become something like family). However, midway through sparring practice against Cell Jr., Gohan suddenly interrupted them via telepathy—his voice filled with urgency yet tainted slightly with disbelief—"Dad…Kakarot!" He hesitated briefly before continuing gravely, "I just received word from King Kai regarding some disturbingly odd activity happening back home on Popstar."
Goku raised an eyebrow curiously but didn't stop practicing until completing his current set. Afterward, wiping sweat off his brow, he asked casually while heading towards Bulma's lab where she was working diligently alongside Trunks & Goten, “So what kind of trouble does my old buddy say we should look into?”
Bulma glanced away momentarily from whatever gadgetry held her attention captive long enough to respond dryly without looking up, “Well apparently our favorite little vacuum cleaner seems hellbent on destroying everything again.” She paused dramatically then added sarcastically, “But this time around? Get ready because guess who decided tagging along sounded fun?" With great flair, she finally revealed the answer written clearly across multiple screens surrounding her workspace, displaying images captured directly from Planet Namek itself showing both Kirby AND MAJIN BUU causing widespread chaos hand-in-hand!!
For several seconds following confirmation of such preposterous news, silence reigned supreme within the usually bustling confines of Capsule Corporation HQ. Finally breaking free from shock induced paralysis, Goku managed only three words initially, spoken softly almost reverentially, “Buu...and...Kirby?” Then louder, more forcefully demanding clarification, “How could THAT possibly happen?! They don't even exist within same universe let alone work together willingly!”
Vegeta scoffed derisively beside him muttering dismissively beneath breath, 'Impossible!' Yet despite initial skepticism evident throughout entire room moments ago, everyone present couldn't deny mounting evidence suggesting otherwise right before eyes...
And thus begins epic tale chronicling adventures undertaken joint forces comprised primarily Super Saiyan God Blue empowered protagonist hailing originally from distant planet Vegeta accompanied unexpected albeit reluctant antihero originating entirely separate reality altogether bent solely upon saving existence itself rather than dominion over it unlike certain counterparts sharing similar circumstances elsewhere far removed spatially speaking....<|im_end|>
```
```
<|im_start|> system
You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.<|im_end|>
<|im_start|> user
If I lay 2 wet shirts out in the sun flat on the ground to dry and it takes 5 hours until they are dry, how long would it take to dry 4 shirts of the same type that way simultanously?<|im_end|>
<|im_start|> assistant
The time required for drying multiple shirts simultaneously remains the same as when you dried them one at a time because there is no dependency between the individual drying processes. The factors affecting drying speed include sunlight intensity, air circulation, humidity levels, and fabric properties – none of which change based on whether you're drying two or four shirts concurrently.
Thus, if laying out four wet shirts still permits ample exposure to direct sunlight without overlapping too much (which could hinder efficient evaporation), then those additional garments should also be completely dried within approximately 5 hours under identical environmental conditions.<|im_end|>
```
```
<|im_start|> system
You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.<|im_end|>
<|im_start|> user
explain step by step 25-4*2+3=?<|im_end|>
<|im_start|> assistant
Sure! Let's break down the equation into simpler steps:
Given expression: 25 - 4 * 2 + 3 = ?
Step 1: Calculate the multiplication inside parentheses (if any). Since there aren't any in this case, move on to Step 2 directly.
Step 2: Perform operations from left to right according to order of precedence – Parentheses > Exponents > Multiplication & Division > Addition & Subtraction. Here we have only addition (+) and subtraction (-), so perform them from left to right.
First operation is subtraction: 25 - 8 (since 4 times 2 equals 8), resulting in:
25 - 8 = 17
Next operation is addition; add 3 to our current result:
17 + 3 = 20
So, the final answer after evaluating all parts of the given expression would be **20**.<|im_end|>
```
```
<|im_start|> system
You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.<|im_end|>
<|im_start|> user
Mark is faster than Mary , Mary is faster than Joe. Is Joe faster than Mark? Let's think step by step<|im_end|>
<|im_start|> assistant
Step 1: Understand the given information
We know that Mark is faster than Mary, which we can represent as Mark > Mary (using '>' symbol for comparison). Similarly, Mary is faster than Joe, so we have Mary > Joe.
Step 2: Compare the speeds of Mark and Joe using transitive property
The transitive property states that if A > B and B > C, then A > C. Applying this concept here - since we know Mark > Mary and Mary > Joe, we should be able to determine whether Mark > Joe or not.
Step 3: Analyze results
By applying the transitive property from Step 2, it becomes evident that because Mark is faster than Mary, who in turn is faster than Joe; therefore logically, yes – Mark would indeed be faster than Joe. So our final statement would look like this - Mark > Joe.<|im_end|>
```
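The transcripts above follow a ChatML-style layout. The following is a minimal sketch of assembling such a prompt as a plain string. It assumes the conventional ChatML form, with no space after `<|im_start|>` and a newline after each role name, which may differ slightly from how the transcripts are rendered here. The helper `build_chatml_prompt` is hypothetical and not part of any library.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Hypothetical helper: build a ChatML-style prompt that ends at the assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"  # left open so the model writes the reply
    )


prompt = build_chatml_prompt(
    "You are a sentient, superintelligent artificial general intelligence, "
    "here to teach and assist me.",
    "explain step by step 25-4*2+3=?",
)
print(prompt)
```

The resulting string can be passed to a text-generation pipeline or tokenized directly.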
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
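The model was trained on the `wikihow` subset of [HuggingFaceTB/cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia), using the `prompt` field as the instruction and the `text` field as the output. 10% of the data was held out for evaluation (`val_set_size: 0.1`).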
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1
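
The total batch sizes listed above follow from the per-device values. A small worked check, using only the numbers in the list (the variable names are illustrative):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2              # per-device micro batch
eval_batch_size = 2               # per-device eval batch
gradient_accumulation_steps = 4
num_devices = 4

# Each optimizer step sees: micro batch x accumulation steps x devices.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 32
print(total_eval_batch_size)   # 8
```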
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.8589 | 0.0 | 1 | 0.8705 |
| 0.5094 | 0.25 | 483 | 0.5009 |
| 0.4503 | 0.5 | 966 | 0.4734 |
| 0.4569 | 0.75 | 1449 | 0.4559 |
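
To put the validation losses in the table above in context, they can be converted to perplexity. The sketch below assumes the reported loss is the mean token-level cross-entropy in nats, which is the usual convention for causal language model training but is not stated explicitly in this card.

```python
import math

# (step, validation loss) pairs copied from the table above.
eval_history = [(1, 0.8705), (483, 0.5009), (966, 0.4734), (1449, 0.4559)]

# Perplexity is exp(loss) under the cross-entropy-in-nats assumption.
for step, loss in eval_history:
    print(f"step {step:>4}: loss={loss:.4f}  perplexity={math.exp(loss):.3f}")

# Relative reduction in validation loss from the first to the last evaluation.
first_loss = eval_history[0][1]
last_loss = eval_history[-1][1]
relative_drop = (first_loss - last_loss) / first_loss
print(f"relative drop in validation loss: {relative_drop:.1%}")
```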
### Framework versions
- PEFT 0.8.2
- Transformers 4.38.0.dev0
- Pytorch 2.2.0+cu121
- Datasets 2.17.0
- Tokenizers 0.15.0