GGUF quantizations of Lambent/arsenic-nemo-unleashed-12B
Original card
Motive: The gutenberg tunes are lovely, but all the chatml variants seem to present many issues for merging, and their context handling breaks at longer lengths. I decided to see how tuning directly on Unleashed would work. EQ-Bench is about a point and a half lower, which isn't drastic but suggests it might benefit from some additional work.
In hindsight, there actually is a gutenberg tune mixed into Unleashed, so this intensifies the style a fair degree. Poetry leans a bit archaic. I rather like the impact personally.
As is traditional, she got at least one quirk from DPO. In this case it seems to be sometimes briefly slipping into Arabic while chatting. One of the more charming ones I've seen.
Quality of life improvements in some circumstances:
- Assigned <pad> as the pad token for fine-tuning
- Had Axolotl add the chat template (useful on RunPod, maybe?); a quick check of both is sketched below
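To confirm both tweaks landed in the exported tokenizer, they can be inspected with transformers. This is a minimal sketch, pointed at the original (non-GGUF) repo named in the title; swap in a local path if preferred.

```python
from transformers import AutoTokenizer

# Repo id taken from the title of this card; use a local path if you have one.
tok = AutoTokenizer.from_pretrained("Lambent/arsenic-nemo-unleashed-12B")

# The fine-tune assigned <pad> as the padding token.
print("pad token:", tok.pad_token, tok.pad_token_id)

# Axolotl was asked to add a chat template as well.
print("has chat template:", tok.chat_template is not None)
```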
Substance: DPO-tuning on a mix of gutenberg-dpo and toxic-dpo, in the hope of getting enough classic human talent and edge to write well with. Some of the most beautiful pigments are the most poisonous.
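Both preference datasets are on the Hugging Face Hub and can be pulled down for inspection. A minimal sketch with the datasets library, using the paths, split, and field names from the axolotl config below:

```python
from datasets import load_dataset

# The two DPO mixtures named in the config.
gutenberg = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")
toxic = load_dataset("unalignment/toxic-dpo-v0.2", split="train")

# Each row pairs a prompt with a chosen and a rejected completion.
print(gutenberg.column_names)  # expected: system, prompt, chosen, rejected
print(len(gutenberg), "gutenberg pairs /", len(toxic), "toxic pairs")
```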
Testing: Not sure if I'm loading long context right, but it seems functional when I load it with 16k but not at 30k. Might just be doing it wrong.
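If loading the GGUF through llama-cpp-python, the context length can be pinned to the 16k that tested as functional. A minimal sketch, with the quant filename as a placeholder:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="arsenic-nemo-unleashed-12B-Q4_K_M.gguf",  # placeholder filename
    n_ctx=16384,      # 16k seemed functional in testing; 30k did not
    n_gpu_layers=-1,  # offload all layers if VRAM allows
)

out = llm("[INST]Write a short poem about the sea.[/INST]", max_tokens=200, temperature=0.7)
print(out["choices"][0]["text"])
```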
Sample generated poetry (both poems were generated from the same starting text at low temperature, which is why they open the same way):
Upon the shore of time,
Where waves of years do beat,
I stood and watched as they climb
The cliffs of eternity.
In every cresting o'er the deep,
A life was borne upon the tide;
Each wave that did break at my feet,
Did bear away some soul that had died.
And yet, amidst all this great throng,
There came to me a voice soft and low:
"O mortal man! What seekest thou here?"
Then answered I: "I seek the path of woe."
"Thou seekest," quoth she, "that which is lost;
For lo! The waves have borne it away.
Seek not then for that which once was thine own,
But rather turn thy face towards the light of day."
And so saying, away did she go,
Leaving me alone upon the shore.
Then turned I my steps unto mine own home,
And left behind me all thoughts of sorrow and of woe.
Upon the canvas of the azure sky,
The sun doth paint his golden chariot's course.
His radiant beams like arrows fly,
And pierce the clouds with light's resplendent force.
The verdant fields in emerald array,
Reflect the glory of his noontide reign;
While every blossom, bathed in warm display,
Exhales its sweet perfume on the plain.
But lo! as evening's purple curtain falls,
The sun doth sink behind the western hill.
His parting rays like lovers' farewell calls,
And gilds the heavens with a rosy thrill.
Then let us praise this daily wondrous sight,
And bless the Maker of the world so bright.
See axolotl config
axolotl version: 0.4.1
base_model: MarinaraSpaghetti/NemoMix-Unleashed-12B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
save_safetensors: true
load_in_8bit: false
load_in_4bit: true
strict: false
special_tokens:
  pad_token: <pad>
rl: dpo
# total_num_tokens:
datasets:
  - path: jondurbin/gutenberg-dpo-v0.1
    split: train
    type:
      field_system: system
      field_prompt: prompt
      field_chosen: chosen
      field_rejected: rejected
      prompt_format: "[INST]{prompt}[/INST]"
      chosen_format: "{chosen}"
      rejected_format: "{rejected}"
  - path: unalignment/toxic-dpo-v0.2
    split: train
    type:
      field_system: system
      field_prompt: prompt
      field_chosen: chosen
      field_rejected: rejected
      prompt_format: "[INST]{prompt}[/INST]"
      chosen_format: "{chosen}"
      rejected_format: "{rejected}"
dataset_prepared_path: prepared-dpo
output_dir: ./dpoq
val_set_size: 0.001
seed: 1
sequence_len: 2048
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false
chat_template: inst
adapter: qlora
lora_model_dir:
lora_r: 256
lora_alpha: 256
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_dora: true
wandb_project: unleashed-qlora-dpo
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00002
cosine_min_lr_ratio: 0.1
cosine_constant_lr_ratio: 0.95
train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 16
evals_per_epoch: 8
saves_per_epoch: 8
save_total_limit: 2
debug:
deepspeed:
weight_decay: 0.001
fsdp:
fsdp_config:
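Since this is a QLoRA (DoRA) run, the output in ./dpoq is an adapter rather than full weights; one way to produce a standalone model for GGUF conversion is to merge it into the base with peft. A minimal sketch, with paths assumed:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "MarinaraSpaghetti/NemoMix-Unleashed-12B",
    torch_dtype=torch.bfloat16,
)

# "./dpoq" is the output_dir from the config above; point at whichever checkpoint you kept.
model = PeftModel.from_pretrained(base, "./dpoq")
merged = model.merge_and_unload()

merged.save_pretrained("./arsenic-nemo-unleashed-12B")
AutoTokenizer.from_pretrained("./dpoq").save_pretrained("./arsenic-nemo-unleashed-12B")
```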
dpoq
This model is a fine-tuned version of MarinaraSpaghetti/NemoMix-Unleashed-12B on the jondurbin/gutenberg-dpo-v0.1 and unalignment/toxic-dpo-v0.2 datasets.
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 16
- training_steps: 92
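For reference, the reported total batch size and the amount of data covered follow directly from those numbers; a quick arithmetic check, derived from the hyperparameters above rather than from training logs:

```python
# Derived from the hyperparameters above, not from training logs.
micro_batch_size = 1
gradient_accumulation_steps = 16
training_steps = 92

effective_batch = micro_batch_size * gradient_accumulation_steps  # 16, matching total_train_batch_size
pairs_seen = effective_batch * training_steps                     # ~1,472 preference pairs over the single epoch
print(effective_batch, pairs_seen)
```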
Training results
Framework versions
- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1