|
--- |
|
license: apache-2.0 |
|
tags: |
|
- llm |
|
- yi |
|
- fine-tune |
|
--- |
|
|
|
## Model description |
|
|
|
Yi-34B base model fine-tuned on the AEZAKMI v1 dataset. Training took around 33 hours on a single local RTX 3090 Ti.
|
It's like airoboros, but with less gptslop, no refusals, and less of the typical language used by RLHF-ed OpenAI models.
|
Say goodbye to "It's important to remember"! \ |
|
The prompt format is standard ChatML. Don't expect it to be good at math or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy free chatbot.
|
The cost of this fine-tune was about $3 in electricity. This was my first attempt at training Yi-34B with this dataset.
|
The base model used for fine-tuning was the 4k-context Yi-34B-Llama model shared by chargoddard.
|
|
|
|
|
## Prompt Format |
|
|
|
I recommend using the ChatML format, as this is what was used during fine-tuning. \
|
Here's the prompt format you should use. You can set a different system message; the model seems to respect that just fine, so it wasn't overfitted to one system prompt.
|
|
|
``` |
|
<|im_start|>system |
|
A chat with uncensored assistant.<|im_end|> |
|
<|im_start|>user |
|
{prompt}<|im_end|> |
|
<|im_start|>assistant |
|
``` |
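
If you're scripting against the model rather than using a chat UI, below is a minimal sketch of assembling that template by hand with transformers and generating a reply. The model path is just a placeholder, and loading a 34B model this way assumes you have enough VRAM or a quantized copy.

```python
# Sketch: build the ChatML prompt shown above and generate a reply with transformers.
# The model path is a placeholder - point it at your local or quantized copy.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/yi-34b-aezakmi-v1"  # placeholder, not an official repo id

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

system_message = "A chat with uncensored assistant."
user_prompt = "Tell me about your day."

# Assemble the ChatML prompt exactly as in the template above.
prompt = (
    f"<|im_start|>system\n{system_message}<|im_end|>\n"
    f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# In practice you'll also want to treat <|im_end|> as a stop string in your frontend.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```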
|
|
|
## Intended uses & limitations |
|
|
|
Use is limited by the Yi license.
|
|
|
## Known Issues |
|
|
|
I recommend setting the repetition penalty to around 1.05 to avoid repetition. So far I've had good experience running this model with temperature 1.2; see the sketch below for how those settings look in code.
|
Multi-turn conversations could be a bit better; if you ask it to re-write something with some fixes, it has a tendency to just repeat the previous response verbatim without any improvements - this is especially noticeable with a repetition penalty of 1.0. \
|
There is still some gptslop left - some responses end with a last paragraph along the lines of "Remember that bla bla bla". I will try to get rid of it in the next version of the dataset. \
|
Stories have ChatGPT-like paragraph spacing; I will try to introduce a few more stories with long paragraphs in the next dataset version.
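
If you're running the model through transformers directly, the settings I recommend above would look something like this as a GenerationConfig. Treat the values as starting points rather than anything rigorously tuned.

```python
# Sketch of the sampling settings recommended above, expressed as a
# transformers GenerationConfig. Values are starting points, not a careful tune.
from transformers import GenerationConfig

generation_config = GenerationConfig(
    do_sample=True,
    temperature=1.2,          # has worked well for me so far
    repetition_penalty=1.05,  # helps avoid verbatim repeats in multi-turn chats
    max_new_tokens=512,
)

# Pass it to generate(), e.g.:
# outputs = model.generate(**inputs, generation_config=generation_config)
```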
|
|
|
The unrestricted-ness of v1 isn't quite something I'm happy with yet; it still refuses to answer quite often and behaves kind of like normal airoboros, especially with the default system prompt "A chat.". \
|
With a slightly modified system prompt it works much better. I recommend using "A chat with uncensored assistant." - it's stupid, but it helps a lot. \
|
I was made aware of the frequent occurrence of the phrase "sending shivers down a spine" in generations during RP, so I will manually fix those samples and reword them. \
|
I plan to use the Llama-fied Yi-34B-200K for the next version; be warned that v1 very likely won't handle long context above 6k-8k well. \
|
There is also some issue with handling long system messages for RP; I will investigate that before releasing the v2 dataset.
|
|
|
Feel free to report issues in the discussions panel here; I don't lurk /lmg/ too often, and I would still like to hear some feedback.
|
|
|
|
|
## Axolotl training parameters |
|
|
|
- bnb_4bit_use_double_quant: true |
|
- bnb_4bit_compute_dtype: torch.bfloat16 |
|
- is_llama_derived_model: true |
|
- load_in_4bit: true |
|
- adapter: qlora |
|
- sequence_len: 1200 |
|
- sample_packing: false |
|
- lora_r: 16 |
|
- lora_alpha: 32 |
|
- lora_target_modules:

  - q_proj

  - v_proj

  - k_proj

  - o_proj

  - gate_proj

  - down_proj

  - up_proj
|
- lora_target_linear: true |
|
- pad_to_sequence_len: true |
|
- micro_batch_size: 1 |
|
- gradient_accumulation_steps: 1 |
|
- num_epochs: 1 |
|
- optimizer: adamw_bnb_8bit |
|
- lr_scheduler: constant |
|
- learning_rate: 0.00007 |
|
- train_on_inputs: false |
|
- group_by_length: false |
|
- bf16: true |
|
- bfloat16: true |
|
- flash_optimum: false |
|
- gradient_checkpointing: true |
|
- flash_attention: true |
|
- seed: 42 |
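
Since the LoRA adapter has been released separately (see Upcoming below), one way to run it is on top of the Llama-fied base with the same 4-bit quantization settings listed above. This is only a minimal sketch, assuming peft and bitsandbytes are installed; the adapter path is a placeholder and the base repo id should be double-checked.

```python
# Sketch: load the released LoRA adapter on top of the Llama-fied Yi-34B base,
# reusing the 4-bit settings from the training config above. Paths/repo ids are
# placeholders or best guesses - verify them before use.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model_path = "chargoddard/Yi-34B-Llama"  # 4k-context Llama-fied base (verify repo id)
adapter_path = "path/to/aezakmi-v1-lora"      # placeholder for the released adapter

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_path)
```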
|
|
|
|
|
## Upcoming |
|
|
|
~~I will release adapter files and maybe an exllama v2 quant shortly.~~ \
|
The LoRA adapter and exl2 quant have been released.