adamo1139 committed on
Commit 88cd72f
1 Parent(s): b305f47

Update README.md

Files changed (1):
  1. README.md +46 -7
README.md CHANGED
@@ -6,15 +6,17 @@ license_link: LICENSE
  ## Model description

- Yi-34B model fine-tuned on the AEZAKMI v1 dataset, which is derived from airoboros 2.2.1 and airoboros 2.2. Fine-tuned with axolotl using QLoRA and NF4 double quantization, 1 epoch, batch size 1, lr 0.00007, constant lr scheduler, sequence length 1200. Training took around 33 hours on a single local RTX 3090 Ti.
- I had the power target set to 320 W for the GPU, and while I didn't measure power at the wall, it was probably around 500 W. Given the average electricity price in my region, this training run cost me around $3. This was my first attempt at training Yi-34B with this dataset.
- The main feature of this model is that its output should be free of refusals, and it feels somehow more natural than airoboros. The prompt format is standard ChatML. Don't expect it to be good at math or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy, free chatbot.

- I used the 4096-context Yi-34B-Llama uploaded by chargoddard as the base for this training.

  ## Prompt Format

- I recommend using the ChatML format, as this was used during the fine-tune
  Here's the prompt format you should use. You can set a different system message; the model seems to respect that fine, so it wasn't overfitted.
  ```
@@ -27,15 +29,52 @@ A chat.<|im_end|>
  ## Intended uses & limitations

- Use is limited by the Yi license

  ## Known Issues

  I recommend setting the repetition penalty to around 1.05 to avoid repetition. So far I have had good results running this model with temperature 1.2.
- Multi-turn conversations could be a bit better; if you ask it to rewrite something with some fixes, it will tend to just repeat the previous response verbatim without any improvements - this is especially noticeable with repetition penalty 1.0
  There is still some gptslop left - some responses will end the last paragraph with "Remember that bla bla bla"; I will try to get rid of it in the next version of the dataset.
  Stories have ChatGPT-like paragraph spacing; I will try to include a few more stories with long paragraphs in the next dataset version.
  ## Upcoming

  I will release the adapter files and maybe an exllama v2 quant shortly.
 
  ## Model description

+ Yi-34B base model fine-tuned on the AEZAKMI v1 dataset. Training took around 33 hours on a single local RTX 3090 Ti.
+ It's like airoboros, but with less gptslop, no refusals, and less of the typical language used by RLHF'd OpenAI models.
+ Say goodbye to "It's important to remember"! \
+ The prompt format is standard ChatML. Don't expect it to be good at math or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy, free chatbot.
+ The cost of this fine-tune was about $3 in electricity. This was my first attempt at training Yi-34B with this dataset.
+ The base model used for fine-tuning was the 4k-context Yi-34B-Llama model shared by chargoddard.
  ## Prompt Format

+ I recommend using the ChatML format, as this was used during the fine-tune. \
  Here's the prompt format you should use. You can set a different system message; the model seems to respect that fine, so it wasn't overfitted.
  ```
 
  ## Intended uses & limitations

+ Use is limited by the Yi license.

  ## Known Issues

  I recommend setting the repetition penalty to around 1.05 to avoid repetition. So far I have had good results running this model with temperature 1.2.
+ Multi-turn conversations could be a bit better; if you ask it to rewrite something with some fixes, it will tend to just repeat the previous response verbatim without any improvements - this is especially noticeable with repetition penalty 1.0.
  There is still some gptslop left - some responses will end the last paragraph with "Remember that bla bla bla"; I will try to get rid of it in the next version of the dataset.
  Stories have ChatGPT-like paragraph spacing; I will try to include a few more stories with long paragraphs in the next dataset version.
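As a quick reference, here is a minimal inference sketch that builds a ChatML prompt (using the "A chat." system message from the example above, which you can swap out) and applies the sampling settings recommended here. The repository id and the user message are hypothetical placeholders, not values from this card.

```
# Minimal inference sketch. Assumptions: the repo id below is a placeholder,
# and the user message / max_new_tokens are arbitrary examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/yi-34b-aezakmi-v1"  # hypothetical placeholder, not the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Standard ChatML prompt; the system message can be changed and the model should respect it.
prompt = (
    "<|im_start|>system\n"
    "A chat.<|im_end|>\n"
    "<|im_start|>user\n"
    "Write a short story about a quiet morning.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.2,          # setting that worked well per the notes above
    repetition_penalty=1.05,  # recommended above to avoid verbatim repetition
)
# Print only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```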

+ ## Axolotl training parameters
+
+ - bnb_4bit_use_double_quant: true
+ - bnb_4bit_compute_dtype: torch.bfloat16
+ - is_llama_derived_model: true
+ - load_in_4bit: true
+ - adapter: qlora
+ - sequence_len: 1200
+ - sample_packing: false
+ - lora_r: 16
+ - lora_alpha: 32
+ - lora_target_modules:
+   - q_proj
+   - v_proj
+   - k_proj
+   - o_proj
+   - gate_proj
+   - down_proj
+   - up_proj
+ - lora_target_linear: true
+ - pad_to_sequence_len: true
+ - micro_batch_size: 1
+ - gradient_accumulation_steps: 1
+ - num_epochs: 1
+ - optimizer: adamw_bnb_8bit
+ - lr_scheduler: constant
+ - learning_rate: 0.00007
+ - train_on_inputs: false
+ - group_by_length: false
+ - bf16: true
+ - bfloat16: true
+ - flash_optimum: false
+ - gradient_checkpointing: true
+ - flash_attention: true
+ - seed: 42
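For readers who want to set up something similar outside of axolotl, below is a rough sketch of how the quantization and LoRA settings above map onto plain transformers + peft. This is an approximate translation, not the exact axolotl internals: the NF4 quant type comes from the run description earlier in this card, and the base repo id is assumed from the mention of chargoddard's Yi-34B-Llama upload.

```
# Rough transformers + peft equivalent of the axolotl settings listed above.
# Approximate translation only: the NF4 quant type comes from the run
# description earlier in this card, and the base repo id is assumed.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_4bit: true
    bnb_4bit_use_double_quant=True,         # bnb_4bit_use_double_quant: true
    bnb_4bit_quant_type="nf4",              # "nf4 double quant" per the run description
    bnb_4bit_compute_dtype=torch.bfloat16,  # bnb_4bit_compute_dtype: torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    "chargoddard/Yi-34B-Llama",             # 4k-context base mentioned above (repo id assumed)
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)  # typical QLoRA prep step

lora_config = LoraConfig(
    r=16,                                   # lora_r: 16
    lora_alpha=32,                          # lora_alpha: 32
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "down_proj", "up_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```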
  ## Upcoming

  I will release the adapter files and maybe an exllama v2 quant shortly.