mav23 committed on
Commit
380ea1d
1 Parent(s): b9956af

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ l3.1-niitorm-8b-dpo-t0.0001.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,226 @@
---
library_name: transformers
tags:
- merge
- llama
- dpo
base_model:
- akjindal53244/Llama-3.1-Storm-8B
- Sao10K/L3.1-8B-Niitama-v1.1
- v000000/L3.1-Niitorm-8B-t0.0001
datasets:
- jondurbin/gutenberg-dpo-v0.1
model-index:
- name: L3.1-Niitorm-8B-DPO-t0.0001
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 76.89
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=v000000/L3.1-Niitorm-8B-DPO-t0.0001
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 30.51
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=v000000/L3.1-Niitorm-8B-DPO-t0.0001
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 14.88
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=v000000/L3.1-Niitorm-8B-DPO-t0.0001
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.93
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=v000000/L3.1-Niitorm-8B-DPO-t0.0001
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 7.26
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=v000000/L3.1-Niitorm-8B-DPO-t0.0001
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 31.85
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=v000000/L3.1-Niitorm-8B-DPO-t0.0001
      name: Open LLM Leaderboard
---

# Llama-3.1-Niitorm-8B-DPO

* *DPO Trained, Llama3.1-8B.*

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/QeNjtwolNpxUmpo9NL7VI.png)

<b>New: DPO'd Gutenberg version (full-epoch training).</b>

An RP model with Niitama 1.1 as the base, nearswapped with "Storm", one of the smartest Llama 3.1 models, then DPO'd; mostly abliterated.

Essentially, it's an improved Niitama 1.1.

-------------------------------------------------------------------------------

*Gutenberg DPO produces more human-like prose/story writing and greatly lessens the synthetic feel of outputs.*

-------------------------------------------------------------------------------

# *llama.cpp:*

# thank you, mradermacher (GGUF)

* [GGUF static](https://huggingface.co/mradermacher/L3.1-Niitorm-8B-DPO-t0.0001-GGUF)

* [GGUF Imatrix](https://huggingface.co/mradermacher/L3.1-Niitorm-8B-DPO-t0.0001-i1-GGUF)

# thank you, QuantFactory (GGUF)

* [GGUF static](https://huggingface.co/QuantFactory/L3.1-Niitorm-8B-DPO-t0.0001-GGUF)

# v0 (GGUF)

* [GGUF Imatrix](https://huggingface.co/v000000/L3.1-Niitorm-8B-DPO-t0.0001-GGUFs-IMATRIX) *- Q8, Q6_K, Q5_K_S, Q4_K_S, and IQ4_XS only*

## Finetune and merge

This is a merge and finetune of pre-trained language models.

*The resulting merge was finetuned* on [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1) for 1 epoch at a 1.5e-5 learning rate, on an Nvidia A100.

## Merge Details
### Merge Method

This model was merged using the <b>NEARSWAP t0.0001</b> merge algorithm.
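For intuition, the nearswap rule can be sketched in a few lines of plain Python. This is an illustrative toy based on the algorithm's public description (weights that already agree between the two models are swapped toward the secondary model, with interpolation strength capped by `t`); `nearswap` here is a hypothetical helper, not the actual mergekit implementation.

```python
# Hypothetical sketch of the nearswap merge rule, applied element-wise.
# Where base and secondary weights are nearly identical, the result moves
# strongly toward the secondary model; large differences keep the base.
def nearswap(base, secondary, t):
    merged = []
    for v0, v1 in zip(base, secondary):
        diff = abs(v0 - v1)
        # interpolation weight: 1.0 when the gap is within t, decaying as it grows
        w = min(t / diff, 1.0) if diff > 0 else 1.0
        merged.append(v0 * (1.0 - w) + v1 * w)
    return merged

# Identical weights swap fully; distant weights barely move at t=0.0001.
print(nearswap([0.5, 0.5], [0.5, 1.5], 0.0001))
```

With `t=0.0001` as in this merge, only near-duplicate weights are affected, which is why the result stays close to the Niitama base.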
### Models Merged

The following models were included in the merge:
* Base model: [Sao10K/L3.1-8B-Niitama-v1.1](https://huggingface.co/Sao10K/L3.1-8B-Niitama-v1.1) + [grimjim/Llama-3-Instruct-abliteration-LoRA-8B](https://huggingface.co/grimjim/Llama-3-Instruct-abliteration-LoRA-8B)
* [akjindal53244/Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
slices:
  - sources:
      - model: Sao10K/L3.1-8B-Niitama-v1.1+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
        layer_range: [0, 32]
      - model: akjindal53244/Llama-3.1-Storm-8B
        layer_range: [0, 32]
merge_method: nearswap
base_model: Sao10K/L3.1-8B-Niitama-v1.1+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
parameters:
  t:
    - value: 0.0001
dtype: float16

# Then, DPO finetune on
# jondurbin/gutenberg-dpo-v0.1
```

### DPO Notes

*I used a higher learning rate and the full dataset compared to my "L3.1-Celestial-Stone-2x8B-DPO" run. This resulted in lower loss and better adaptation to the chosen style.*
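For readers unfamiliar with DPO, the per-example objective can be written out directly. This is a toy illustration of the standard DPO loss, not the training code used for this model; `dpo_loss` is a hypothetical helper.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from summed token log-probabilities.

    Rewards the policy for raising the log-prob of the "chosen" (here:
    human-like Gutenberg) completion relative to the "rejected" one,
    measured against a frozen reference model.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log(sigmoid)

# A policy that prefers the chosen completion more than the reference does
# gets a loss below log(2), the value at zero margin.
print(dpo_loss(-10.0, -14.0, -12.0, -13.0))
```

Lower loss during such a finetune means the policy's preference margin over the reference keeps growing, which is what the note above reports.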

-------------------------------------------------------------------------------

# Prompt Template:
```bash
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{output}<|eot_id|>

```
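Assembling this template by hand is easy to get wrong (the double newlines after each header matter). A minimal helper for a single-turn exchange, following the template above; `build_prompt` is a hypothetical function, not part of the model repo:

```python
# Builds the Llama-3 style prompt shown above for one system/user turn,
# leaving the assistant header open so the model generates the reply.
def build_prompt(system_prompt: str, user_input: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_prompt("You are a helpful assistant.", "Hello!"))
```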

Credit to Alchemonaut.

Credit to Sao10K.

Credit to Grimjim.

Credit to mlabonne.

Credit to jondurbin.

Credit to woofwolfy.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_v000000__L3.1-Niitorm-8B-DPO-t0.0001).

| Metric             |Value|
|--------------------|----:|
| Avg.               |27.89|
| IFEval (0-Shot)    |76.89|
| BBH (3-Shot)       |30.51|
| MATH Lvl 5 (4-Shot)|14.88|
| GPQA (0-shot)      | 5.93|
| MuSR (0-shot)      | 7.26|
| MMLU-PRO (5-shot)  |31.85|
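The "Avg." row is simply the mean of the six benchmark scores, which is quick to verify:

```python
# Sanity check: the leaderboard average is the mean of the six scores above.
scores = [76.89, 30.51, 14.88, 5.93, 7.26, 31.85]
avg = round(sum(scores) / len(scores), 2)
print(avg)
```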
l3.1-niitorm-8b-dpo-t0.0001.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:577f4a2f81acc40db5df3e511e4f9fae2ea2f1055d77f89d7ae5f7e9b7feceba
+ size 4661217056
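A Git LFS pointer records the blob's SHA-256 (`oid`) and byte size rather than the data itself, so a downloaded copy of the GGUF can be verified locally. A minimal sketch, using a small stand-in file rather than the ~4.7 GB model; `lfs_verify` is a hypothetical helper, not part of git-lfs:

```python
import hashlib
import os
import tempfile

def lfs_verify(path: str, expected_oid: str, expected_size: int) -> bool:
    """Check a local file against an LFS pointer's size and sha256 oid."""
    if os.path.getsize(path) != expected_size:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == expected_oid

# Demonstration on a 5-byte stand-in file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    tmp = f.name
ok = lfs_verify(tmp, hashlib.sha256(b"hello").hexdigest(), 5)
os.unlink(tmp)
print(ok)
```

For the actual download, the expected values are the `oid` and `size` lines from the pointer above.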