TheDrummer committed on
Commit
ffd28dd
1 Parent(s): 40ce7b5

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,610 @@
1
+ ---
2
+ base_model:
3
+ - unsloth/Mistral-Small-Instruct-2409
4
+ library_name: transformers
5
+ tags:
6
+ - mergekit
7
+ - merge
8
+
9
+ ---
10
+ # merged
11
+
12
+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
13
+
14
+ ## Merge Details
15
+ ### Merge Method
16
+
17
+ This model was merged using the passthrough merge method.
18
+
19
+ ### Models Merged
20
+
21
+ The following models were included in the merge:
22
+ * [unsloth/Mistral-Small-Instruct-2409](https://huggingface.co/unsloth/Mistral-Small-Instruct-2409)
23
+
24
+ ### Configuration
25
+
26
+ The following YAML configuration was used to produce this model:
27
+
28
+ ```yaml
29
+ merge_method: passthrough
30
+ slices:
31
+ - sources:
32
+ - layer_range: [0, 19]
33
+ model: unsloth/Mistral-Small-Instruct-2409
34
+ # Original L19
35
+ - sources:
36
+ - layer_range: [19, 20]
37
+ model: unsloth/Mistral-Small-Instruct-2409
38
+ # Dupe A of L19
39
+ - sources:
40
+ - layer_range: [19, 20]
41
+ model: unsloth/Mistral-Small-Instruct-2409
42
+ parameters:
43
+ scale:
44
+ - filter: o_proj
45
+ value: 0.0
46
+ - filter: down_proj
47
+ value: 0.0
48
+ - value: 1.0
49
+ # Dupe B of L19
50
+ - sources:
51
+ - layer_range: [19, 20]
52
+ model: unsloth/Mistral-Small-Instruct-2409
53
+ parameters:
54
+ scale:
55
+ - filter: o_proj
56
+ value: 0.0
57
+ - filter: down_proj
58
+ value: 0.0
59
+ - value: 1.0
60
+ # Original L20
61
+ - sources:
62
+ - layer_range: [20, 21]
63
+ model: unsloth/Mistral-Small-Instruct-2409
64
+ # Dupe A of L20
65
+ - sources:
66
+ - layer_range: [20, 21]
67
+ model: unsloth/Mistral-Small-Instruct-2409
68
+ parameters:
69
+ scale:
70
+ - filter: o_proj
71
+ value: 0.0
72
+ - filter: down_proj
73
+ value: 0.0
74
+ - value: 1.0
75
+ # Dupe B of L20
76
+ - sources:
77
+ - layer_range: [20, 21]
78
+ model: unsloth/Mistral-Small-Instruct-2409
79
+ parameters:
80
+ scale:
81
+ - filter: o_proj
82
+ value: 0.0
83
+ - filter: down_proj
84
+ value: 0.0
85
+ - value: 1.0
86
+ # Original L21
87
+ - sources:
88
+ - layer_range: [21, 22]
89
+ model: unsloth/Mistral-Small-Instruct-2409
90
+ # Dupe A of L21
91
+ - sources:
92
+ - layer_range: [21, 22]
93
+ model: unsloth/Mistral-Small-Instruct-2409
94
+ parameters:
95
+ scale:
96
+ - filter: o_proj
97
+ value: 0.0
98
+ - filter: down_proj
99
+ value: 0.0
100
+ - value: 1.0
101
+ # Dupe B of L21
102
+ - sources:
103
+ - layer_range: [21, 22]
104
+ model: unsloth/Mistral-Small-Instruct-2409
105
+ parameters:
106
+ scale:
107
+ - filter: o_proj
108
+ value: 0.0
109
+ - filter: down_proj
110
+ value: 0.0
111
+ - value: 1.0
112
+ # Original L22
113
+ - sources:
114
+ - layer_range: [22, 23]
115
+ model: unsloth/Mistral-Small-Instruct-2409
116
+ # Dupe A of L22
117
+ - sources:
118
+ - layer_range: [22, 23]
119
+ model: unsloth/Mistral-Small-Instruct-2409
120
+ parameters:
121
+ scale:
122
+ - filter: o_proj
123
+ value: 0.0
124
+ - filter: down_proj
125
+ value: 0.0
126
+ - value: 1.0
127
+ # Dupe B of L22
128
+ - sources:
129
+ - layer_range: [22, 23]
130
+ model: unsloth/Mistral-Small-Instruct-2409
131
+ parameters:
132
+ scale:
133
+ - filter: o_proj
134
+ value: 0.0
135
+ - filter: down_proj
136
+ value: 0.0
137
+ - value: 1.0
138
+ # Original L23
139
+ - sources:
140
+ - layer_range: [23, 24]
141
+ model: unsloth/Mistral-Small-Instruct-2409
142
+ # Dupe A of L23
143
+ - sources:
144
+ - layer_range: [23, 24]
145
+ model: unsloth/Mistral-Small-Instruct-2409
146
+ parameters:
147
+ scale:
148
+ - filter: o_proj
149
+ value: 0.0
150
+ - filter: down_proj
151
+ value: 0.0
152
+ - value: 1.0
153
+ # Dupe B of L23
154
+ - sources:
155
+ - layer_range: [23, 24]
156
+ model: unsloth/Mistral-Small-Instruct-2409
157
+ parameters:
158
+ scale:
159
+ - filter: o_proj
160
+ value: 0.0
161
+ - filter: down_proj
162
+ value: 0.0
163
+ - value: 1.0
164
+ # Original L24
165
+ - sources:
166
+ - layer_range: [24, 25]
167
+ model: unsloth/Mistral-Small-Instruct-2409
168
+ # Dupe A of L24
169
+ - sources:
170
+ - layer_range: [24, 25]
171
+ model: unsloth/Mistral-Small-Instruct-2409
172
+ parameters:
173
+ scale:
174
+ - filter: o_proj
175
+ value: 0.0
176
+ - filter: down_proj
177
+ value: 0.0
178
+ - value: 1.0
179
+ # Dupe B of L24
180
+ - sources:
181
+ - layer_range: [24, 25]
182
+ model: unsloth/Mistral-Small-Instruct-2409
183
+ parameters:
184
+ scale:
185
+ - filter: o_proj
186
+ value: 0.0
187
+ - filter: down_proj
188
+ value: 0.0
189
+ - value: 1.0
190
+ # Original L25
191
+ - sources:
192
+ - layer_range: [25, 26]
193
+ model: unsloth/Mistral-Small-Instruct-2409
194
+ # Dupe A of L25
195
+ - sources:
196
+ - layer_range: [25, 26]
197
+ model: unsloth/Mistral-Small-Instruct-2409
198
+ parameters:
199
+ scale:
200
+ - filter: o_proj
201
+ value: 0.0
202
+ - filter: down_proj
203
+ value: 0.0
204
+ - value: 1.0
205
+ # Dupe B of L25
206
+ - sources:
207
+ - layer_range: [25, 26]
208
+ model: unsloth/Mistral-Small-Instruct-2409
209
+ parameters:
210
+ scale:
211
+ - filter: o_proj
212
+ value: 0.0
213
+ - filter: down_proj
214
+ value: 0.0
215
+ - value: 1.0
216
+ # Original L26
217
+ - sources:
218
+ - layer_range: [26, 27]
219
+ model: unsloth/Mistral-Small-Instruct-2409
220
+ # Dupe A of L26
221
+ - sources:
222
+ - layer_range: [26, 27]
223
+ model: unsloth/Mistral-Small-Instruct-2409
224
+ parameters:
225
+ scale:
226
+ - filter: o_proj
227
+ value: 0.0
228
+ - filter: down_proj
229
+ value: 0.0
230
+ - value: 1.0
231
+ # Dupe B of L26
232
+ - sources:
233
+ - layer_range: [26, 27]
234
+ model: unsloth/Mistral-Small-Instruct-2409
235
+ parameters:
236
+ scale:
237
+ - filter: o_proj
238
+ value: 0.0
239
+ - filter: down_proj
240
+ value: 0.0
241
+ - value: 1.0
242
+ # Original L27
243
+ - sources:
244
+ - layer_range: [27, 28]
245
+ model: unsloth/Mistral-Small-Instruct-2409
246
+ # Dupe A of L27
247
+ - sources:
248
+ - layer_range: [27, 28]
249
+ model: unsloth/Mistral-Small-Instruct-2409
250
+ parameters:
251
+ scale:
252
+ - filter: o_proj
253
+ value: 0.0
254
+ - filter: down_proj
255
+ value: 0.0
256
+ - value: 1.0
257
+ # Dupe B of L27
258
+ - sources:
259
+ - layer_range: [27, 28]
260
+ model: unsloth/Mistral-Small-Instruct-2409
261
+ parameters:
262
+ scale:
263
+ - filter: o_proj
264
+ value: 0.0
265
+ - filter: down_proj
266
+ value: 0.0
267
+ - value: 1.0
268
+ # Original L28
269
+ - sources:
270
+ - layer_range: [28, 29]
271
+ model: unsloth/Mistral-Small-Instruct-2409
272
+ # Dupe A of L28
273
+ - sources:
274
+ - layer_range: [28, 29]
275
+ model: unsloth/Mistral-Small-Instruct-2409
276
+ parameters:
277
+ scale:
278
+ - filter: o_proj
279
+ value: 0.0
280
+ - filter: down_proj
281
+ value: 0.0
282
+ - value: 1.0
283
+ # Dupe B of L28
284
+ - sources:
285
+ - layer_range: [28, 29]
286
+ model: unsloth/Mistral-Small-Instruct-2409
287
+ parameters:
288
+ scale:
289
+ - filter: o_proj
290
+ value: 0.0
291
+ - filter: down_proj
292
+ value: 0.0
293
+ - value: 1.0
294
+ # Original L29
295
+ - sources:
296
+ - layer_range: [29, 30]
297
+ model: unsloth/Mistral-Small-Instruct-2409
298
+ # Dupe A of L29
299
+ - sources:
300
+ - layer_range: [29, 30]
301
+ model: unsloth/Mistral-Small-Instruct-2409
302
+ parameters:
303
+ scale:
304
+ - filter: o_proj
305
+ value: 0.0
306
+ - filter: down_proj
307
+ value: 0.0
308
+ - value: 1.0
309
+ # Dupe B of L29
310
+ - sources:
311
+ - layer_range: [29, 30]
312
+ model: unsloth/Mistral-Small-Instruct-2409
313
+ parameters:
314
+ scale:
315
+ - filter: o_proj
316
+ value: 0.0
317
+ - filter: down_proj
318
+ value: 0.0
319
+ - value: 1.0
320
+ # Original L30
321
+ - sources:
322
+ - layer_range: [30, 31]
323
+ model: unsloth/Mistral-Small-Instruct-2409
324
+ # Dupe A of L30
325
+ - sources:
326
+ - layer_range: [30, 31]
327
+ model: unsloth/Mistral-Small-Instruct-2409
328
+ parameters:
329
+ scale:
330
+ - filter: o_proj
331
+ value: 0.0
332
+ - filter: down_proj
333
+ value: 0.0
334
+ - value: 1.0
335
+ # Dupe B of L30
336
+ - sources:
337
+ - layer_range: [30, 31]
338
+ model: unsloth/Mistral-Small-Instruct-2409
339
+ parameters:
340
+ scale:
341
+ - filter: o_proj
342
+ value: 0.0
343
+ - filter: down_proj
344
+ value: 0.0
345
+ - value: 1.0
346
+ # Original L31
347
+ - sources:
348
+ - layer_range: [31, 32]
349
+ model: unsloth/Mistral-Small-Instruct-2409
350
+ # Dupe A of L31
351
+ - sources:
352
+ - layer_range: [31, 32]
353
+ model: unsloth/Mistral-Small-Instruct-2409
354
+ parameters:
355
+ scale:
356
+ - filter: o_proj
357
+ value: 0.0
358
+ - filter: down_proj
359
+ value: 0.0
360
+ - value: 1.0
361
+ # Dupe B of L31
362
+ - sources:
363
+ - layer_range: [31, 32]
364
+ model: unsloth/Mistral-Small-Instruct-2409
365
+ parameters:
366
+ scale:
367
+ - filter: o_proj
368
+ value: 0.0
369
+ - filter: down_proj
370
+ value: 0.0
371
+ - value: 1.0
372
+ # Original L32
373
+ - sources:
374
+ - layer_range: [32, 33]
375
+ model: unsloth/Mistral-Small-Instruct-2409
376
+ # Dupe A of L32
377
+ - sources:
378
+ - layer_range: [32, 33]
379
+ model: unsloth/Mistral-Small-Instruct-2409
380
+ parameters:
381
+ scale:
382
+ - filter: o_proj
383
+ value: 0.0
384
+ - filter: down_proj
385
+ value: 0.0
386
+ - value: 1.0
387
+ # Dupe B of L32
388
+ - sources:
389
+ - layer_range: [32, 33]
390
+ model: unsloth/Mistral-Small-Instruct-2409
391
+ parameters:
392
+ scale:
393
+ - filter: o_proj
394
+ value: 0.0
395
+ - filter: down_proj
396
+ value: 0.0
397
+ - value: 1.0
398
+ # Original L33
399
+ - sources:
400
+ - layer_range: [33, 34]
401
+ model: unsloth/Mistral-Small-Instruct-2409
402
+ # Dupe A of L33
403
+ - sources:
404
+ - layer_range: [33, 34]
405
+ model: unsloth/Mistral-Small-Instruct-2409
406
+ parameters:
407
+ scale:
408
+ - filter: o_proj
409
+ value: 0.0
410
+ - filter: down_proj
411
+ value: 0.0
412
+ - value: 1.0
413
+ # Dupe B of L33
414
+ - sources:
415
+ - layer_range: [33, 34]
416
+ model: unsloth/Mistral-Small-Instruct-2409
417
+ parameters:
418
+ scale:
419
+ - filter: o_proj
420
+ value: 0.0
421
+ - filter: down_proj
422
+ value: 0.0
423
+ - value: 1.0
424
+ # Original L34
425
+ - sources:
426
+ - layer_range: [34, 35]
427
+ model: unsloth/Mistral-Small-Instruct-2409
428
+ # Dupe A of L34
429
+ - sources:
430
+ - layer_range: [34, 35]
431
+ model: unsloth/Mistral-Small-Instruct-2409
432
+ parameters:
433
+ scale:
434
+ - filter: o_proj
435
+ value: 0.0
436
+ - filter: down_proj
437
+ value: 0.0
438
+ - value: 1.0
439
+ # Dupe B of L34
440
+ - sources:
441
+ - layer_range: [34, 35]
442
+ model: unsloth/Mistral-Small-Instruct-2409
443
+ parameters:
444
+ scale:
445
+ - filter: o_proj
446
+ value: 0.0
447
+ - filter: down_proj
448
+ value: 0.0
449
+ - value: 1.0
450
+ # Original L35
451
+ - sources:
452
+ - layer_range: [35, 36]
453
+ model: unsloth/Mistral-Small-Instruct-2409
454
+ # Dupe A of L35
455
+ - sources:
456
+ - layer_range: [35, 36]
457
+ model: unsloth/Mistral-Small-Instruct-2409
458
+ parameters:
459
+ scale:
460
+ - filter: o_proj
461
+ value: 0.0
462
+ - filter: down_proj
463
+ value: 0.0
464
+ - value: 1.0
465
+ # Dupe B of L35
466
+ - sources:
467
+ - layer_range: [35, 36]
468
+ model: unsloth/Mistral-Small-Instruct-2409
469
+ parameters:
470
+ scale:
471
+ - filter: o_proj
472
+ value: 0.0
473
+ - filter: down_proj
474
+ value: 0.0
475
+ - value: 1.0
476
+ # Original L36
477
+ - sources:
478
+ - layer_range: [36, 37]
479
+ model: unsloth/Mistral-Small-Instruct-2409
480
+ # Dupe A of L36
481
+ - sources:
482
+ - layer_range: [36, 37]
483
+ model: unsloth/Mistral-Small-Instruct-2409
484
+ parameters:
485
+ scale:
486
+ - filter: o_proj
487
+ value: 0.0
488
+ - filter: down_proj
489
+ value: 0.0
490
+ - value: 1.0
491
+ # Dupe B of L36
492
+ - sources:
493
+ - layer_range: [36, 37]
494
+ model: unsloth/Mistral-Small-Instruct-2409
495
+ parameters:
496
+ scale:
497
+ - filter: o_proj
498
+ value: 0.0
499
+ - filter: down_proj
500
+ value: 0.0
501
+ - value: 1.0
502
+ # Original L37
503
+ - sources:
504
+ - layer_range: [37, 38]
505
+ model: unsloth/Mistral-Small-Instruct-2409
506
+ # Dupe A of L37
507
+ - sources:
508
+ - layer_range: [37, 38]
509
+ model: unsloth/Mistral-Small-Instruct-2409
510
+ parameters:
511
+ scale:
512
+ - filter: o_proj
513
+ value: 0.0
514
+ - filter: down_proj
515
+ value: 0.0
516
+ - value: 1.0
517
+ # Dupe B of L37
518
+ - sources:
519
+ - layer_range: [37, 38]
520
+ model: unsloth/Mistral-Small-Instruct-2409
521
+ parameters:
522
+ scale:
523
+ - filter: o_proj
524
+ value: 0.0
525
+ - filter: down_proj
526
+ value: 0.0
527
+ - value: 1.0
528
+ # Original L38
529
+ - sources:
530
+ - layer_range: [38, 39]
531
+ model: unsloth/Mistral-Small-Instruct-2409
532
+ # Dupe A of L38
533
+ - sources:
534
+ - layer_range: [38, 39]
535
+ model: unsloth/Mistral-Small-Instruct-2409
536
+ parameters:
537
+ scale:
538
+ - filter: o_proj
539
+ value: 0.0
540
+ - filter: down_proj
541
+ value: 0.0
542
+ - value: 1.0
543
+ # Dupe B of L38
544
+ - sources:
545
+ - layer_range: [38, 39]
546
+ model: unsloth/Mistral-Small-Instruct-2409
547
+ parameters:
548
+ scale:
549
+ - filter: o_proj
550
+ value: 0.0
551
+ - filter: down_proj
552
+ value: 0.0
553
+ - value: 1.0
554
+ # Original L39
555
+ - sources:
556
+ - layer_range: [39, 40]
557
+ model: unsloth/Mistral-Small-Instruct-2409
558
+ # Dupe A of L39
559
+ - sources:
560
+ - layer_range: [39, 40]
561
+ model: unsloth/Mistral-Small-Instruct-2409
562
+ parameters:
563
+ scale:
564
+ - filter: o_proj
565
+ value: 0.0
566
+ - filter: down_proj
567
+ value: 0.0
568
+ - value: 1.0
569
+ # Dupe B of L39
570
+ - sources:
571
+ - layer_range: [39, 40]
572
+ model: unsloth/Mistral-Small-Instruct-2409
573
+ parameters:
574
+ scale:
575
+ - filter: o_proj
576
+ value: 0.0
577
+ - filter: down_proj
578
+ value: 0.0
579
+ - value: 1.0
580
+ # Original L40
581
+ - sources:
582
+ - layer_range: [40, 41]
583
+ model: unsloth/Mistral-Small-Instruct-2409
584
+ # Dupe A of L40
585
+ - sources:
586
+ - layer_range: [40, 41]
587
+ model: unsloth/Mistral-Small-Instruct-2409
588
+ parameters:
589
+ scale:
590
+ - filter: o_proj
591
+ value: 0.0
592
+ - filter: down_proj
593
+ value: 0.0
594
+ - value: 1.0
595
+ # Dupe B of L40
596
+ - sources:
597
+ - layer_range: [40, 41]
598
+ model: unsloth/Mistral-Small-Instruct-2409
599
+ parameters:
600
+ scale:
601
+ - filter: o_proj
602
+ value: 0.0
603
+ - filter: down_proj
604
+ value: 0.0
605
+ - value: 1.0
606
+ # ... REPEAT UNTIL 41
607
+ - sources:
608
+ - layer_range: [41, 55]
609
+ model: unsloth/Mistral-Small-Instruct-2409
610
+ ```
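The slice list above keeps layers 0-18 once, triples each of layers 19-40 (the original plus two duplicates whose `o_proj` and `down_proj` outputs are scaled to 0.0), and keeps layers 41-54 once. As a sanity check, the ranges can be tallied to confirm they produce the layer count reported in the merged model's config.json. This is a standalone sketch, not part of the upload:

```python
# Tally the passthrough slices: [0,19] once, layers 19..40 three times
# (original + Dupe A + Dupe B), then [41,55] once.
slices = [(0, 19)]                      # untouched prefix, 19 layers
for layer in range(19, 41):             # layers 19..40 inclusive
    slices += [(layer, layer + 1)] * 3  # original + two zero-scaled dupes
slices.append((41, 55))                 # untouched suffix, 14 layers

total_layers = sum(end - start for start, end in slices)
print(total_layers)  # 99, matching "num_hidden_layers": 99 in config.json
```

The two duplicates are not inert copies of the weights: zeroing `o_proj` and `down_proj` silences each duplicate layer's contribution to the residual stream at merge time, while the rest of its weights remain available for later training.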
config.json ADDED
@@ -0,0 +1,27 @@
1
+ {
2
+ "_name_or_path": "unsloth/Mistral-Small-Instruct-2409",
3
+ "architectures": [
4
+ "MistralForCausalLM"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 1,
8
+ "eos_token_id": 2,
9
+ "head_dim": 128,
10
+ "hidden_act": "silu",
11
+ "hidden_size": 6144,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 16384,
14
+ "max_position_embeddings": 131072,
15
+ "model_type": "mistral",
16
+ "num_attention_heads": 48,
17
+ "num_hidden_layers": 99,
18
+ "num_key_value_heads": 8,
19
+ "rms_norm_eps": 1e-05,
20
+ "rope_theta": 1000000.0,
21
+ "sliding_window": null,
22
+ "tie_word_embeddings": false,
23
+ "torch_dtype": "bfloat16",
24
+ "transformers_version": "4.46.3",
25
+ "use_cache": true,
26
+ "vocab_size": 32768
27
+ }
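The values in config.json are internally consistent, which can be checked with a short script (a standalone sketch, not part of the upload): the attention width equals heads times head dimension, the query heads divide evenly over the grouped key/value heads, and the layer count matches the slice arithmetic from the merge configuration.

```python
# Consistency checks over the config.json values above.
config = {
    "hidden_size": 6144,
    "num_attention_heads": 48,
    "head_dim": 128,
    "num_key_value_heads": 8,
    "num_hidden_layers": 99,
}

# Attention width: 48 heads x 128 dims/head = 6144 = hidden_size.
assert config["num_attention_heads"] * config["head_dim"] == config["hidden_size"]

# Grouped-query attention: 48 query heads share 8 KV heads (6 per group).
assert config["num_attention_heads"] % config["num_key_value_heads"] == 0

# 99 layers = 19 (prefix) + 3 x 22 (tripled layers 19..40) + 14 (suffix).
assert config["num_hidden_layers"] == 19 + 3 * 22 + 14

print("config is self-consistent")
```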
mergekit_config.yml ADDED
@@ -0,0 +1,581 @@
1
+ merge_method: passthrough
2
+ slices:
3
+ - sources:
4
+ - layer_range: [0, 19]
5
+ model: unsloth/Mistral-Small-Instruct-2409
6
+ # Original L19
7
+ - sources:
8
+ - layer_range: [19, 20]
9
+ model: unsloth/Mistral-Small-Instruct-2409
10
+ # Dupe A of L19
11
+ - sources:
12
+ - layer_range: [19, 20]
13
+ model: unsloth/Mistral-Small-Instruct-2409
14
+ parameters:
15
+ scale:
16
+ - filter: o_proj
17
+ value: 0.0
18
+ - filter: down_proj
19
+ value: 0.0
20
+ - value: 1.0
21
+ # Dupe B of L19
22
+ - sources:
23
+ - layer_range: [19, 20]
24
+ model: unsloth/Mistral-Small-Instruct-2409
25
+ parameters:
26
+ scale:
27
+ - filter: o_proj
28
+ value: 0.0
29
+ - filter: down_proj
30
+ value: 0.0
31
+ - value: 1.0
32
+ # Original L20
33
+ - sources:
34
+ - layer_range: [20, 21]
35
+ model: unsloth/Mistral-Small-Instruct-2409
36
+ # Dupe A of L20
37
+ - sources:
38
+ - layer_range: [20, 21]
39
+ model: unsloth/Mistral-Small-Instruct-2409
40
+ parameters:
41
+ scale:
42
+ - filter: o_proj
43
+ value: 0.0
44
+ - filter: down_proj
45
+ value: 0.0
46
+ - value: 1.0
47
+ # Dupe B of L20
48
+ - sources:
49
+ - layer_range: [20, 21]
50
+ model: unsloth/Mistral-Small-Instruct-2409
51
+ parameters:
52
+ scale:
53
+ - filter: o_proj
54
+ value: 0.0
55
+ - filter: down_proj
56
+ value: 0.0
57
+ - value: 1.0
58
+ # Original L21
59
+ - sources:
60
+ - layer_range: [21, 22]
61
+ model: unsloth/Mistral-Small-Instruct-2409
62
+ # Dupe A of L21
63
+ - sources:
64
+ - layer_range: [21, 22]
65
+ model: unsloth/Mistral-Small-Instruct-2409
66
+ parameters:
67
+ scale:
68
+ - filter: o_proj
69
+ value: 0.0
70
+ - filter: down_proj
71
+ value: 0.0
72
+ - value: 1.0
73
+ # Dupe B of L21
74
+ - sources:
75
+ - layer_range: [21, 22]
76
+ model: unsloth/Mistral-Small-Instruct-2409
77
+ parameters:
78
+ scale:
79
+ - filter: o_proj
80
+ value: 0.0
81
+ - filter: down_proj
82
+ value: 0.0
83
+ - value: 1.0
84
+ # Original L22
85
+ - sources:
86
+ - layer_range: [22, 23]
87
+ model: unsloth/Mistral-Small-Instruct-2409
88
+ # Dupe A of L22
89
+ - sources:
90
+ - layer_range: [22, 23]
91
+ model: unsloth/Mistral-Small-Instruct-2409
92
+ parameters:
93
+ scale:
94
+ - filter: o_proj
95
+ value: 0.0
96
+ - filter: down_proj
97
+ value: 0.0
98
+ - value: 1.0
99
+ # Dupe B of L22
100
+ - sources:
101
+ - layer_range: [22, 23]
102
+ model: unsloth/Mistral-Small-Instruct-2409
103
+ parameters:
104
+ scale:
105
+ - filter: o_proj
106
+ value: 0.0
107
+ - filter: down_proj
108
+ value: 0.0
109
+ - value: 1.0
110
+ # Original L23
111
+ - sources:
112
+ - layer_range: [23, 24]
113
+ model: unsloth/Mistral-Small-Instruct-2409
114
+ # Dupe A of L23
115
+ - sources:
116
+ - layer_range: [23, 24]
117
+ model: unsloth/Mistral-Small-Instruct-2409
118
+ parameters:
119
+ scale:
120
+ - filter: o_proj
121
+ value: 0.0
122
+ - filter: down_proj
123
+ value: 0.0
124
+ - value: 1.0
125
+ # Dupe B of L23
126
+ - sources:
127
+ - layer_range: [23, 24]
128
+ model: unsloth/Mistral-Small-Instruct-2409
129
+ parameters:
130
+ scale:
131
+ - filter: o_proj
132
+ value: 0.0
133
+ - filter: down_proj
134
+ value: 0.0
135
+ - value: 1.0
136
+ # Original L24
137
+ - sources:
138
+ - layer_range: [24, 25]
139
+ model: unsloth/Mistral-Small-Instruct-2409
140
+ # Dupe A of L24
141
+ - sources:
142
+ - layer_range: [24, 25]
143
+ model: unsloth/Mistral-Small-Instruct-2409
144
+ parameters:
145
+ scale:
146
+ - filter: o_proj
147
+ value: 0.0
148
+ - filter: down_proj
149
+ value: 0.0
150
+ - value: 1.0
151
+ # Dupe B of L24
152
+ - sources:
153
+ - layer_range: [24, 25]
154
+ model: unsloth/Mistral-Small-Instruct-2409
155
+ parameters:
156
+ scale:
157
+ - filter: o_proj
158
+ value: 0.0
159
+ - filter: down_proj
160
+ value: 0.0
161
+ - value: 1.0
162
+ # Original L25
163
+ - sources:
164
+ - layer_range: [25, 26]
165
+ model: unsloth/Mistral-Small-Instruct-2409
166
+ # Dupe A of L25
167
+ - sources:
168
+ - layer_range: [25, 26]
169
+ model: unsloth/Mistral-Small-Instruct-2409
170
+ parameters:
171
+ scale:
172
+ - filter: o_proj
173
+ value: 0.0
174
+ - filter: down_proj
175
+ value: 0.0
176
+ - value: 1.0
177
+ # Dupe B of L25
178
+ - sources:
179
+ - layer_range: [25, 26]
180
+ model: unsloth/Mistral-Small-Instruct-2409
181
+ parameters:
182
+ scale:
183
+ - filter: o_proj
184
+ value: 0.0
185
+ - filter: down_proj
186
+ value: 0.0
187
+ - value: 1.0
188
+ # Original L26
189
+ - sources:
190
+ - layer_range: [26, 27]
191
+ model: unsloth/Mistral-Small-Instruct-2409
192
+ # Dupe A of L26
193
+ - sources:
194
+ - layer_range: [26, 27]
195
+ model: unsloth/Mistral-Small-Instruct-2409
196
+ parameters:
197
+ scale:
198
+ - filter: o_proj
199
+ value: 0.0
200
+ - filter: down_proj
201
+ value: 0.0
202
+ - value: 1.0
203
+ # Dupe B of L26
204
+ - sources:
205
+ - layer_range: [26, 27]
206
+ model: unsloth/Mistral-Small-Instruct-2409
207
+ parameters:
208
+ scale:
209
+ - filter: o_proj
210
+ value: 0.0
211
+ - filter: down_proj
212
+ value: 0.0
213
+ - value: 1.0
214
+ # Original L27
215
+ - sources:
216
+ - layer_range: [27, 28]
217
+ model: unsloth/Mistral-Small-Instruct-2409
218
+ # Dupe A of L27
219
+ - sources:
220
+ - layer_range: [27, 28]
221
+ model: unsloth/Mistral-Small-Instruct-2409
222
+ parameters:
223
+ scale:
224
+ - filter: o_proj
225
+ value: 0.0
226
+ - filter: down_proj
227
+ value: 0.0
228
+ - value: 1.0
229
+ # Dupe B of L27
230
+ - sources:
231
+ - layer_range: [27, 28]
232
+ model: unsloth/Mistral-Small-Instruct-2409
233
+ parameters:
234
+ scale:
235
+ - filter: o_proj
236
+ value: 0.0
237
+ - filter: down_proj
238
+ value: 0.0
239
+ - value: 1.0
240
+ # Original L28
241
+ - sources:
242
+ - layer_range: [28, 29]
243
+ model: unsloth/Mistral-Small-Instruct-2409
244
+ # Dupe A of L28
245
+ - sources:
246
+ - layer_range: [28, 29]
247
+ model: unsloth/Mistral-Small-Instruct-2409
248
+ parameters:
249
+ scale:
250
+ - filter: o_proj
251
+ value: 0.0
252
+ - filter: down_proj
253
+ value: 0.0
254
+ - value: 1.0
255
+ # Dupe B of L28
256
+ - sources:
257
+ - layer_range: [28, 29]
258
+ model: unsloth/Mistral-Small-Instruct-2409
259
+ parameters:
260
+ scale:
261
+ - filter: o_proj
262
+ value: 0.0
263
+ - filter: down_proj
264
+ value: 0.0
265
+ - value: 1.0
266
+ # Original L29
267
+ - sources:
268
+ - layer_range: [29, 30]
269
+ model: unsloth/Mistral-Small-Instruct-2409
270
+ # Dupe A of L29
271
+ - sources:
272
+ - layer_range: [29, 30]
273
+ model: unsloth/Mistral-Small-Instruct-2409
274
+ parameters:
275
+ scale:
276
+ - filter: o_proj
277
+ value: 0.0
278
+ - filter: down_proj
279
+ value: 0.0
280
+ - value: 1.0
281
+ # Dupe B of L29
282
+ - sources:
283
+ - layer_range: [29, 30]
284
+ model: unsloth/Mistral-Small-Instruct-2409
285
+ parameters:
286
+ scale:
287
+ - filter: o_proj
288
+ value: 0.0
289
+ - filter: down_proj
290
+ value: 0.0
291
+ - value: 1.0
292
+ # Original L30
293
+ - sources:
294
+ - layer_range: [30, 31]
295
+ model: unsloth/Mistral-Small-Instruct-2409
296
+ # Dupe A of L30
297
+ - sources:
298
+ - layer_range: [30, 31]
299
+ model: unsloth/Mistral-Small-Instruct-2409
300
+ parameters:
301
+ scale:
302
+ - filter: o_proj
303
+ value: 0.0
304
+ - filter: down_proj
305
+ value: 0.0
306
+ - value: 1.0
307
+ # Dupe B of L30
308
+ - sources:
309
+ - layer_range: [30, 31]
310
+ model: unsloth/Mistral-Small-Instruct-2409
311
+ parameters:
312
+ scale:
313
+ - filter: o_proj
314
+ value: 0.0
315
+ - filter: down_proj
316
+ value: 0.0
317
+ - value: 1.0
318
+ # Original L31
319
+ - sources:
320
+ - layer_range: [31, 32]
321
+ model: unsloth/Mistral-Small-Instruct-2409
322
+ # Dupe A of L31
323
+ - sources:
324
+ - layer_range: [31, 32]
325
+ model: unsloth/Mistral-Small-Instruct-2409
326
+ parameters:
327
+ scale:
328
+ - filter: o_proj
329
+ value: 0.0
330
+ - filter: down_proj
331
+ value: 0.0
332
+ - value: 1.0
333
+ # Dupe B of L31
334
+ - sources:
335
+ - layer_range: [31, 32]
336
+ model: unsloth/Mistral-Small-Instruct-2409
337
+ parameters:
338
+ scale:
339
+ - filter: o_proj
340
+ value: 0.0
341
+ - filter: down_proj
342
+ value: 0.0
343
+ - value: 1.0
344
+ # Original L32
345
+ - sources:
346
+ - layer_range: [32, 33]
347
+ model: unsloth/Mistral-Small-Instruct-2409
348
+ # Dupe A of L32
349
+ - sources:
350
+ - layer_range: [32, 33]
351
+ model: unsloth/Mistral-Small-Instruct-2409
352
+ parameters:
353
+ scale:
354
+ - filter: o_proj
355
+ value: 0.0
356
+ - filter: down_proj
357
+ value: 0.0
358
+ - value: 1.0
359
+ # Dupe B of L32
360
+ - sources:
361
+ - layer_range: [32, 33]
362
+ model: unsloth/Mistral-Small-Instruct-2409
363
+ parameters:
364
+ scale:
365
+ - filter: o_proj
366
+ value: 0.0
367
+ - filter: down_proj
368
+ value: 0.0
369
+ - value: 1.0
370
+ # Original L33
371
+ - sources:
372
+ - layer_range: [33, 34]
373
+ model: unsloth/Mistral-Small-Instruct-2409
374
+ # Dupe A of L33
375
+ - sources:
376
+ - layer_range: [33, 34]
377
+ model: unsloth/Mistral-Small-Instruct-2409
378
+ parameters:
379
+ scale:
380
+ - filter: o_proj
381
+ value: 0.0
382
+ - filter: down_proj
383
+ value: 0.0
384
+ - value: 1.0
385
+ # Dupe B of L33
386
+ - sources:
387
+ - layer_range: [33, 34]
388
+ model: unsloth/Mistral-Small-Instruct-2409
389
+ parameters:
390
+ scale:
391
+ - filter: o_proj
392
+ value: 0.0
393
+ - filter: down_proj
394
+ value: 0.0
395
+ - value: 1.0
396
+ # Original L34
397
+ - sources:
398
+ - layer_range: [34, 35]
399
+ model: unsloth/Mistral-Small-Instruct-2409
400
+ # Dupe A of L34
401
+ - sources:
402
+ - layer_range: [34, 35]
403
+ model: unsloth/Mistral-Small-Instruct-2409
404
+ parameters:
405
+ scale:
406
+ - filter: o_proj
407
+ value: 0.0
408
+ - filter: down_proj
409
+ value: 0.0
410
+ - value: 1.0
411
+ # Dupe B of L34
412
+ - sources:
413
+ - layer_range: [34, 35]
414
+ model: unsloth/Mistral-Small-Instruct-2409
415
+ parameters:
416
+ scale:
417
+ - filter: o_proj
418
+ value: 0.0
419
+ - filter: down_proj
420
+ value: 0.0
421
+ - value: 1.0
422
+ # Original L35
423
+ - sources:
424
+ - layer_range: [35, 36]
425
+ model: unsloth/Mistral-Small-Instruct-2409
426
+ # Dupe A of L35
427
+ - sources:
428
+ - layer_range: [35, 36]
429
+ model: unsloth/Mistral-Small-Instruct-2409
430
+ parameters:
431
+ scale:
432
+ - filter: o_proj
433
+ value: 0.0
434
+ - filter: down_proj
435
+ value: 0.0
436
+ - value: 1.0
437
+ # Dupe B of L35
438
+ - sources:
439
+ - layer_range: [35, 36]
440
+ model: unsloth/Mistral-Small-Instruct-2409
441
+ parameters:
442
+ scale:
443
+ - filter: o_proj
444
+ value: 0.0
445
+ - filter: down_proj
446
+ value: 0.0
447
+ - value: 1.0
448
+ # Original L36
449
+ - sources:
450
+ - layer_range: [36, 37]
451
+ model: unsloth/Mistral-Small-Instruct-2409
452
+ # Dupe A of L36
453
+ - sources:
454
+ - layer_range: [36, 37]
455
+ model: unsloth/Mistral-Small-Instruct-2409
456
+ parameters:
457
+ scale:
458
+ - filter: o_proj
459
+ value: 0.0
460
+ - filter: down_proj
461
+ value: 0.0
462
+ - value: 1.0
463
+ # Dupe B of L36
464
+ - sources:
465
+ - layer_range: [36, 37]
466
+ model: unsloth/Mistral-Small-Instruct-2409
467
+ parameters:
468
+ scale:
469
+ - filter: o_proj
470
+ value: 0.0
471
+ - filter: down_proj
472
+ value: 0.0
473
+ - value: 1.0
474
+ # Original L37
475
+ - sources:
476
+ - layer_range: [37, 38]
477
+ model: unsloth/Mistral-Small-Instruct-2409
478
+ # Dupe A of L37
479
+ - sources:
480
+ - layer_range: [37, 38]
481
+ model: unsloth/Mistral-Small-Instruct-2409
482
+ parameters:
483
+ scale:
484
+ - filter: o_proj
485
+ value: 0.0
486
+ - filter: down_proj
487
+ value: 0.0
488
+ - value: 1.0
489
+ # Dupe B of L37
490
+ - sources:
491
+ - layer_range: [37, 38]
492
+ model: unsloth/Mistral-Small-Instruct-2409
493
+ parameters:
494
+ scale:
495
+ - filter: o_proj
496
+ value: 0.0
497
+ - filter: down_proj
498
+ value: 0.0
499
+ - value: 1.0
500
+ # Original L38
501
+ - sources:
502
+ - layer_range: [38, 39]
503
+ model: unsloth/Mistral-Small-Instruct-2409
504
+ # Dupe A of L38
505
+ - sources:
506
+ - layer_range: [38, 39]
507
+ model: unsloth/Mistral-Small-Instruct-2409
508
+ parameters:
509
+ scale:
510
+ - filter: o_proj
511
+ value: 0.0
512
+ - filter: down_proj
513
+ value: 0.0
514
+ - value: 1.0
515
+ # Dupe B of L38
516
+ - sources:
517
+ - layer_range: [38, 39]
518
+ model: unsloth/Mistral-Small-Instruct-2409
519
+ parameters:
520
+ scale:
521
+ - filter: o_proj
522
+ value: 0.0
523
+ - filter: down_proj
524
+ value: 0.0
525
+ - value: 1.0
526
+ # Original L39
527
+ - sources:
528
+ - layer_range: [39, 40]
529
+ model: unsloth/Mistral-Small-Instruct-2409
530
+ # Dupe A of L39
531
+ - sources:
532
+ - layer_range: [39, 40]
533
+ model: unsloth/Mistral-Small-Instruct-2409
534
+ parameters:
535
+ scale:
536
+ - filter: o_proj
537
+ value: 0.0
538
+ - filter: down_proj
539
+ value: 0.0
540
+ - value: 1.0
541
+ # Dupe B of L39
542
+ - sources:
543
+ - layer_range: [39, 40]
544
+ model: unsloth/Mistral-Small-Instruct-2409
545
+ parameters:
546
+ scale:
547
+ - filter: o_proj
548
+ value: 0.0
549
+ - filter: down_proj
550
+ value: 0.0
551
+ - value: 1.0
552
+ # Original L40
553
+ - sources:
554
+ - layer_range: [40, 41]
555
+ model: unsloth/Mistral-Small-Instruct-2409
556
+ # Dupe A of L40
557
+ - sources:
558
+ - layer_range: [40, 41]
559
+ model: unsloth/Mistral-Small-Instruct-2409
560
+ parameters:
561
+ scale:
562
+ - filter: o_proj
563
+ value: 0.0
564
+ - filter: down_proj
565
+ value: 0.0
566
+ - value: 1.0
567
+ # Dupe B of L40
568
+ - sources:
569
+ - layer_range: [40, 41]
570
+ model: unsloth/Mistral-Small-Instruct-2409
571
+ parameters:
572
+ scale:
573
+ - filter: o_proj
574
+ value: 0.0
575
+ - filter: down_proj
576
+ value: 0.0
577
+ - value: 1.0
578
+ # ... REPEAT UNTIL 41
579
+ - sources:
580
+ - layer_range: [41, 55]
581
+ model: unsloth/Mistral-Small-Instruct-2409
model-00001-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6ff50d62ecaf4826b998e558aee7a30881d340a018dd012ceddc5dc7d558a1fe
3
+ size 4907476544
model-00002-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1f24f206c20330a64f9451786488fc7412ee92000b9a402f2e34a80e2b88d8cc
3
+ size 4882348552
model-00003-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5f51306f7ece3194c0c4f473f964bc9f8d7c9b52bcc324ca639992155ed25829
3
+ size 4945225904
model-00004-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:16d5a621c278d0adafe135486dfe6329232fbbcf782866df12673af7f7b3f9ba
3
+ size 4995607616
model-00005-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2deab57e1f3cf20f24712853952d72e85d52737d81168b84d092a0ec90b90d2c
3
+ size 4882323752
model-00006-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:00cb4c82c04f97dd5a88af0ade1045fc74c80672ee89a8b43864ba63d7c64e67
3
+ size 4882323752
model-00007-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f18b89d8b300ab395e679e9083d70d43292c8498ff0cc721c98be823f099e3ea
3
+ size 4882323744
model-00008-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:99f65f6fcb8aa5ac781ab4a995d66f72beb7748b16f4114a3e3b1448f066eb1b
3
+ size 4857183072
model-00009-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5ea6a85f7ab7639d8fbd11c87a105bab00a69c22229ad231f1c0401ca618d62f
3
+ size 4882323752
model-00010-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5f7ac515df8dde66571822476a7b37386a0d47735d183c177d08e1be95479a3f
3
+ size 4882323752
model-00011-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4319980d0eea13197a43a1c1361974dd651e7b979044ff68518cec22afedc81b
3
+ size 4882323744
model-00012-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:445788f0a5835159e56f0c58cc454b662507221aa2f3930103cc21a698dbefd6
3
+ size 4882323744
model-00013-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1042296855565962d6226c2f721b6d428d07ee692f113c3294a3a33e7496604c
3
+ size 4857158272
model-00014-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:82d760012ed000d7bb2e52261783200d92b84404c6a1c9e401aa201e16e4eb13
3
+ size 4970416768
model-00015-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:54d2c19a5904bcdd9ecfd5d12caf3cdc66d827d247d27d3ef91d61efcd24936a
3
+ size 4970416744
model-00016-of-00016.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:11efaac6943a776bd2bde73794eb188a17387105cb9fc9f6e89a72f4a3ebd0c2
3
+ size 4479670264
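Each shard above is committed as a Git LFS pointer: a three-line stub recording the spec version, the SHA-256 oid, and the byte size of the actual file, which LFS stores out of band. A minimal parsing sketch (the `parse_lfs_pointer` helper is hypothetical, not part of `huggingface_hub`):

```python
def parse_lfs_pointer(text: str) -> dict:
    # Each non-empty line of a Git LFS pointer is "key value".
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The pointer for model-00001-of-00016.safetensors, verbatim from above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:6ff50d62ecaf4826b998e558aee7a30881d340a018dd012ceddc5dc7d558a1fe
size 4907476544
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # ~4.9 GB; only this small stub lives in the git history
```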
model.safetensors.index.json ADDED
@@ -0,0 +1 @@
+ {"metadata": {"mergekit_version": "0.0.5.1", "total_size": 78041665536}, "weight_map": {"lm_head.weight": "model-00001-of-00016.safetensors", "model.embed_tokens.weight": "model-00001-of-00016.safetensors", "model.layers.0.input_layernorm.weight": "model-00001-of-00016.safetensors", "model.layers.0.mlp.down_proj.weight": "model-00001-of-00016.safetensors", "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00016.safetensors", "model.layers.0.mlp.up_proj.weight": "model-00001-of-00016.safetensors", "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00016.safetensors", "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00016.safetensors", "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00016.safetensors", "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00016.safetensors", "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00016.safetensors", "model.layers.1.input_layernorm.weight": "model-00001-of-00016.safetensors", "model.layers.1.mlp.down_proj.weight": "model-00001-of-00016.safetensors", "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00016.safetensors", "model.layers.1.mlp.up_proj.weight": "model-00001-of-00016.safetensors", "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00016.safetensors", "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00016.safetensors", "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00016.safetensors", "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00016.safetensors", "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00016.safetensors", "model.layers.10.input_layernorm.weight": "model-00001-of-00016.safetensors", "model.layers.10.mlp.down_proj.weight": "model-00001-of-00016.safetensors", "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00016.safetensors", "model.layers.10.mlp.up_proj.weight": "model-00001-of-00016.safetensors", "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00016.safetensors", 
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00016.safetensors", "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00016.safetensors", "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00016.safetensors", "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00016.safetensors", "model.layers.11.input_layernorm.weight": "model-00001-of-00016.safetensors", "model.layers.11.mlp.down_proj.weight": "model-00001-of-00016.safetensors", "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00016.safetensors", "model.layers.11.mlp.up_proj.weight": "model-00001-of-00016.safetensors", "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00016.safetensors", "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00016.safetensors", "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00016.safetensors", "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00016.safetensors", "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00016.safetensors", "model.layers.12.input_layernorm.weight": "model-00001-of-00016.safetensors", "model.layers.12.mlp.down_proj.weight": "model-00001-of-00016.safetensors", "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00016.safetensors", "model.layers.12.mlp.up_proj.weight": "model-00001-of-00016.safetensors", "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00016.safetensors", "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00016.safetensors", "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00016.safetensors", "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00016.safetensors", "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00016.safetensors", "model.layers.13.input_layernorm.weight": "model-00001-of-00016.safetensors", "model.layers.13.mlp.down_proj.weight": "model-00001-of-00016.safetensors", "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00016.safetensors", "model.layers.13.mlp.up_proj.weight": 
"model-00002-of-00016.safetensors", "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00016.safetensors", "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00016.safetensors", "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00016.safetensors", "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00016.safetensors", "model.layers.14.input_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.14.mlp.down_proj.weight": "model-00002-of-00016.safetensors", "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00016.safetensors", "model.layers.14.mlp.up_proj.weight": "model-00002-of-00016.safetensors", "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00016.safetensors", "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00016.safetensors", "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00016.safetensors", "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00016.safetensors", "model.layers.15.input_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.15.mlp.down_proj.weight": "model-00002-of-00016.safetensors", "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00016.safetensors", "model.layers.15.mlp.up_proj.weight": "model-00002-of-00016.safetensors", "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00016.safetensors", "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00016.safetensors", "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00016.safetensors", "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00016.safetensors", "model.layers.16.input_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.16.mlp.down_proj.weight": 
"model-00002-of-00016.safetensors", "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00016.safetensors", "model.layers.16.mlp.up_proj.weight": "model-00002-of-00016.safetensors", "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00016.safetensors", "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00016.safetensors", "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00016.safetensors", "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00016.safetensors", "model.layers.17.input_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.17.mlp.down_proj.weight": "model-00002-of-00016.safetensors", "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00016.safetensors", "model.layers.17.mlp.up_proj.weight": "model-00002-of-00016.safetensors", "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00016.safetensors", "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00016.safetensors", "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00016.safetensors", "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00016.safetensors", "model.layers.18.input_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.18.mlp.down_proj.weight": "model-00002-of-00016.safetensors", "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00016.safetensors", "model.layers.18.mlp.up_proj.weight": "model-00002-of-00016.safetensors", "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00016.safetensors", "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00016.safetensors", "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00016.safetensors", "model.layers.18.self_attn.v_proj.weight": 
"model-00002-of-00016.safetensors", "model.layers.21.input_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.20.input_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.19.input_layernorm.weight": "model-00002-of-00016.safetensors", "model.layers.21.mlp.down_proj.weight": "model-00002-of-00016.safetensors", "model.layers.20.mlp.down_proj.weight": "model-00002-of-00016.safetensors", "model.layers.19.mlp.down_proj.weight": "model-00003-of-00016.safetensors", "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00016.safetensors", "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00016.safetensors", "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00016.safetensors", "model.layers.21.mlp.up_proj.weight": "model-00003-of-00016.safetensors", "model.layers.20.mlp.up_proj.weight": "model-00003-of-00016.safetensors", "model.layers.19.mlp.up_proj.weight": "model-00003-of-00016.safetensors", "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00016.safetensors", "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00016.safetensors", "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00016.safetensors", "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00016.safetensors", "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00016.safetensors", "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00016.safetensors", "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00016.safetensors", "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00016.safetensors", "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00016.safetensors", "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00016.safetensors", "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00016.safetensors", "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00016.safetensors", "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00016.safetensors", 
"model.layers.20.self_attn.v_proj.weight": "model-00003-of-00016.safetensors", "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00016.safetensors", "model.layers.2.input_layernorm.weight": "model-00003-of-00016.safetensors", "model.layers.2.mlp.down_proj.weight": "model-00003-of-00016.safetensors", "model.layers.2.mlp.gate_proj.weight": "model-00003-of-00016.safetensors", "model.layers.2.mlp.up_proj.weight": "model-00003-of-00016.safetensors", "model.layers.2.post_attention_layernorm.weight": "model-00003-of-00016.safetensors", "model.layers.2.self_attn.k_proj.weight": "model-00003-of-00016.safetensors", "model.layers.2.self_attn.o_proj.weight": "model-00003-of-00016.safetensors", "model.layers.2.self_attn.q_proj.weight": "model-00003-of-00016.safetensors", "model.layers.2.self_attn.v_proj.weight": "model-00003-of-00016.safetensors", "model.layers.24.input_layernorm.weight": "model-00003-of-00016.safetensors", "model.layers.23.input_layernorm.weight": "model-00003-of-00016.safetensors", "model.layers.22.input_layernorm.weight": "model-00003-of-00016.safetensors", "model.layers.24.mlp.down_proj.weight": "model-00003-of-00016.safetensors", "model.layers.23.mlp.down_proj.weight": "model-00003-of-00016.safetensors", "model.layers.22.mlp.down_proj.weight": "model-00003-of-00016.safetensors", "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00016.safetensors", "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00016.safetensors", "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00016.safetensors", "model.layers.24.mlp.up_proj.weight": "model-00003-of-00016.safetensors", "model.layers.23.mlp.up_proj.weight": "model-00003-of-00016.safetensors", "model.layers.22.mlp.up_proj.weight": "model-00003-of-00016.safetensors", "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00016.safetensors", "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00016.safetensors", "model.layers.22.post_attention_layernorm.weight": 
"model-00003-of-00016.safetensors", "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00016.safetensors", "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00016.safetensors", "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00016.safetensors", "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00016.safetensors", "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00016.safetensors", "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00016.safetensors", "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00016.safetensors", "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00016.safetensors", "model.layers.22.self_attn.q_proj.weight": "model-00004-of-00016.safetensors", "model.layers.24.self_attn.v_proj.weight": "model-00004-of-00016.safetensors", "model.layers.23.self_attn.v_proj.weight": "model-00004-of-00016.safetensors", "model.layers.22.self_attn.v_proj.weight": "model-00004-of-00016.safetensors", "model.layers.27.input_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.26.input_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.25.input_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.27.mlp.down_proj.weight": "model-00004-of-00016.safetensors", "model.layers.26.mlp.down_proj.weight": "model-00004-of-00016.safetensors", "model.layers.25.mlp.down_proj.weight": "model-00004-of-00016.safetensors", "model.layers.27.mlp.gate_proj.weight": "model-00004-of-00016.safetensors", "model.layers.26.mlp.gate_proj.weight": "model-00004-of-00016.safetensors", "model.layers.25.mlp.gate_proj.weight": "model-00004-of-00016.safetensors", "model.layers.27.mlp.up_proj.weight": "model-00004-of-00016.safetensors", "model.layers.26.mlp.up_proj.weight": "model-00004-of-00016.safetensors", "model.layers.25.mlp.up_proj.weight": "model-00004-of-00016.safetensors", "model.layers.27.post_attention_layernorm.weight": "model-00004-of-00016.safetensors", 
"model.layers.26.post_attention_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.25.post_attention_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.27.self_attn.k_proj.weight": "model-00004-of-00016.safetensors", "model.layers.26.self_attn.k_proj.weight": "model-00004-of-00016.safetensors", "model.layers.25.self_attn.k_proj.weight": "model-00004-of-00016.safetensors", "model.layers.27.self_attn.o_proj.weight": "model-00004-of-00016.safetensors", "model.layers.26.self_attn.o_proj.weight": "model-00004-of-00016.safetensors", "model.layers.25.self_attn.o_proj.weight": "model-00004-of-00016.safetensors", "model.layers.27.self_attn.q_proj.weight": "model-00004-of-00016.safetensors", "model.layers.26.self_attn.q_proj.weight": "model-00004-of-00016.safetensors", "model.layers.25.self_attn.q_proj.weight": "model-00004-of-00016.safetensors", "model.layers.27.self_attn.v_proj.weight": "model-00004-of-00016.safetensors", "model.layers.26.self_attn.v_proj.weight": "model-00004-of-00016.safetensors", "model.layers.25.self_attn.v_proj.weight": "model-00004-of-00016.safetensors", "model.layers.30.input_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.29.input_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.28.input_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.30.mlp.down_proj.weight": "model-00004-of-00016.safetensors", "model.layers.29.mlp.down_proj.weight": "model-00004-of-00016.safetensors", "model.layers.28.mlp.down_proj.weight": "model-00004-of-00016.safetensors", "model.layers.30.mlp.gate_proj.weight": "model-00004-of-00016.safetensors", "model.layers.29.mlp.gate_proj.weight": "model-00004-of-00016.safetensors", "model.layers.28.mlp.gate_proj.weight": "model-00004-of-00016.safetensors", "model.layers.30.mlp.up_proj.weight": "model-00004-of-00016.safetensors", "model.layers.29.mlp.up_proj.weight": "model-00004-of-00016.safetensors", "model.layers.28.mlp.up_proj.weight": 
"model-00004-of-00016.safetensors", "model.layers.30.post_attention_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.29.post_attention_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.28.post_attention_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.30.self_attn.k_proj.weight": "model-00004-of-00016.safetensors", "model.layers.29.self_attn.k_proj.weight": "model-00004-of-00016.safetensors", "model.layers.28.self_attn.k_proj.weight": "model-00004-of-00016.safetensors", "model.layers.30.self_attn.o_proj.weight": "model-00004-of-00016.safetensors", "model.layers.29.self_attn.o_proj.weight": "model-00004-of-00016.safetensors", "model.layers.28.self_attn.o_proj.weight": "model-00004-of-00016.safetensors", "model.layers.30.self_attn.q_proj.weight": "model-00004-of-00016.safetensors", "model.layers.29.self_attn.q_proj.weight": "model-00004-of-00016.safetensors", "model.layers.28.self_attn.q_proj.weight": "model-00004-of-00016.safetensors", "model.layers.30.self_attn.v_proj.weight": "model-00004-of-00016.safetensors", "model.layers.29.self_attn.v_proj.weight": "model-00004-of-00016.safetensors", "model.layers.28.self_attn.v_proj.weight": "model-00004-of-00016.safetensors", "model.layers.33.input_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.32.input_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.31.input_layernorm.weight": "model-00004-of-00016.safetensors", "model.layers.33.mlp.down_proj.weight": "model-00004-of-00016.safetensors", "model.layers.32.mlp.down_proj.weight": "model-00005-of-00016.safetensors", "model.layers.31.mlp.down_proj.weight": "model-00005-of-00016.safetensors", "model.layers.33.mlp.gate_proj.weight": "model-00005-of-00016.safetensors", "model.layers.32.mlp.gate_proj.weight": "model-00005-of-00016.safetensors", "model.layers.31.mlp.gate_proj.weight": "model-00005-of-00016.safetensors", "model.layers.33.mlp.up_proj.weight": 
"model-00005-of-00016.safetensors", "model.layers.32.mlp.up_proj.weight": "model-00005-of-00016.safetensors", "model.layers.31.mlp.up_proj.weight": "model-00005-of-00016.safetensors", "model.layers.33.post_attention_layernorm.weight": "model-00005-of-00016.safetensors", "model.layers.32.post_attention_layernorm.weight": "model-00005-of-00016.safetensors", "model.layers.31.post_attention_layernorm.weight": "model-00005-of-00016.safetensors", "model.layers.33.self_attn.k_proj.weight": "model-00005-of-00016.safetensors", "model.layers.32.self_attn.k_proj.weight": "model-00005-of-00016.safetensors", "model.layers.31.self_attn.k_proj.weight": "model-00005-of-00016.safetensors", "model.layers.33.self_attn.o_proj.weight": "model-00005-of-00016.safetensors", "model.layers.32.self_attn.o_proj.weight": "model-00005-of-00016.safetensors", "model.layers.31.self_attn.o_proj.weight": "model-00005-of-00016.safetensors", "model.layers.33.self_attn.q_proj.weight": "model-00005-of-00016.safetensors", "model.layers.32.self_attn.q_proj.weight": "model-00005-of-00016.safetensors", "model.layers.31.self_attn.q_proj.weight": "model-00005-of-00016.safetensors", "model.layers.33.self_attn.v_proj.weight": "model-00005-of-00016.safetensors", "model.layers.32.self_attn.v_proj.weight": "model-00005-of-00016.safetensors", "model.layers.31.self_attn.v_proj.weight": "model-00005-of-00016.safetensors", "model.layers.36.input_layernorm.weight": "model-00005-of-00016.safetensors", "model.layers.35.input_layernorm.weight": "model-00005-of-00016.safetensors", "model.layers.34.input_layernorm.weight": "model-00005-of-00016.safetensors", "model.layers.36.mlp.down_proj.weight": "model-00005-of-00016.safetensors", "model.layers.35.mlp.down_proj.weight": "model-00005-of-00016.safetensors", "model.layers.34.mlp.down_proj.weight": "model-00005-of-00016.safetensors", "model.layers.36.mlp.gate_proj.weight": "model-00005-of-00016.safetensors", "model.layers.35.mlp.gate_proj.weight": 
"model-00005-of-00016.safetensors", "model.layers.34.mlp.gate_proj.weight": "model-00005-of-00016.safetensors", "model.layers.36.mlp.up_proj.weight": "model-00005-of-00016.safetensors", "model.layers.35.mlp.up_proj.weight": "model-00005-of-00016.safetensors", "model.layers.34.mlp.up_proj.weight": "model-00005-of-00016.safetensors", "model.layers.36.post_attention_layernorm.weight": "model-00005-of-00016.safetensors", "model.layers.35.post_attention_layernorm.weight": "model-00005-of-00016.safetensors", "model.layers.34.post_attention_layernorm.weight": "model-00005-of-00016.safetensors", "model.layers.36.self_attn.k_proj.weight": "model-00005-of-00016.safetensors", "model.layers.35.self_attn.k_proj.weight": "model-00005-of-00016.safetensors", "model.layers.34.self_attn.k_proj.weight": "model-00005-of-00016.safetensors", "model.layers.36.self_attn.o_proj.weight": "model-00005-of-00016.safetensors", "model.layers.35.self_attn.o_proj.weight": "model-00005-of-00016.safetensors", "model.layers.34.self_attn.o_proj.weight": "model-00005-of-00016.safetensors", "model.layers.36.self_attn.q_proj.weight": "model-00005-of-00016.safetensors", "model.layers.35.self_attn.q_proj.weight": "model-00005-of-00016.safetensors", "model.layers.34.self_attn.q_proj.weight": "model-00005-of-00016.safetensors", "model.layers.36.self_attn.v_proj.weight": "model-00005-of-00016.safetensors", "model.layers.35.self_attn.v_proj.weight": "model-00005-of-00016.safetensors", "model.layers.34.self_attn.v_proj.weight": "model-00005-of-00016.safetensors", "model.layers.39.input_layernorm.weight": "model-00005-of-00016.safetensors", "model.layers.38.input_layernorm.weight": "model-00005-of-00016.safetensors", "model.layers.37.input_layernorm.weight": "model-00005-of-00016.safetensors", "model.layers.39.mlp.down_proj.weight": "model-00005-of-00016.safetensors", "model.layers.38.mlp.down_proj.weight": "model-00005-of-00016.safetensors", "model.layers.37.mlp.down_proj.weight": 
"model-00006-of-00016.safetensors", "model.layers.39.mlp.gate_proj.weight": "model-00006-of-00016.safetensors", "model.layers.38.mlp.gate_proj.weight": "model-00006-of-00016.safetensors", "model.layers.37.mlp.gate_proj.weight": "model-00006-of-00016.safetensors", "model.layers.39.mlp.up_proj.weight": "model-00006-of-00016.safetensors", "model.layers.38.mlp.up_proj.weight": "model-00006-of-00016.safetensors", "model.layers.37.mlp.up_proj.weight": "model-00006-of-00016.safetensors", "model.layers.39.post_attention_layernorm.weight": "model-00006-of-00016.safetensors", "model.layers.38.post_attention_layernorm.weight": "model-00006-of-00016.safetensors", "model.layers.37.post_attention_layernorm.weight": "model-00006-of-00016.safetensors", "model.layers.39.self_attn.k_proj.weight": "model-00006-of-00016.safetensors", "model.layers.38.self_attn.k_proj.weight": "model-00006-of-00016.safetensors", "model.layers.37.self_attn.k_proj.weight": "model-00006-of-00016.safetensors", "model.layers.39.self_attn.o_proj.weight": "model-00006-of-00016.safetensors", "model.layers.38.self_attn.o_proj.weight": "model-00006-of-00016.safetensors", "model.layers.37.self_attn.o_proj.weight": "model-00006-of-00016.safetensors", "model.layers.39.self_attn.q_proj.weight": "model-00006-of-00016.safetensors", "model.layers.38.self_attn.q_proj.weight": "model-00006-of-00016.safetensors", "model.layers.37.self_attn.q_proj.weight": "model-00006-of-00016.safetensors", "model.layers.39.self_attn.v_proj.weight": "model-00006-of-00016.safetensors", "model.layers.38.self_attn.v_proj.weight": "model-00006-of-00016.safetensors", "model.layers.37.self_attn.v_proj.weight": "model-00006-of-00016.safetensors", "model.layers.42.input_layernorm.weight": "model-00006-of-00016.safetensors", "model.layers.41.input_layernorm.weight": "model-00006-of-00016.safetensors", "model.layers.40.input_layernorm.weight": "model-00006-of-00016.safetensors", "model.layers.42.mlp.down_proj.weight": 
"model-00006-of-00016.safetensors", "model.layers.41.mlp.down_proj.weight": "model-00006-of-00016.safetensors", "model.layers.40.mlp.down_proj.weight": "model-00006-of-00016.safetensors", "model.layers.42.mlp.gate_proj.weight": "model-00006-of-00016.safetensors", "model.layers.41.mlp.gate_proj.weight": "model-00006-of-00016.safetensors", "model.layers.40.mlp.gate_proj.weight": "model-00006-of-00016.safetensors", "model.layers.42.mlp.up_proj.weight": "model-00006-of-00016.safetensors", "model.layers.41.mlp.up_proj.weight": "model-00006-of-00016.safetensors", "model.layers.40.mlp.up_proj.weight": "model-00006-of-00016.safetensors", "model.layers.42.post_attention_layernorm.weight": "model-00006-of-00016.safetensors", "model.layers.41.post_attention_layernorm.weight": "model-00006-of-00016.safetensors", "model.layers.40.post_attention_layernorm.weight": "model-00006-of-00016.safetensors", "model.layers.42.self_attn.k_proj.weight": "model-00006-of-00016.safetensors", "model.layers.41.self_attn.k_proj.weight": "model-00006-of-00016.safetensors", "model.layers.40.self_attn.k_proj.weight": "model-00006-of-00016.safetensors", "model.layers.42.self_attn.o_proj.weight": "model-00006-of-00016.safetensors", "model.layers.41.self_attn.o_proj.weight": "model-00006-of-00016.safetensors", "model.layers.40.self_attn.o_proj.weight": "model-00006-of-00016.safetensors", "model.layers.42.self_attn.q_proj.weight": "model-00006-of-00016.safetensors", "model.layers.41.self_attn.q_proj.weight": "model-00006-of-00016.safetensors", "model.layers.40.self_attn.q_proj.weight": "model-00006-of-00016.safetensors", "model.layers.42.self_attn.v_proj.weight": "model-00006-of-00016.safetensors", "model.layers.41.self_attn.v_proj.weight": "model-00006-of-00016.safetensors", "model.layers.40.self_attn.v_proj.weight": "model-00006-of-00016.safetensors", "model.layers.45.input_layernorm.weight": "model-00006-of-00016.safetensors", "model.layers.44.input_layernorm.weight": 
"model-00006-of-00016.safetensors", "model.layers.43.input_layernorm.weight": "model-00006-of-00016.safetensors", "model.layers.45.mlp.down_proj.weight": "model-00006-of-00016.safetensors", "model.layers.44.mlp.down_proj.weight": "model-00006-of-00016.safetensors", "model.layers.43.mlp.down_proj.weight": "model-00006-of-00016.safetensors", "model.layers.45.mlp.gate_proj.weight": "model-00007-of-00016.safetensors", "model.layers.44.mlp.gate_proj.weight": "model-00007-of-00016.safetensors", "model.layers.43.mlp.gate_proj.weight": "model-00007-of-00016.safetensors", "model.layers.45.mlp.up_proj.weight": "model-00007-of-00016.safetensors", "model.layers.44.mlp.up_proj.weight": "model-00007-of-00016.safetensors", "model.layers.43.mlp.up_proj.weight": "model-00007-of-00016.safetensors", "model.layers.45.post_attention_layernorm.weight": "model-00007-of-00016.safetensors", "model.layers.44.post_attention_layernorm.weight": "model-00007-of-00016.safetensors", "model.layers.43.post_attention_layernorm.weight": "model-00007-of-00016.safetensors", "model.layers.45.self_attn.k_proj.weight": "model-00007-of-00016.safetensors", "model.layers.44.self_attn.k_proj.weight": "model-00007-of-00016.safetensors", "model.layers.43.self_attn.k_proj.weight": "model-00007-of-00016.safetensors", "model.layers.45.self_attn.o_proj.weight": "model-00007-of-00016.safetensors", "model.layers.44.self_attn.o_proj.weight": "model-00007-of-00016.safetensors", "model.layers.43.self_attn.o_proj.weight": "model-00007-of-00016.safetensors", "model.layers.45.self_attn.q_proj.weight": "model-00007-of-00016.safetensors", "model.layers.44.self_attn.q_proj.weight": "model-00007-of-00016.safetensors", "model.layers.43.self_attn.q_proj.weight": "model-00007-of-00016.safetensors", "model.layers.45.self_attn.v_proj.weight": "model-00007-of-00016.safetensors", "model.layers.44.self_attn.v_proj.weight": "model-00007-of-00016.safetensors", "model.layers.43.self_attn.v_proj.weight": 
"model-00007-of-00016.safetensors", "model.layers.48.input_layernorm.weight": "model-00007-of-00016.safetensors", "model.layers.47.input_layernorm.weight": "model-00007-of-00016.safetensors", "model.layers.46.input_layernorm.weight": "model-00007-of-00016.safetensors", "model.layers.48.mlp.down_proj.weight": "model-00007-of-00016.safetensors", "model.layers.47.mlp.down_proj.weight": "model-00007-of-00016.safetensors", "model.layers.46.mlp.down_proj.weight": "model-00007-of-00016.safetensors", "model.layers.48.mlp.gate_proj.weight": "model-00007-of-00016.safetensors", "model.layers.47.mlp.gate_proj.weight": "model-00007-of-00016.safetensors", "model.layers.46.mlp.gate_proj.weight": "model-00007-of-00016.safetensors", "model.layers.48.mlp.up_proj.weight": "model-00007-of-00016.safetensors", "model.layers.47.mlp.up_proj.weight": "model-00007-of-00016.safetensors", "model.layers.46.mlp.up_proj.weight": "model-00007-of-00016.safetensors", "model.layers.48.post_attention_layernorm.weight": "model-00007-of-00016.safetensors", "model.layers.47.post_attention_layernorm.weight": "model-00007-of-00016.safetensors", "model.layers.46.post_attention_layernorm.weight": "model-00007-of-00016.safetensors", "model.layers.48.self_attn.k_proj.weight": "model-00007-of-00016.safetensors", "model.layers.47.self_attn.k_proj.weight": "model-00007-of-00016.safetensors", "model.layers.46.self_attn.k_proj.weight": "model-00007-of-00016.safetensors", "model.layers.48.self_attn.o_proj.weight": "model-00007-of-00016.safetensors", "model.layers.47.self_attn.o_proj.weight": "model-00007-of-00016.safetensors", "model.layers.46.self_attn.o_proj.weight": "model-00007-of-00016.safetensors", "model.layers.48.self_attn.q_proj.weight": "model-00007-of-00016.safetensors", "model.layers.47.self_attn.q_proj.weight": "model-00007-of-00016.safetensors", "model.layers.46.self_attn.q_proj.weight": "model-00007-of-00016.safetensors", "model.layers.48.self_attn.v_proj.weight": "model-00007-of-00016.safetensors", 
"model.layers.47.self_attn.v_proj.weight": "model-00007-of-00016.safetensors", "model.layers.46.self_attn.v_proj.weight": "model-00007-of-00016.safetensors", "model.layers.51.input_layernorm.weight": "model-00007-of-00016.safetensors", "model.layers.50.input_layernorm.weight": "model-00007-of-00016.safetensors", "model.layers.49.input_layernorm.weight": "model-00007-of-00016.safetensors", "model.layers.51.mlp.down_proj.weight": "model-00007-of-00016.safetensors", "model.layers.50.mlp.down_proj.weight": "model-00007-of-00016.safetensors", "model.layers.49.mlp.down_proj.weight": "model-00007-of-00016.safetensors", "model.layers.51.mlp.gate_proj.weight": "model-00007-of-00016.safetensors", "model.layers.50.mlp.gate_proj.weight": "model-00008-of-00016.safetensors", "model.layers.49.mlp.gate_proj.weight": "model-00008-of-00016.safetensors", "model.layers.51.mlp.up_proj.weight": "model-00008-of-00016.safetensors", "model.layers.50.mlp.up_proj.weight": "model-00008-of-00016.safetensors", "model.layers.49.mlp.up_proj.weight": "model-00008-of-00016.safetensors", "model.layers.51.post_attention_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.50.post_attention_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.49.post_attention_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.51.self_attn.k_proj.weight": "model-00008-of-00016.safetensors", "model.layers.50.self_attn.k_proj.weight": "model-00008-of-00016.safetensors", "model.layers.49.self_attn.k_proj.weight": "model-00008-of-00016.safetensors", "model.layers.51.self_attn.o_proj.weight": "model-00008-of-00016.safetensors", "model.layers.50.self_attn.o_proj.weight": "model-00008-of-00016.safetensors", "model.layers.49.self_attn.o_proj.weight": "model-00008-of-00016.safetensors", "model.layers.51.self_attn.q_proj.weight": "model-00008-of-00016.safetensors", "model.layers.50.self_attn.q_proj.weight": "model-00008-of-00016.safetensors", 
"model.layers.49.self_attn.q_proj.weight": "model-00008-of-00016.safetensors", "model.layers.51.self_attn.v_proj.weight": "model-00008-of-00016.safetensors", "model.layers.50.self_attn.v_proj.weight": "model-00008-of-00016.safetensors", "model.layers.49.self_attn.v_proj.weight": "model-00008-of-00016.safetensors", "model.layers.3.input_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.3.mlp.down_proj.weight": "model-00008-of-00016.safetensors", "model.layers.3.mlp.gate_proj.weight": "model-00008-of-00016.safetensors", "model.layers.3.mlp.up_proj.weight": "model-00008-of-00016.safetensors", "model.layers.3.post_attention_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.3.self_attn.k_proj.weight": "model-00008-of-00016.safetensors", "model.layers.3.self_attn.o_proj.weight": "model-00008-of-00016.safetensors", "model.layers.3.self_attn.q_proj.weight": "model-00008-of-00016.safetensors", "model.layers.3.self_attn.v_proj.weight": "model-00008-of-00016.safetensors", "model.layers.54.input_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.53.input_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.52.input_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.54.mlp.down_proj.weight": "model-00008-of-00016.safetensors", "model.layers.53.mlp.down_proj.weight": "model-00008-of-00016.safetensors", "model.layers.52.mlp.down_proj.weight": "model-00008-of-00016.safetensors", "model.layers.54.mlp.gate_proj.weight": "model-00008-of-00016.safetensors", "model.layers.53.mlp.gate_proj.weight": "model-00008-of-00016.safetensors", "model.layers.52.mlp.gate_proj.weight": "model-00008-of-00016.safetensors", "model.layers.54.mlp.up_proj.weight": "model-00008-of-00016.safetensors", "model.layers.53.mlp.up_proj.weight": "model-00008-of-00016.safetensors", "model.layers.52.mlp.up_proj.weight": "model-00008-of-00016.safetensors", "model.layers.54.post_attention_layernorm.weight": 
"model-00008-of-00016.safetensors", "model.layers.53.post_attention_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.52.post_attention_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.54.self_attn.k_proj.weight": "model-00008-of-00016.safetensors", "model.layers.53.self_attn.k_proj.weight": "model-00008-of-00016.safetensors", "model.layers.52.self_attn.k_proj.weight": "model-00008-of-00016.safetensors", "model.layers.54.self_attn.o_proj.weight": "model-00008-of-00016.safetensors", "model.layers.53.self_attn.o_proj.weight": "model-00008-of-00016.safetensors", "model.layers.52.self_attn.o_proj.weight": "model-00008-of-00016.safetensors", "model.layers.54.self_attn.q_proj.weight": "model-00008-of-00016.safetensors", "model.layers.53.self_attn.q_proj.weight": "model-00008-of-00016.safetensors", "model.layers.52.self_attn.q_proj.weight": "model-00008-of-00016.safetensors", "model.layers.54.self_attn.v_proj.weight": "model-00008-of-00016.safetensors", "model.layers.53.self_attn.v_proj.weight": "model-00008-of-00016.safetensors", "model.layers.52.self_attn.v_proj.weight": "model-00008-of-00016.safetensors", "model.layers.57.input_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.56.input_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.55.input_layernorm.weight": "model-00008-of-00016.safetensors", "model.layers.57.mlp.down_proj.weight": "model-00008-of-00016.safetensors", "model.layers.56.mlp.down_proj.weight": "model-00009-of-00016.safetensors", "model.layers.55.mlp.down_proj.weight": "model-00009-of-00016.safetensors", "model.layers.57.mlp.gate_proj.weight": "model-00009-of-00016.safetensors", "model.layers.56.mlp.gate_proj.weight": "model-00009-of-00016.safetensors", "model.layers.55.mlp.gate_proj.weight": "model-00009-of-00016.safetensors", "model.layers.57.mlp.up_proj.weight": "model-00009-of-00016.safetensors", "model.layers.56.mlp.up_proj.weight": "model-00009-of-00016.safetensors", 
"model.layers.55.mlp.up_proj.weight": "model-00009-of-00016.safetensors", "model.layers.57.post_attention_layernorm.weight": "model-00009-of-00016.safetensors", "model.layers.56.post_attention_layernorm.weight": "model-00009-of-00016.safetensors", "model.layers.55.post_attention_layernorm.weight": "model-00009-of-00016.safetensors", "model.layers.57.self_attn.k_proj.weight": "model-00009-of-00016.safetensors", "model.layers.56.self_attn.k_proj.weight": "model-00009-of-00016.safetensors", "model.layers.55.self_attn.k_proj.weight": "model-00009-of-00016.safetensors", "model.layers.57.self_attn.o_proj.weight": "model-00009-of-00016.safetensors", "model.layers.56.self_attn.o_proj.weight": "model-00009-of-00016.safetensors", "model.layers.55.self_attn.o_proj.weight": "model-00009-of-00016.safetensors", "model.layers.57.self_attn.q_proj.weight": "model-00009-of-00016.safetensors", "model.layers.56.self_attn.q_proj.weight": "model-00009-of-00016.safetensors", "model.layers.55.self_attn.q_proj.weight": "model-00009-of-00016.safetensors", "model.layers.57.self_attn.v_proj.weight": "model-00009-of-00016.safetensors", "model.layers.56.self_attn.v_proj.weight": "model-00009-of-00016.safetensors", "model.layers.55.self_attn.v_proj.weight": "model-00009-of-00016.safetensors", "model.layers.60.input_layernorm.weight": "model-00009-of-00016.safetensors", "model.layers.59.input_layernorm.weight": "model-00009-of-00016.safetensors", "model.layers.58.input_layernorm.weight": "model-00009-of-00016.safetensors", "model.layers.60.mlp.down_proj.weight": "model-00009-of-00016.safetensors", "model.layers.59.mlp.down_proj.weight": "model-00009-of-00016.safetensors", "model.layers.58.mlp.down_proj.weight": "model-00009-of-00016.safetensors", "model.layers.60.mlp.gate_proj.weight": "model-00009-of-00016.safetensors", "model.layers.59.mlp.gate_proj.weight": "model-00009-of-00016.safetensors", "model.layers.58.mlp.gate_proj.weight": "model-00009-of-00016.safetensors", 
"model.layers.60.mlp.up_proj.weight": "model-00009-of-00016.safetensors", "model.layers.59.mlp.up_proj.weight": "model-00009-of-00016.safetensors", "model.layers.58.mlp.up_proj.weight": "model-00009-of-00016.safetensors", "model.layers.60.post_attention_layernorm.weight": "model-00009-of-00016.safetensors", "model.layers.59.post_attention_layernorm.weight": "model-00009-of-00016.safetensors", "model.layers.58.post_attention_layernorm.weight": "model-00009-of-00016.safetensors", "model.layers.60.self_attn.k_proj.weight": "model-00009-of-00016.safetensors", "model.layers.59.self_attn.k_proj.weight": "model-00009-of-00016.safetensors", "model.layers.58.self_attn.k_proj.weight": "model-00009-of-00016.safetensors", "model.layers.60.self_attn.o_proj.weight": "model-00009-of-00016.safetensors", "model.layers.59.self_attn.o_proj.weight": "model-00009-of-00016.safetensors", "model.layers.58.self_attn.o_proj.weight": "model-00009-of-00016.safetensors", "model.layers.60.self_attn.q_proj.weight": "model-00009-of-00016.safetensors", "model.layers.59.self_attn.q_proj.weight": "model-00009-of-00016.safetensors", "model.layers.58.self_attn.q_proj.weight": "model-00009-of-00016.safetensors", "model.layers.60.self_attn.v_proj.weight": "model-00009-of-00016.safetensors", "model.layers.59.self_attn.v_proj.weight": "model-00009-of-00016.safetensors", "model.layers.58.self_attn.v_proj.weight": "model-00009-of-00016.safetensors", "model.layers.63.input_layernorm.weight": "model-00009-of-00016.safetensors", "model.layers.62.input_layernorm.weight": "model-00009-of-00016.safetensors", "model.layers.61.input_layernorm.weight": "model-00009-of-00016.safetensors", "model.layers.63.mlp.down_proj.weight": "model-00009-of-00016.safetensors", "model.layers.62.mlp.down_proj.weight": "model-00009-of-00016.safetensors", "model.layers.61.mlp.down_proj.weight": "model-00010-of-00016.safetensors", "model.layers.63.mlp.gate_proj.weight": "model-00010-of-00016.safetensors", 
"model.layers.62.mlp.gate_proj.weight": "model-00010-of-00016.safetensors", "model.layers.61.mlp.gate_proj.weight": "model-00010-of-00016.safetensors", "model.layers.63.mlp.up_proj.weight": "model-00010-of-00016.safetensors", "model.layers.62.mlp.up_proj.weight": "model-00010-of-00016.safetensors", "model.layers.61.mlp.up_proj.weight": "model-00010-of-00016.safetensors", "model.layers.63.post_attention_layernorm.weight": "model-00010-of-00016.safetensors", "model.layers.62.post_attention_layernorm.weight": "model-00010-of-00016.safetensors", "model.layers.61.post_attention_layernorm.weight": "model-00010-of-00016.safetensors", "model.layers.63.self_attn.k_proj.weight": "model-00010-of-00016.safetensors", "model.layers.62.self_attn.k_proj.weight": "model-00010-of-00016.safetensors", "model.layers.61.self_attn.k_proj.weight": "model-00010-of-00016.safetensors", "model.layers.63.self_attn.o_proj.weight": "model-00010-of-00016.safetensors", "model.layers.62.self_attn.o_proj.weight": "model-00010-of-00016.safetensors", "model.layers.61.self_attn.o_proj.weight": "model-00010-of-00016.safetensors", "model.layers.63.self_attn.q_proj.weight": "model-00010-of-00016.safetensors", "model.layers.62.self_attn.q_proj.weight": "model-00010-of-00016.safetensors", "model.layers.61.self_attn.q_proj.weight": "model-00010-of-00016.safetensors", "model.layers.63.self_attn.v_proj.weight": "model-00010-of-00016.safetensors", "model.layers.62.self_attn.v_proj.weight": "model-00010-of-00016.safetensors", "model.layers.61.self_attn.v_proj.weight": "model-00010-of-00016.safetensors", "model.layers.66.input_layernorm.weight": "model-00010-of-00016.safetensors", "model.layers.65.input_layernorm.weight": "model-00010-of-00016.safetensors", "model.layers.64.input_layernorm.weight": "model-00010-of-00016.safetensors", "model.layers.66.mlp.down_proj.weight": "model-00010-of-00016.safetensors", "model.layers.65.mlp.down_proj.weight": "model-00010-of-00016.safetensors", 
"model.layers.64.mlp.down_proj.weight": "model-00010-of-00016.safetensors", "model.layers.66.mlp.gate_proj.weight": "model-00010-of-00016.safetensors", "model.layers.65.mlp.gate_proj.weight": "model-00010-of-00016.safetensors", "model.layers.64.mlp.gate_proj.weight": "model-00010-of-00016.safetensors", "model.layers.66.mlp.up_proj.weight": "model-00010-of-00016.safetensors", "model.layers.65.mlp.up_proj.weight": "model-00010-of-00016.safetensors", "model.layers.64.mlp.up_proj.weight": "model-00010-of-00016.safetensors", "model.layers.66.post_attention_layernorm.weight": "model-00010-of-00016.safetensors", "model.layers.65.post_attention_layernorm.weight": "model-00010-of-00016.safetensors", "model.layers.64.post_attention_layernorm.weight": "model-00010-of-00016.safetensors", "model.layers.66.self_attn.k_proj.weight": "model-00010-of-00016.safetensors", "model.layers.65.self_attn.k_proj.weight": "model-00010-of-00016.safetensors", "model.layers.64.self_attn.k_proj.weight": "model-00010-of-00016.safetensors", "model.layers.66.self_attn.o_proj.weight": "model-00010-of-00016.safetensors", "model.layers.65.self_attn.o_proj.weight": "model-00010-of-00016.safetensors", "model.layers.64.self_attn.o_proj.weight": "model-00010-of-00016.safetensors", "model.layers.66.self_attn.q_proj.weight": "model-00010-of-00016.safetensors", "model.layers.65.self_attn.q_proj.weight": "model-00010-of-00016.safetensors", "model.layers.64.self_attn.q_proj.weight": "model-00010-of-00016.safetensors", "model.layers.66.self_attn.v_proj.weight": "model-00010-of-00016.safetensors", "model.layers.65.self_attn.v_proj.weight": "model-00010-of-00016.safetensors", "model.layers.64.self_attn.v_proj.weight": "model-00010-of-00016.safetensors", "model.layers.69.input_layernorm.weight": "model-00010-of-00016.safetensors", "model.layers.68.input_layernorm.weight": "model-00010-of-00016.safetensors", "model.layers.67.input_layernorm.weight": "model-00010-of-00016.safetensors", 
"model.layers.69.mlp.down_proj.weight": "model-00010-of-00016.safetensors", "model.layers.68.mlp.down_proj.weight": "model-00010-of-00016.safetensors", "model.layers.67.mlp.down_proj.weight": "model-00010-of-00016.safetensors", "model.layers.69.mlp.gate_proj.weight": "model-00011-of-00016.safetensors", "model.layers.68.mlp.gate_proj.weight": "model-00011-of-00016.safetensors", "model.layers.67.mlp.gate_proj.weight": "model-00011-of-00016.safetensors", "model.layers.69.mlp.up_proj.weight": "model-00011-of-00016.safetensors", "model.layers.68.mlp.up_proj.weight": "model-00011-of-00016.safetensors", "model.layers.67.mlp.up_proj.weight": "model-00011-of-00016.safetensors", "model.layers.69.post_attention_layernorm.weight": "model-00011-of-00016.safetensors", "model.layers.68.post_attention_layernorm.weight": "model-00011-of-00016.safetensors", "model.layers.67.post_attention_layernorm.weight": "model-00011-of-00016.safetensors", "model.layers.69.self_attn.k_proj.weight": "model-00011-of-00016.safetensors", "model.layers.68.self_attn.k_proj.weight": "model-00011-of-00016.safetensors", "model.layers.67.self_attn.k_proj.weight": "model-00011-of-00016.safetensors", "model.layers.69.self_attn.o_proj.weight": "model-00011-of-00016.safetensors", "model.layers.68.self_attn.o_proj.weight": "model-00011-of-00016.safetensors", "model.layers.67.self_attn.o_proj.weight": "model-00011-of-00016.safetensors", "model.layers.69.self_attn.q_proj.weight": "model-00011-of-00016.safetensors", "model.layers.68.self_attn.q_proj.weight": "model-00011-of-00016.safetensors", "model.layers.67.self_attn.q_proj.weight": "model-00011-of-00016.safetensors", "model.layers.69.self_attn.v_proj.weight": "model-00011-of-00016.safetensors", "model.layers.68.self_attn.v_proj.weight": "model-00011-of-00016.safetensors", "model.layers.67.self_attn.v_proj.weight": "model-00011-of-00016.safetensors", "model.layers.72.input_layernorm.weight": "model-00011-of-00016.safetensors", 
"model.layers.71.input_layernorm.weight": "model-00011-of-00016.safetensors", "model.layers.70.input_layernorm.weight": "model-00011-of-00016.safetensors", "model.layers.72.mlp.down_proj.weight": "model-00011-of-00016.safetensors", "model.layers.71.mlp.down_proj.weight": "model-00011-of-00016.safetensors", "model.layers.70.mlp.down_proj.weight": "model-00011-of-00016.safetensors", "model.layers.72.mlp.gate_proj.weight": "model-00011-of-00016.safetensors", "model.layers.71.mlp.gate_proj.weight": "model-00011-of-00016.safetensors", "model.layers.70.mlp.gate_proj.weight": "model-00011-of-00016.safetensors", "model.layers.72.mlp.up_proj.weight": "model-00011-of-00016.safetensors", "model.layers.71.mlp.up_proj.weight": "model-00011-of-00016.safetensors", "model.layers.70.mlp.up_proj.weight": "model-00011-of-00016.safetensors", "model.layers.72.post_attention_layernorm.weight": "model-00011-of-00016.safetensors", "model.layers.71.post_attention_layernorm.weight": "model-00011-of-00016.safetensors", "model.layers.70.post_attention_layernorm.weight": "model-00011-of-00016.safetensors", "model.layers.72.self_attn.k_proj.weight": "model-00011-of-00016.safetensors", "model.layers.71.self_attn.k_proj.weight": "model-00011-of-00016.safetensors", "model.layers.70.self_attn.k_proj.weight": "model-00011-of-00016.safetensors", "model.layers.72.self_attn.o_proj.weight": "model-00011-of-00016.safetensors", "model.layers.71.self_attn.o_proj.weight": "model-00011-of-00016.safetensors", "model.layers.70.self_attn.o_proj.weight": "model-00011-of-00016.safetensors", "model.layers.72.self_attn.q_proj.weight": "model-00011-of-00016.safetensors", "model.layers.71.self_attn.q_proj.weight": "model-00011-of-00016.safetensors", "model.layers.70.self_attn.q_proj.weight": "model-00011-of-00016.safetensors", "model.layers.72.self_attn.v_proj.weight": "model-00011-of-00016.safetensors", "model.layers.71.self_attn.v_proj.weight": "model-00011-of-00016.safetensors", 
"model.layers.70.self_attn.v_proj.weight": "model-00011-of-00016.safetensors", "model.layers.75.input_layernorm.weight": "model-00011-of-00016.safetensors", "model.layers.74.input_layernorm.weight": "model-00011-of-00016.safetensors", "model.layers.73.input_layernorm.weight": "model-00011-of-00016.safetensors", "model.layers.75.mlp.down_proj.weight": "model-00011-of-00016.safetensors", "model.layers.74.mlp.down_proj.weight": "model-00011-of-00016.safetensors", "model.layers.73.mlp.down_proj.weight": "model-00011-of-00016.safetensors", "model.layers.75.mlp.gate_proj.weight": "model-00011-of-00016.safetensors", "model.layers.74.mlp.gate_proj.weight": "model-00012-of-00016.safetensors", "model.layers.73.mlp.gate_proj.weight": "model-00012-of-00016.safetensors", "model.layers.75.mlp.up_proj.weight": "model-00012-of-00016.safetensors", "model.layers.74.mlp.up_proj.weight": "model-00012-of-00016.safetensors", "model.layers.73.mlp.up_proj.weight": "model-00012-of-00016.safetensors", "model.layers.75.post_attention_layernorm.weight": "model-00012-of-00016.safetensors", "model.layers.74.post_attention_layernorm.weight": "model-00012-of-00016.safetensors", "model.layers.73.post_attention_layernorm.weight": "model-00012-of-00016.safetensors", "model.layers.75.self_attn.k_proj.weight": "model-00012-of-00016.safetensors", "model.layers.74.self_attn.k_proj.weight": "model-00012-of-00016.safetensors", "model.layers.73.self_attn.k_proj.weight": "model-00012-of-00016.safetensors", "model.layers.75.self_attn.o_proj.weight": "model-00012-of-00016.safetensors", "model.layers.74.self_attn.o_proj.weight": "model-00012-of-00016.safetensors", "model.layers.73.self_attn.o_proj.weight": "model-00012-of-00016.safetensors", "model.layers.75.self_attn.q_proj.weight": "model-00012-of-00016.safetensors", "model.layers.74.self_attn.q_proj.weight": "model-00012-of-00016.safetensors", "model.layers.73.self_attn.q_proj.weight": "model-00012-of-00016.safetensors", 
"model.layers.75.self_attn.v_proj.weight": "model-00012-of-00016.safetensors", "model.layers.74.self_attn.v_proj.weight": "model-00012-of-00016.safetensors", "model.layers.73.self_attn.v_proj.weight": "model-00012-of-00016.safetensors", "model.layers.78.input_layernorm.weight": "model-00012-of-00016.safetensors", "model.layers.77.input_layernorm.weight": "model-00012-of-00016.safetensors", "model.layers.76.input_layernorm.weight": "model-00012-of-00016.safetensors", "model.layers.78.mlp.down_proj.weight": "model-00012-of-00016.safetensors", "model.layers.77.mlp.down_proj.weight": "model-00012-of-00016.safetensors", "model.layers.76.mlp.down_proj.weight": "model-00012-of-00016.safetensors", "model.layers.78.mlp.gate_proj.weight": "model-00012-of-00016.safetensors", "model.layers.77.mlp.gate_proj.weight": "model-00012-of-00016.safetensors", "model.layers.76.mlp.gate_proj.weight": "model-00012-of-00016.safetensors", "model.layers.78.mlp.up_proj.weight": "model-00012-of-00016.safetensors", "model.layers.77.mlp.up_proj.weight": "model-00012-of-00016.safetensors", "model.layers.76.mlp.up_proj.weight": "model-00012-of-00016.safetensors", "model.layers.78.post_attention_layernorm.weight": "model-00012-of-00016.safetensors", "model.layers.77.post_attention_layernorm.weight": "model-00012-of-00016.safetensors", "model.layers.76.post_attention_layernorm.weight": "model-00012-of-00016.safetensors", "model.layers.78.self_attn.k_proj.weight": "model-00012-of-00016.safetensors", "model.layers.77.self_attn.k_proj.weight": "model-00012-of-00016.safetensors", "model.layers.76.self_attn.k_proj.weight": "model-00012-of-00016.safetensors", "model.layers.78.self_attn.o_proj.weight": "model-00012-of-00016.safetensors", "model.layers.77.self_attn.o_proj.weight": "model-00012-of-00016.safetensors", "model.layers.76.self_attn.o_proj.weight": "model-00012-of-00016.safetensors", "model.layers.78.self_attn.q_proj.weight": "model-00012-of-00016.safetensors", 
"model.layers.77.self_attn.q_proj.weight": "model-00012-of-00016.safetensors", "model.layers.76.self_attn.q_proj.weight": "model-00012-of-00016.safetensors", "model.layers.78.self_attn.v_proj.weight": "model-00012-of-00016.safetensors", "model.layers.77.self_attn.v_proj.weight": "model-00012-of-00016.safetensors", "model.layers.76.self_attn.v_proj.weight": "model-00012-of-00016.safetensors", "model.layers.81.input_layernorm.weight": "model-00012-of-00016.safetensors", "model.layers.80.input_layernorm.weight": "model-00012-of-00016.safetensors", "model.layers.79.input_layernorm.weight": "model-00012-of-00016.safetensors", "model.layers.81.mlp.down_proj.weight": "model-00012-of-00016.safetensors", "model.layers.80.mlp.down_proj.weight": "model-00012-of-00016.safetensors", "model.layers.79.mlp.down_proj.weight": "model-00012-of-00016.safetensors", "model.layers.81.mlp.gate_proj.weight": "model-00012-of-00016.safetensors", "model.layers.80.mlp.gate_proj.weight": "model-00012-of-00016.safetensors", "model.layers.79.mlp.gate_proj.weight": "model-00013-of-00016.safetensors", "model.layers.81.mlp.up_proj.weight": "model-00013-of-00016.safetensors", "model.layers.80.mlp.up_proj.weight": "model-00013-of-00016.safetensors", "model.layers.79.mlp.up_proj.weight": "model-00013-of-00016.safetensors", "model.layers.81.post_attention_layernorm.weight": "model-00013-of-00016.safetensors", "model.layers.80.post_attention_layernorm.weight": "model-00013-of-00016.safetensors", "model.layers.79.post_attention_layernorm.weight": "model-00013-of-00016.safetensors", "model.layers.81.self_attn.k_proj.weight": "model-00013-of-00016.safetensors", "model.layers.80.self_attn.k_proj.weight": "model-00013-of-00016.safetensors", "model.layers.79.self_attn.k_proj.weight": "model-00013-of-00016.safetensors", "model.layers.81.self_attn.o_proj.weight": "model-00013-of-00016.safetensors", "model.layers.80.self_attn.o_proj.weight": "model-00013-of-00016.safetensors", 
"model.layers.79.self_attn.o_proj.weight": "model-00013-of-00016.safetensors", "model.layers.81.self_attn.q_proj.weight": "model-00013-of-00016.safetensors", "model.layers.80.self_attn.q_proj.weight": "model-00013-of-00016.safetensors", "model.layers.79.self_attn.q_proj.weight": "model-00013-of-00016.safetensors", "model.layers.81.self_attn.v_proj.weight": "model-00013-of-00016.safetensors", "model.layers.80.self_attn.v_proj.weight": "model-00013-of-00016.safetensors", "model.layers.79.self_attn.v_proj.weight": "model-00013-of-00016.safetensors", "model.layers.4.input_layernorm.weight": "model-00013-of-00016.safetensors", "model.layers.4.mlp.down_proj.weight": "model-00013-of-00016.safetensors", "model.layers.4.mlp.gate_proj.weight": "model-00013-of-00016.safetensors", "model.layers.4.mlp.up_proj.weight": "model-00013-of-00016.safetensors", "model.layers.4.post_attention_layernorm.weight": "model-00013-of-00016.safetensors", "model.layers.4.self_attn.k_proj.weight": "model-00013-of-00016.safetensors", "model.layers.4.self_attn.o_proj.weight": "model-00013-of-00016.safetensors", "model.layers.4.self_attn.q_proj.weight": "model-00013-of-00016.safetensors", "model.layers.4.self_attn.v_proj.weight": "model-00013-of-00016.safetensors", "model.layers.84.input_layernorm.weight": "model-00013-of-00016.safetensors", "model.layers.83.input_layernorm.weight": "model-00013-of-00016.safetensors", "model.layers.82.input_layernorm.weight": "model-00013-of-00016.safetensors", "model.layers.84.mlp.down_proj.weight": "model-00013-of-00016.safetensors", "model.layers.83.mlp.down_proj.weight": "model-00013-of-00016.safetensors", "model.layers.82.mlp.down_proj.weight": "model-00013-of-00016.safetensors", "model.layers.84.mlp.gate_proj.weight": "model-00013-of-00016.safetensors", "model.layers.83.mlp.gate_proj.weight": "model-00013-of-00016.safetensors", "model.layers.82.mlp.gate_proj.weight": "model-00013-of-00016.safetensors", "model.layers.84.mlp.up_proj.weight": 
"model-00013-of-00016.safetensors", "model.layers.83.mlp.up_proj.weight": "model-00013-of-00016.safetensors", "model.layers.82.mlp.up_proj.weight": "model-00013-of-00016.safetensors", "model.layers.84.post_attention_layernorm.weight": "model-00013-of-00016.safetensors", "model.layers.83.post_attention_layernorm.weight": "model-00013-of-00016.safetensors", "model.layers.82.post_attention_layernorm.weight": "model-00013-of-00016.safetensors", "model.layers.84.self_attn.k_proj.weight": "model-00013-of-00016.safetensors", "model.layers.83.self_attn.k_proj.weight": "model-00013-of-00016.safetensors", "model.layers.82.self_attn.k_proj.weight": "model-00013-of-00016.safetensors", "model.layers.84.self_attn.o_proj.weight": "model-00013-of-00016.safetensors", "model.layers.83.self_attn.o_proj.weight": "model-00013-of-00016.safetensors", "model.layers.82.self_attn.o_proj.weight": "model-00013-of-00016.safetensors", "model.layers.84.self_attn.q_proj.weight": "model-00013-of-00016.safetensors", "model.layers.83.self_attn.q_proj.weight": "model-00013-of-00016.safetensors", "model.layers.82.self_attn.q_proj.weight": "model-00013-of-00016.safetensors", "model.layers.84.self_attn.v_proj.weight": "model-00013-of-00016.safetensors", "model.layers.83.self_attn.v_proj.weight": "model-00013-of-00016.safetensors", "model.layers.82.self_attn.v_proj.weight": "model-00013-of-00016.safetensors", "model.layers.85.input_layernorm.weight": "model-00013-of-00016.safetensors", "model.layers.85.mlp.down_proj.weight": "model-00013-of-00016.safetensors", "model.layers.85.mlp.gate_proj.weight": "model-00013-of-00016.safetensors", "model.layers.85.mlp.up_proj.weight": "model-00014-of-00016.safetensors", "model.layers.85.post_attention_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.85.self_attn.k_proj.weight": "model-00014-of-00016.safetensors", "model.layers.85.self_attn.o_proj.weight": "model-00014-of-00016.safetensors", "model.layers.85.self_attn.q_proj.weight": 
"model-00014-of-00016.safetensors", "model.layers.85.self_attn.v_proj.weight": "model-00014-of-00016.safetensors", "model.layers.86.input_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.86.mlp.down_proj.weight": "model-00014-of-00016.safetensors", "model.layers.86.mlp.gate_proj.weight": "model-00014-of-00016.safetensors", "model.layers.86.mlp.up_proj.weight": "model-00014-of-00016.safetensors", "model.layers.86.post_attention_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.86.self_attn.k_proj.weight": "model-00014-of-00016.safetensors", "model.layers.86.self_attn.o_proj.weight": "model-00014-of-00016.safetensors", "model.layers.86.self_attn.q_proj.weight": "model-00014-of-00016.safetensors", "model.layers.86.self_attn.v_proj.weight": "model-00014-of-00016.safetensors", "model.layers.87.input_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.87.mlp.down_proj.weight": "model-00014-of-00016.safetensors", "model.layers.87.mlp.gate_proj.weight": "model-00014-of-00016.safetensors", "model.layers.87.mlp.up_proj.weight": "model-00014-of-00016.safetensors", "model.layers.87.post_attention_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.87.self_attn.k_proj.weight": "model-00014-of-00016.safetensors", "model.layers.87.self_attn.o_proj.weight": "model-00014-of-00016.safetensors", "model.layers.87.self_attn.q_proj.weight": "model-00014-of-00016.safetensors", "model.layers.87.self_attn.v_proj.weight": "model-00014-of-00016.safetensors", "model.layers.88.input_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.88.mlp.down_proj.weight": "model-00014-of-00016.safetensors", "model.layers.88.mlp.gate_proj.weight": "model-00014-of-00016.safetensors", "model.layers.88.mlp.up_proj.weight": "model-00014-of-00016.safetensors", "model.layers.88.post_attention_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.88.self_attn.k_proj.weight": "model-00014-of-00016.safetensors", 
"model.layers.88.self_attn.o_proj.weight": "model-00014-of-00016.safetensors", "model.layers.88.self_attn.q_proj.weight": "model-00014-of-00016.safetensors", "model.layers.88.self_attn.v_proj.weight": "model-00014-of-00016.safetensors", "model.layers.89.input_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.89.mlp.down_proj.weight": "model-00014-of-00016.safetensors", "model.layers.89.mlp.gate_proj.weight": "model-00014-of-00016.safetensors", "model.layers.89.mlp.up_proj.weight": "model-00014-of-00016.safetensors", "model.layers.89.post_attention_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.89.self_attn.k_proj.weight": "model-00014-of-00016.safetensors", "model.layers.89.self_attn.o_proj.weight": "model-00014-of-00016.safetensors", "model.layers.89.self_attn.q_proj.weight": "model-00014-of-00016.safetensors", "model.layers.89.self_attn.v_proj.weight": "model-00014-of-00016.safetensors", "model.layers.90.input_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.90.mlp.down_proj.weight": "model-00014-of-00016.safetensors", "model.layers.90.mlp.gate_proj.weight": "model-00014-of-00016.safetensors", "model.layers.90.mlp.up_proj.weight": "model-00014-of-00016.safetensors", "model.layers.90.post_attention_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.90.self_attn.k_proj.weight": "model-00014-of-00016.safetensors", "model.layers.90.self_attn.o_proj.weight": "model-00014-of-00016.safetensors", "model.layers.90.self_attn.q_proj.weight": "model-00014-of-00016.safetensors", "model.layers.90.self_attn.v_proj.weight": "model-00014-of-00016.safetensors", "model.layers.91.input_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.91.mlp.down_proj.weight": "model-00014-of-00016.safetensors", "model.layers.91.mlp.gate_proj.weight": "model-00014-of-00016.safetensors", "model.layers.91.mlp.up_proj.weight": "model-00014-of-00016.safetensors", 
"model.layers.91.post_attention_layernorm.weight": "model-00014-of-00016.safetensors", "model.layers.91.self_attn.k_proj.weight": "model-00014-of-00016.safetensors", "model.layers.91.self_attn.o_proj.weight": "model-00014-of-00016.safetensors", "model.layers.91.self_attn.q_proj.weight": "model-00015-of-00016.safetensors", "model.layers.91.self_attn.v_proj.weight": "model-00015-of-00016.safetensors", "model.layers.92.input_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.92.mlp.down_proj.weight": "model-00015-of-00016.safetensors", "model.layers.92.mlp.gate_proj.weight": "model-00015-of-00016.safetensors", "model.layers.92.mlp.up_proj.weight": "model-00015-of-00016.safetensors", "model.layers.92.post_attention_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.92.self_attn.k_proj.weight": "model-00015-of-00016.safetensors", "model.layers.92.self_attn.o_proj.weight": "model-00015-of-00016.safetensors", "model.layers.92.self_attn.q_proj.weight": "model-00015-of-00016.safetensors", "model.layers.92.self_attn.v_proj.weight": "model-00015-of-00016.safetensors", "model.layers.93.input_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.93.mlp.down_proj.weight": "model-00015-of-00016.safetensors", "model.layers.93.mlp.gate_proj.weight": "model-00015-of-00016.safetensors", "model.layers.93.mlp.up_proj.weight": "model-00015-of-00016.safetensors", "model.layers.93.post_attention_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.93.self_attn.k_proj.weight": "model-00015-of-00016.safetensors", "model.layers.93.self_attn.o_proj.weight": "model-00015-of-00016.safetensors", "model.layers.93.self_attn.q_proj.weight": "model-00015-of-00016.safetensors", "model.layers.93.self_attn.v_proj.weight": "model-00015-of-00016.safetensors", "model.layers.5.input_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.5.mlp.down_proj.weight": "model-00015-of-00016.safetensors", 
"model.layers.5.mlp.gate_proj.weight": "model-00015-of-00016.safetensors", "model.layers.5.mlp.up_proj.weight": "model-00015-of-00016.safetensors", "model.layers.5.post_attention_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.5.self_attn.k_proj.weight": "model-00015-of-00016.safetensors", "model.layers.5.self_attn.o_proj.weight": "model-00015-of-00016.safetensors", "model.layers.5.self_attn.q_proj.weight": "model-00015-of-00016.safetensors", "model.layers.5.self_attn.v_proj.weight": "model-00015-of-00016.safetensors", "model.layers.94.input_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.94.mlp.down_proj.weight": "model-00015-of-00016.safetensors", "model.layers.94.mlp.gate_proj.weight": "model-00015-of-00016.safetensors", "model.layers.94.mlp.up_proj.weight": "model-00015-of-00016.safetensors", "model.layers.94.post_attention_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.94.self_attn.k_proj.weight": "model-00015-of-00016.safetensors", "model.layers.94.self_attn.o_proj.weight": "model-00015-of-00016.safetensors", "model.layers.94.self_attn.q_proj.weight": "model-00015-of-00016.safetensors", "model.layers.94.self_attn.v_proj.weight": "model-00015-of-00016.safetensors", "model.layers.95.input_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.95.mlp.down_proj.weight": "model-00015-of-00016.safetensors", "model.layers.95.mlp.gate_proj.weight": "model-00015-of-00016.safetensors", "model.layers.95.mlp.up_proj.weight": "model-00015-of-00016.safetensors", "model.layers.95.post_attention_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.95.self_attn.k_proj.weight": "model-00015-of-00016.safetensors", "model.layers.95.self_attn.o_proj.weight": "model-00015-of-00016.safetensors", "model.layers.95.self_attn.q_proj.weight": "model-00015-of-00016.safetensors", "model.layers.95.self_attn.v_proj.weight": "model-00015-of-00016.safetensors", 
"model.layers.96.input_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.96.mlp.down_proj.weight": "model-00015-of-00016.safetensors", "model.layers.96.mlp.gate_proj.weight": "model-00015-of-00016.safetensors", "model.layers.96.mlp.up_proj.weight": "model-00015-of-00016.safetensors", "model.layers.96.post_attention_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.96.self_attn.k_proj.weight": "model-00015-of-00016.safetensors", "model.layers.96.self_attn.o_proj.weight": "model-00015-of-00016.safetensors", "model.layers.96.self_attn.q_proj.weight": "model-00015-of-00016.safetensors", "model.layers.96.self_attn.v_proj.weight": "model-00015-of-00016.safetensors", "model.layers.97.input_layernorm.weight": "model-00015-of-00016.safetensors", "model.layers.97.mlp.down_proj.weight": "model-00015-of-00016.safetensors", "model.layers.97.mlp.gate_proj.weight": "model-00016-of-00016.safetensors", "model.layers.97.mlp.up_proj.weight": "model-00016-of-00016.safetensors", "model.layers.97.post_attention_layernorm.weight": "model-00016-of-00016.safetensors", "model.layers.97.self_attn.k_proj.weight": "model-00016-of-00016.safetensors", "model.layers.97.self_attn.o_proj.weight": "model-00016-of-00016.safetensors", "model.layers.97.self_attn.q_proj.weight": "model-00016-of-00016.safetensors", "model.layers.97.self_attn.v_proj.weight": "model-00016-of-00016.safetensors", "model.layers.98.input_layernorm.weight": "model-00016-of-00016.safetensors", "model.layers.98.mlp.down_proj.weight": "model-00016-of-00016.safetensors", "model.layers.98.mlp.gate_proj.weight": "model-00016-of-00016.safetensors", "model.layers.98.mlp.up_proj.weight": "model-00016-of-00016.safetensors", "model.layers.98.post_attention_layernorm.weight": "model-00016-of-00016.safetensors", "model.layers.98.self_attn.k_proj.weight": "model-00016-of-00016.safetensors", "model.layers.98.self_attn.o_proj.weight": "model-00016-of-00016.safetensors", 
"model.layers.98.self_attn.q_proj.weight": "model-00016-of-00016.safetensors", "model.layers.98.self_attn.v_proj.weight": "model-00016-of-00016.safetensors", "model.layers.6.input_layernorm.weight": "model-00016-of-00016.safetensors", "model.layers.6.mlp.down_proj.weight": "model-00016-of-00016.safetensors", "model.layers.6.mlp.gate_proj.weight": "model-00016-of-00016.safetensors", "model.layers.6.mlp.up_proj.weight": "model-00016-of-00016.safetensors", "model.layers.6.post_attention_layernorm.weight": "model-00016-of-00016.safetensors", "model.layers.6.self_attn.k_proj.weight": "model-00016-of-00016.safetensors", "model.layers.6.self_attn.o_proj.weight": "model-00016-of-00016.safetensors", "model.layers.6.self_attn.q_proj.weight": "model-00016-of-00016.safetensors", "model.layers.6.self_attn.v_proj.weight": "model-00016-of-00016.safetensors", "model.layers.7.input_layernorm.weight": "model-00016-of-00016.safetensors", "model.layers.7.mlp.down_proj.weight": "model-00016-of-00016.safetensors", "model.layers.7.mlp.gate_proj.weight": "model-00016-of-00016.safetensors", "model.layers.7.mlp.up_proj.weight": "model-00016-of-00016.safetensors", "model.layers.7.post_attention_layernorm.weight": "model-00016-of-00016.safetensors", "model.layers.7.self_attn.k_proj.weight": "model-00016-of-00016.safetensors", "model.layers.7.self_attn.o_proj.weight": "model-00016-of-00016.safetensors", "model.layers.7.self_attn.q_proj.weight": "model-00016-of-00016.safetensors", "model.layers.7.self_attn.v_proj.weight": "model-00016-of-00016.safetensors", "model.layers.8.input_layernorm.weight": "model-00016-of-00016.safetensors", "model.layers.8.mlp.down_proj.weight": "model-00016-of-00016.safetensors", "model.layers.8.mlp.gate_proj.weight": "model-00016-of-00016.safetensors", "model.layers.8.mlp.up_proj.weight": "model-00016-of-00016.safetensors", "model.layers.8.post_attention_layernorm.weight": "model-00016-of-00016.safetensors", "model.layers.8.self_attn.k_proj.weight": 
"model-00016-of-00016.safetensors", "model.layers.8.self_attn.o_proj.weight": "model-00016-of-00016.safetensors", "model.layers.8.self_attn.q_proj.weight": "model-00016-of-00016.safetensors", "model.layers.8.self_attn.v_proj.weight": "model-00016-of-00016.safetensors", "model.layers.9.input_layernorm.weight": "model-00016-of-00016.safetensors", "model.layers.9.mlp.down_proj.weight": "model-00016-of-00016.safetensors", "model.layers.9.mlp.gate_proj.weight": "model-00016-of-00016.safetensors", "model.layers.9.mlp.up_proj.weight": "model-00016-of-00016.safetensors", "model.layers.9.post_attention_layernorm.weight": "model-00016-of-00016.safetensors", "model.layers.9.self_attn.k_proj.weight": "model-00016-of-00016.safetensors", "model.layers.9.self_attn.o_proj.weight": "model-00016-of-00016.safetensors", "model.layers.9.self_attn.q_proj.weight": "model-00016-of-00016.safetensors", "model.layers.9.self_attn.v_proj.weight": "model-00016-of-00016.safetensors", "model.norm.weight": "model-00016-of-00016.safetensors"}}
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[control_748]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
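The `special_tokens_map.json` added above tells a tokenizer which token string fills each special role (BOS, EOS, pad, unknown). As a hedged sketch, the snippet below parses the same structure with the standard library and extracts the role-to-string mapping; this is not the `transformers` loading path, just an illustration of the file's shape.

```python
import json

# Sketch: parse the special_tokens_map.json content above and pull out
# the token string registered for each special-token role.
special_tokens_map = json.loads("""
{
  "bos_token": {"content": "<s>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "eos_token": {"content": "</s>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "pad_token": {"content": "[control_748]", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "unk_token": {"content": "<unk>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false}
}
""")

# Each role maps to a dict; "content" holds the actual token string.
tokens = {role: spec["content"] for role, spec in special_tokens_map.items()}
```

Note the pad token is repurposed from an unused control token (`[control_748]`) rather than a dedicated `<pad>` token, a common choice for Mistral-family tokenizers that ship without one.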
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:59f95e28944c062244741268596badc900df86c7f5ded05088d2da22a7379e06
+ size 587583
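The `tokenizer.model` entry above is a Git LFS pointer file, not the binary itself: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes of the real file. A minimal sketch of parsing that pointer format (the `parse_lfs_pointer` helper is illustrative, not part of any Git tooling):

```python
# Sketch: parse a Git LFS pointer file like the tokenizer.model entry
# above into its version, hash algorithm, digest, and byte size.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:59f95e28944c062244741268596badc900df86c7f5ded05088d2da22a7379e06
size 587583
"""

def parse_lfs_pointer(text):
    """Split each 'key value' line, then decompose the oid field."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

info = parse_lfs_pointer(pointer_text)
```

When the repo is cloned with LFS enabled, the pointer is replaced by the actual ~574 KiB SentencePiece model whose SHA-256 matches the recorded digest.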
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff