reach-vb HF staff committed on
Commit
971fc9d
1 Parent(s): 256b15d

Upload folder using huggingface_hub

merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
mlc-chat-config.json ADDED
@@ -0,0 +1,79 @@
+ {
+   "version": "0.1.0",
+   "model_type": "llama",
+   "quantization": "q0f16",
+   "model_config": {
+     "hidden_size": 576,
+     "intermediate_size": 1536,
+     "num_attention_heads": 9,
+     "num_hidden_layers": 30,
+     "rms_norm_eps": 1e-05,
+     "vocab_size": 49152,
+     "tie_word_embeddings": true,
+     "position_embedding_base": 10000.0,
+     "rope_scaling": null,
+     "context_window_size": 2048,
+     "prefill_chunk_size": 2048,
+     "num_key_value_heads": 3,
+     "head_dim": 64,
+     "tensor_parallel_shards": 1,
+     "pipeline_parallel_stages": 1,
+     "max_batch_size": 80
+   },
+   "vocab_size": 49152,
+   "context_window_size": 2048,
+   "sliding_window_size": -1,
+   "prefill_chunk_size": 2048,
+   "attention_sink_size": -1,
+   "tensor_parallel_shards": 1,
+   "pipeline_parallel_stages": 1,
+   "temperature": 1.0,
+   "presence_penalty": 0.0,
+   "frequency_penalty": 0.0,
+   "repetition_penalty": 1.0,
+   "top_p": 1.0,
+   "tokenizer_files": [
+     "tokenizer.json",
+     "vocab.json",
+     "merges.txt",
+     "tokenizer_config.json"
+   ],
+   "tokenizer_info": {
+     "token_postproc_method": "byte_level",
+     "prepend_space_in_encode": false,
+     "strip_space_in_decode": false
+   },
+   "conv_template": {
+     "name": "chatml_nosystem",
+     "system_template": "{system_message}",
+     "system_message": "",
+     "system_prefix_token_ids": null,
+     "add_role_after_system_message": true,
+     "roles": {
+       "user": "<|im_start|>user",
+       "assistant": "<|im_start|>assistant"
+     },
+     "role_templates": {
+       "user": "{user_message}",
+       "assistant": "{assistant_message}",
+       "tool": "{tool_message}"
+     },
+     "messages": [],
+     "seps": [
+       "<|im_end|>\n"
+     ],
+     "role_content_sep": "\n",
+     "role_empty_sep": "\n",
+     "stop_str": [
+       "<|im_end|>"
+     ],
+     "stop_token_ids": [
+       2
+     ],
+     "function_string": "",
+     "use_function_calling": false
+   },
+   "pad_token_id": 2,
+   "bos_token_id": 1,
+   "eos_token_id": 2
+ }
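The attention dimensions in `model_config` are mutually consistent, which is worth checking when editing a config by hand. A minimal sketch (the `cfg` dict below just copies values from the config above; it is not part of the MLC runtime API):

```python
# Subset of "model_config" from mlc-chat-config.json above.
cfg = {
    "hidden_size": 576,
    "num_attention_heads": 9,
    "num_key_value_heads": 3,
    "head_dim": 64,
}

# Query heads together span the full hidden size: 9 * 64 == 576.
assert cfg["num_attention_heads"] * cfg["head_dim"] == cfg["hidden_size"]

# A fused QKV projection packs all query heads plus one K and one V
# projection per KV head (grouped-query attention): (9 + 2*3) * 64 == 960,
# matching the [960, 576] qkv_proj weight shapes in ndarray-cache.json.
qkv_rows = (cfg["num_attention_heads"] + 2 * cfg["num_key_value_heads"]) * cfg["head_dim"]
print(qkv_rows)  # 960
```

The `q0f16` quantization label means no weight quantization, with float16 activations, which is why every weight record below is stored as plain float16.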
ndarray-cache.json ADDED
@@ -0,0 +1,2014 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "metadata": {
3
+ "ParamSize": 182,
4
+ "ParamBytes": 269030016.0,
5
+ "BitsPerParam": 16.0
6
+ },
7
+ "records": [
8
+ {
9
+ "dataPath": "params_shard_0.bin",
10
+ "format": "raw-shard",
11
+ "nbytes": 56623104,
12
+ "records": [
13
+ {
14
+ "name": "model.embed_tokens.weight",
15
+ "shape": [
16
+ 49152,
17
+ 576
18
+ ],
19
+ "dtype": "float16",
20
+ "format": "f32-to-bf16",
21
+ "nbytes": 56623104,
22
+ "byteOffset": 0
23
+ }
24
+ ],
25
+ "md5sum": "70e35db9b4b600ec3240c2ffe187ceb3"
26
+ },
27
+ {
28
+ "dataPath": "params_shard_1.bin",
29
+ "format": "raw-shard",
30
+ "nbytes": 30091392,
31
+ "records": [
32
+ {
33
+ "name": "model.layers.0.input_layernorm.weight",
34
+ "shape": [
35
+ 576
36
+ ],
37
+ "dtype": "float16",
38
+ "format": "f32-to-bf16",
39
+ "nbytes": 1152,
40
+ "byteOffset": 0
41
+ },
42
+ {
43
+ "name": "model.layers.0.mlp.down_proj.weight",
44
+ "shape": [
45
+ 576,
46
+ 1536
47
+ ],
48
+ "dtype": "float16",
49
+ "format": "f32-to-bf16",
50
+ "nbytes": 1769472,
51
+ "byteOffset": 1152
52
+ },
53
+ {
54
+ "name": "model.layers.0.mlp.gate_up_proj.weight",
55
+ "shape": [
56
+ 3072,
57
+ 576
58
+ ],
59
+ "dtype": "float16",
60
+ "format": "f32-to-bf16",
61
+ "nbytes": 3538944,
62
+ "byteOffset": 1770624
63
+ },
64
+ {
65
+ "name": "model.layers.0.post_attention_layernorm.weight",
66
+ "shape": [
67
+ 576
68
+ ],
69
+ "dtype": "float16",
70
+ "format": "f32-to-bf16",
71
+ "nbytes": 1152,
72
+ "byteOffset": 5309568
73
+ },
74
+ {
75
+ "name": "model.layers.0.self_attn.qkv_proj.weight",
76
+ "shape": [
77
+ 960,
78
+ 576
79
+ ],
80
+ "dtype": "float16",
81
+ "format": "f32-to-bf16",
82
+ "nbytes": 1105920,
83
+ "byteOffset": 5310720
84
+ },
85
+ {
86
+ "name": "model.layers.0.self_attn.o_proj.weight",
87
+ "shape": [
88
+ 576,
89
+ 576
90
+ ],
91
+ "dtype": "float16",
92
+ "format": "f32-to-bf16",
93
+ "nbytes": 663552,
94
+ "byteOffset": 6416640
95
+ },
96
+ {
97
+ "name": "model.layers.1.input_layernorm.weight",
98
+ "shape": [
99
+ 576
100
+ ],
101
+ "dtype": "float16",
102
+ "format": "f32-to-bf16",
103
+ "nbytes": 1152,
104
+ "byteOffset": 7080192
105
+ },
106
+ {
107
+ "name": "model.layers.1.mlp.down_proj.weight",
108
+ "shape": [
109
+ 576,
110
+ 1536
111
+ ],
112
+ "dtype": "float16",
113
+ "format": "f32-to-bf16",
114
+ "nbytes": 1769472,
115
+ "byteOffset": 7081344
116
+ },
117
+ {
118
+ "name": "model.layers.1.mlp.gate_up_proj.weight",
119
+ "shape": [
120
+ 3072,
121
+ 576
122
+ ],
123
+ "dtype": "float16",
124
+ "format": "f32-to-bf16",
125
+ "nbytes": 3538944,
126
+ "byteOffset": 8850816
127
+ },
128
+ {
129
+ "name": "model.layers.1.post_attention_layernorm.weight",
130
+ "shape": [
131
+ 576
132
+ ],
133
+ "dtype": "float16",
134
+ "format": "f32-to-bf16",
135
+ "nbytes": 1152,
136
+ "byteOffset": 12389760
137
+ },
138
+ {
139
+ "name": "model.layers.1.self_attn.qkv_proj.weight",
140
+ "shape": [
141
+ 960,
142
+ 576
143
+ ],
144
+ "dtype": "float16",
145
+ "format": "f32-to-bf16",
146
+ "nbytes": 1105920,
147
+ "byteOffset": 12390912
148
+ },
149
+ {
150
+ "name": "model.layers.1.self_attn.o_proj.weight",
151
+ "shape": [
152
+ 576,
153
+ 576
154
+ ],
155
+ "dtype": "float16",
156
+ "format": "f32-to-bf16",
157
+ "nbytes": 663552,
158
+ "byteOffset": 13496832
159
+ },
160
+ {
161
+ "name": "model.layers.10.input_layernorm.weight",
162
+ "shape": [
163
+ 576
164
+ ],
165
+ "dtype": "float16",
166
+ "format": "f32-to-bf16",
167
+ "nbytes": 1152,
168
+ "byteOffset": 14160384
169
+ },
170
+ {
171
+ "name": "model.layers.10.mlp.down_proj.weight",
172
+ "shape": [
173
+ 576,
174
+ 1536
175
+ ],
176
+ "dtype": "float16",
177
+ "format": "f32-to-bf16",
178
+ "nbytes": 1769472,
179
+ "byteOffset": 14161536
180
+ },
181
+ {
182
+ "name": "model.layers.10.mlp.gate_up_proj.weight",
183
+ "shape": [
184
+ 3072,
185
+ 576
186
+ ],
187
+ "dtype": "float16",
188
+ "format": "f32-to-bf16",
189
+ "nbytes": 3538944,
190
+ "byteOffset": 15931008
191
+ },
192
+ {
193
+ "name": "model.layers.10.post_attention_layernorm.weight",
194
+ "shape": [
195
+ 576
196
+ ],
197
+ "dtype": "float16",
198
+ "format": "f32-to-bf16",
199
+ "nbytes": 1152,
200
+ "byteOffset": 19469952
201
+ },
202
+ {
203
+ "name": "model.layers.10.self_attn.qkv_proj.weight",
204
+ "shape": [
205
+ 960,
206
+ 576
207
+ ],
208
+ "dtype": "float16",
209
+ "format": "f32-to-bf16",
210
+ "nbytes": 1105920,
211
+ "byteOffset": 19471104
212
+ },
213
+ {
214
+ "name": "model.layers.10.self_attn.o_proj.weight",
215
+ "shape": [
216
+ 576,
217
+ 576
218
+ ],
219
+ "dtype": "float16",
220
+ "format": "f32-to-bf16",
221
+ "nbytes": 663552,
222
+ "byteOffset": 20577024
223
+ },
224
+ {
225
+ "name": "model.layers.11.input_layernorm.weight",
226
+ "shape": [
227
+ 576
228
+ ],
229
+ "dtype": "float16",
230
+ "format": "f32-to-bf16",
231
+ "nbytes": 1152,
232
+ "byteOffset": 21240576
233
+ },
234
+ {
235
+ "name": "model.layers.11.mlp.down_proj.weight",
236
+ "shape": [
237
+ 576,
238
+ 1536
239
+ ],
240
+ "dtype": "float16",
241
+ "format": "f32-to-bf16",
242
+ "nbytes": 1769472,
243
+ "byteOffset": 21241728
244
+ },
245
+ {
246
+ "name": "model.layers.11.mlp.gate_up_proj.weight",
247
+ "shape": [
248
+ 3072,
249
+ 576
250
+ ],
251
+ "dtype": "float16",
252
+ "format": "f32-to-bf16",
253
+ "nbytes": 3538944,
254
+ "byteOffset": 23011200
255
+ },
256
+ {
257
+ "name": "model.layers.11.post_attention_layernorm.weight",
258
+ "shape": [
259
+ 576
260
+ ],
261
+ "dtype": "float16",
262
+ "format": "f32-to-bf16",
263
+ "nbytes": 1152,
264
+ "byteOffset": 26550144
265
+ },
266
+ {
267
+ "name": "model.layers.11.self_attn.qkv_proj.weight",
268
+ "shape": [
269
+ 960,
270
+ 576
271
+ ],
272
+ "dtype": "float16",
273
+ "format": "f32-to-bf16",
274
+ "nbytes": 1105920,
275
+ "byteOffset": 26551296
276
+ },
277
+ {
278
+ "name": "model.layers.11.self_attn.o_proj.weight",
279
+ "shape": [
280
+ 576,
281
+ 576
282
+ ],
283
+ "dtype": "float16",
284
+ "format": "f32-to-bf16",
285
+ "nbytes": 663552,
286
+ "byteOffset": 27657216
287
+ },
288
+ {
289
+ "name": "model.layers.12.input_layernorm.weight",
290
+ "shape": [
291
+ 576
292
+ ],
293
+ "dtype": "float16",
294
+ "format": "f32-to-bf16",
295
+ "nbytes": 1152,
296
+ "byteOffset": 28320768
297
+ },
298
+ {
299
+ "name": "model.layers.12.mlp.down_proj.weight",
300
+ "shape": [
301
+ 576,
302
+ 1536
303
+ ],
304
+ "dtype": "float16",
305
+ "format": "f32-to-bf16",
306
+ "nbytes": 1769472,
307
+ "byteOffset": 28321920
308
+ }
309
+ ],
310
+ "md5sum": "b8efa7693b23b428ec92bca49894845e"
311
+ },
312
+ {
313
+ "dataPath": "params_shard_2.bin",
314
+ "format": "raw-shard",
315
+ "nbytes": 32966784,
316
+ "records": [
317
+ {
318
+ "name": "model.layers.12.mlp.gate_up_proj.weight",
319
+ "shape": [
320
+ 3072,
321
+ 576
322
+ ],
323
+ "dtype": "float16",
324
+ "format": "f32-to-bf16",
325
+ "nbytes": 3538944,
326
+ "byteOffset": 0
327
+ },
328
+ {
329
+ "name": "model.layers.12.post_attention_layernorm.weight",
330
+ "shape": [
331
+ 576
332
+ ],
333
+ "dtype": "float16",
334
+ "format": "f32-to-bf16",
335
+ "nbytes": 1152,
336
+ "byteOffset": 3538944
337
+ },
338
+ {
339
+ "name": "model.layers.12.self_attn.qkv_proj.weight",
340
+ "shape": [
341
+ 960,
342
+ 576
343
+ ],
344
+ "dtype": "float16",
345
+ "format": "f32-to-bf16",
346
+ "nbytes": 1105920,
347
+ "byteOffset": 3540096
348
+ },
349
+ {
350
+ "name": "model.layers.12.self_attn.o_proj.weight",
351
+ "shape": [
352
+ 576,
353
+ 576
354
+ ],
355
+ "dtype": "float16",
356
+ "format": "f32-to-bf16",
357
+ "nbytes": 663552,
358
+ "byteOffset": 4646016
359
+ },
360
+ {
361
+ "name": "model.layers.13.input_layernorm.weight",
362
+ "shape": [
363
+ 576
364
+ ],
365
+ "dtype": "float16",
366
+ "format": "f32-to-bf16",
367
+ "nbytes": 1152,
368
+ "byteOffset": 5309568
369
+ },
370
+ {
371
+ "name": "model.layers.13.mlp.down_proj.weight",
372
+ "shape": [
373
+ 576,
374
+ 1536
375
+ ],
376
+ "dtype": "float16",
377
+ "format": "f32-to-bf16",
378
+ "nbytes": 1769472,
379
+ "byteOffset": 5310720
380
+ },
381
+ {
382
+ "name": "model.layers.13.mlp.gate_up_proj.weight",
383
+ "shape": [
384
+ 3072,
385
+ 576
386
+ ],
387
+ "dtype": "float16",
388
+ "format": "f32-to-bf16",
389
+ "nbytes": 3538944,
390
+ "byteOffset": 7080192
391
+ },
392
+ {
393
+ "name": "model.layers.13.post_attention_layernorm.weight",
394
+ "shape": [
395
+ 576
396
+ ],
397
+ "dtype": "float16",
398
+ "format": "f32-to-bf16",
399
+ "nbytes": 1152,
400
+ "byteOffset": 10619136
401
+ },
402
+ {
403
+ "name": "model.layers.13.self_attn.qkv_proj.weight",
404
+ "shape": [
405
+ 960,
406
+ 576
407
+ ],
408
+ "dtype": "float16",
409
+ "format": "f32-to-bf16",
410
+ "nbytes": 1105920,
411
+ "byteOffset": 10620288
412
+ },
413
+ {
414
+ "name": "model.layers.13.self_attn.o_proj.weight",
415
+ "shape": [
416
+ 576,
417
+ 576
418
+ ],
419
+ "dtype": "float16",
420
+ "format": "f32-to-bf16",
421
+ "nbytes": 663552,
422
+ "byteOffset": 11726208
423
+ },
424
+ {
425
+ "name": "model.layers.14.input_layernorm.weight",
426
+ "shape": [
427
+ 576
428
+ ],
429
+ "dtype": "float16",
430
+ "format": "f32-to-bf16",
431
+ "nbytes": 1152,
432
+ "byteOffset": 12389760
433
+ },
434
+ {
435
+ "name": "model.layers.14.mlp.down_proj.weight",
436
+ "shape": [
437
+ 576,
438
+ 1536
439
+ ],
440
+ "dtype": "float16",
441
+ "format": "f32-to-bf16",
442
+ "nbytes": 1769472,
443
+ "byteOffset": 12390912
444
+ },
445
+ {
446
+ "name": "model.layers.14.mlp.gate_up_proj.weight",
447
+ "shape": [
448
+ 3072,
449
+ 576
450
+ ],
451
+ "dtype": "float16",
452
+ "format": "f32-to-bf16",
453
+ "nbytes": 3538944,
454
+ "byteOffset": 14160384
455
+ },
456
+ {
457
+ "name": "model.layers.14.post_attention_layernorm.weight",
458
+ "shape": [
459
+ 576
460
+ ],
461
+ "dtype": "float16",
462
+ "format": "f32-to-bf16",
463
+ "nbytes": 1152,
464
+ "byteOffset": 17699328
465
+ },
466
+ {
467
+ "name": "model.layers.14.self_attn.qkv_proj.weight",
468
+ "shape": [
469
+ 960,
470
+ 576
471
+ ],
472
+ "dtype": "float16",
473
+ "format": "f32-to-bf16",
474
+ "nbytes": 1105920,
475
+ "byteOffset": 17700480
476
+ },
477
+ {
478
+ "name": "model.layers.14.self_attn.o_proj.weight",
479
+ "shape": [
480
+ 576,
481
+ 576
482
+ ],
483
+ "dtype": "float16",
484
+ "format": "f32-to-bf16",
485
+ "nbytes": 663552,
486
+ "byteOffset": 18806400
487
+ },
488
+ {
489
+ "name": "model.layers.15.input_layernorm.weight",
490
+ "shape": [
491
+ 576
492
+ ],
493
+ "dtype": "float16",
494
+ "format": "f32-to-bf16",
495
+ "nbytes": 1152,
496
+ "byteOffset": 19469952
497
+ },
498
+ {
499
+ "name": "model.layers.15.mlp.down_proj.weight",
500
+ "shape": [
501
+ 576,
502
+ 1536
503
+ ],
504
+ "dtype": "float16",
505
+ "format": "f32-to-bf16",
506
+ "nbytes": 1769472,
507
+ "byteOffset": 19471104
508
+ },
509
+ {
510
+ "name": "model.layers.15.mlp.gate_up_proj.weight",
511
+ "shape": [
512
+ 3072,
513
+ 576
514
+ ],
515
+ "dtype": "float16",
516
+ "format": "f32-to-bf16",
517
+ "nbytes": 3538944,
518
+ "byteOffset": 21240576
519
+ },
520
+ {
521
+ "name": "model.layers.15.post_attention_layernorm.weight",
522
+ "shape": [
523
+ 576
524
+ ],
525
+ "dtype": "float16",
526
+ "format": "f32-to-bf16",
527
+ "nbytes": 1152,
528
+ "byteOffset": 24779520
529
+ },
530
+ {
531
+ "name": "model.layers.15.self_attn.qkv_proj.weight",
532
+ "shape": [
533
+ 960,
534
+ 576
535
+ ],
536
+ "dtype": "float16",
537
+ "format": "f32-to-bf16",
538
+ "nbytes": 1105920,
539
+ "byteOffset": 24780672
540
+ },
541
+ {
542
+ "name": "model.layers.15.self_attn.o_proj.weight",
543
+ "shape": [
544
+ 576,
545
+ 576
546
+ ],
547
+ "dtype": "float16",
548
+ "format": "f32-to-bf16",
549
+ "nbytes": 663552,
550
+ "byteOffset": 25886592
551
+ },
552
+ {
553
+ "name": "model.layers.16.input_layernorm.weight",
554
+ "shape": [
555
+ 576
556
+ ],
557
+ "dtype": "float16",
558
+ "format": "f32-to-bf16",
559
+ "nbytes": 1152,
560
+ "byteOffset": 26550144
561
+ },
562
+ {
563
+ "name": "model.layers.16.mlp.down_proj.weight",
564
+ "shape": [
565
+ 576,
566
+ 1536
567
+ ],
568
+ "dtype": "float16",
569
+ "format": "f32-to-bf16",
570
+ "nbytes": 1769472,
571
+ "byteOffset": 26551296
572
+ },
573
+ {
574
+ "name": "model.layers.16.mlp.gate_up_proj.weight",
575
+ "shape": [
576
+ 3072,
577
+ 576
578
+ ],
579
+ "dtype": "float16",
580
+ "format": "f32-to-bf16",
581
+ "nbytes": 3538944,
582
+ "byteOffset": 28320768
583
+ },
584
+ {
585
+ "name": "model.layers.16.post_attention_layernorm.weight",
586
+ "shape": [
587
+ 576
588
+ ],
589
+ "dtype": "float16",
590
+ "format": "f32-to-bf16",
591
+ "nbytes": 1152,
592
+ "byteOffset": 31859712
593
+ },
594
+ {
595
+ "name": "model.layers.16.self_attn.qkv_proj.weight",
596
+ "shape": [
597
+ 960,
598
+ 576
599
+ ],
600
+ "dtype": "float16",
601
+ "format": "f32-to-bf16",
602
+ "nbytes": 1105920,
603
+ "byteOffset": 31860864
604
+ }
605
+ ],
606
+ "md5sum": "0d2f7b6ab9ec8e934d6d07828f24ced2"
607
+ },
608
+ {
609
+ "dataPath": "params_shard_3.bin",
610
+ "format": "raw-shard",
611
+ "nbytes": 30754944,
612
+ "records": [
613
+ {
614
+ "name": "model.layers.16.self_attn.o_proj.weight",
615
+ "shape": [
616
+ 576,
617
+ 576
618
+ ],
619
+ "dtype": "float16",
620
+ "format": "f32-to-bf16",
621
+ "nbytes": 663552,
622
+ "byteOffset": 0
623
+ },
624
+ {
625
+ "name": "model.layers.17.input_layernorm.weight",
626
+ "shape": [
627
+ 576
628
+ ],
629
+ "dtype": "float16",
630
+ "format": "f32-to-bf16",
631
+ "nbytes": 1152,
632
+ "byteOffset": 663552
633
+ },
634
+ {
635
+ "name": "model.layers.17.mlp.down_proj.weight",
636
+ "shape": [
637
+ 576,
638
+ 1536
639
+ ],
640
+ "dtype": "float16",
641
+ "format": "f32-to-bf16",
642
+ "nbytes": 1769472,
643
+ "byteOffset": 664704
644
+ },
645
+ {
646
+ "name": "model.layers.17.mlp.gate_up_proj.weight",
647
+ "shape": [
648
+ 3072,
649
+ 576
650
+ ],
651
+ "dtype": "float16",
652
+ "format": "f32-to-bf16",
653
+ "nbytes": 3538944,
654
+ "byteOffset": 2434176
655
+ },
656
+ {
657
+ "name": "model.layers.17.post_attention_layernorm.weight",
658
+ "shape": [
659
+ 576
660
+ ],
661
+ "dtype": "float16",
662
+ "format": "f32-to-bf16",
663
+ "nbytes": 1152,
664
+ "byteOffset": 5973120
665
+ },
666
+ {
667
+ "name": "model.layers.17.self_attn.qkv_proj.weight",
668
+ "shape": [
669
+ 960,
670
+ 576
671
+ ],
672
+ "dtype": "float16",
673
+ "format": "f32-to-bf16",
674
+ "nbytes": 1105920,
675
+ "byteOffset": 5974272
676
+ },
677
+ {
678
+ "name": "model.layers.17.self_attn.o_proj.weight",
679
+ "shape": [
680
+ 576,
681
+ 576
682
+ ],
683
+ "dtype": "float16",
684
+ "format": "f32-to-bf16",
685
+ "nbytes": 663552,
686
+ "byteOffset": 7080192
687
+ },
688
+ {
689
+ "name": "model.layers.18.input_layernorm.weight",
690
+ "shape": [
691
+ 576
692
+ ],
693
+ "dtype": "float16",
694
+ "format": "f32-to-bf16",
695
+ "nbytes": 1152,
696
+ "byteOffset": 7743744
697
+ },
698
+ {
699
+ "name": "model.layers.18.mlp.down_proj.weight",
700
+ "shape": [
701
+ 576,
702
+ 1536
703
+ ],
704
+ "dtype": "float16",
705
+ "format": "f32-to-bf16",
706
+ "nbytes": 1769472,
707
+ "byteOffset": 7744896
708
+ },
709
+ {
710
+ "name": "model.layers.18.mlp.gate_up_proj.weight",
711
+ "shape": [
712
+ 3072,
713
+ 576
714
+ ],
715
+ "dtype": "float16",
716
+ "format": "f32-to-bf16",
717
+ "nbytes": 3538944,
718
+ "byteOffset": 9514368
719
+ },
720
+ {
721
+ "name": "model.layers.18.post_attention_layernorm.weight",
722
+ "shape": [
723
+ 576
724
+ ],
725
+ "dtype": "float16",
726
+ "format": "f32-to-bf16",
727
+ "nbytes": 1152,
728
+ "byteOffset": 13053312
729
+ },
730
+ {
731
+ "name": "model.layers.18.self_attn.qkv_proj.weight",
732
+ "shape": [
733
+ 960,
734
+ 576
735
+ ],
736
+ "dtype": "float16",
737
+ "format": "f32-to-bf16",
738
+ "nbytes": 1105920,
739
+ "byteOffset": 13054464
740
+ },
741
+ {
742
+ "name": "model.layers.18.self_attn.o_proj.weight",
743
+ "shape": [
744
+ 576,
745
+ 576
746
+ ],
747
+ "dtype": "float16",
748
+ "format": "f32-to-bf16",
749
+ "nbytes": 663552,
750
+ "byteOffset": 14160384
751
+ },
752
+ {
753
+ "name": "model.layers.19.input_layernorm.weight",
754
+ "shape": [
755
+ 576
756
+ ],
757
+ "dtype": "float16",
758
+ "format": "f32-to-bf16",
759
+ "nbytes": 1152,
760
+ "byteOffset": 14823936
761
+ },
762
+ {
763
+ "name": "model.layers.19.mlp.down_proj.weight",
764
+ "shape": [
765
+ 576,
766
+ 1536
767
+ ],
768
+ "dtype": "float16",
769
+ "format": "f32-to-bf16",
770
+ "nbytes": 1769472,
771
+ "byteOffset": 14825088
772
+ },
773
+ {
774
+ "name": "model.layers.19.mlp.gate_up_proj.weight",
775
+ "shape": [
776
+ 3072,
777
+ 576
778
+ ],
779
+ "dtype": "float16",
780
+ "format": "f32-to-bf16",
781
+ "nbytes": 3538944,
782
+ "byteOffset": 16594560
783
+ },
784
+ {
785
+ "name": "model.layers.19.post_attention_layernorm.weight",
786
+ "shape": [
787
+ 576
788
+ ],
789
+ "dtype": "float16",
790
+ "format": "f32-to-bf16",
791
+ "nbytes": 1152,
792
+ "byteOffset": 20133504
793
+ },
794
+ {
795
+ "name": "model.layers.19.self_attn.qkv_proj.weight",
796
+ "shape": [
797
+ 960,
798
+ 576
799
+ ],
800
+ "dtype": "float16",
801
+ "format": "f32-to-bf16",
802
+ "nbytes": 1105920,
803
+ "byteOffset": 20134656
804
+ },
805
+ {
806
+ "name": "model.layers.19.self_attn.o_proj.weight",
807
+ "shape": [
808
+ 576,
809
+ 576
810
+ ],
811
+ "dtype": "float16",
812
+ "format": "f32-to-bf16",
813
+ "nbytes": 663552,
814
+ "byteOffset": 21240576
815
+ },
816
+ {
817
+ "name": "model.layers.2.input_layernorm.weight",
818
+ "shape": [
819
+ 576
820
+ ],
821
+ "dtype": "float16",
822
+ "format": "f32-to-bf16",
823
+ "nbytes": 1152,
824
+ "byteOffset": 21904128
825
+ },
826
+ {
827
+ "name": "model.layers.2.mlp.down_proj.weight",
828
+ "shape": [
829
+ 576,
830
+ 1536
831
+ ],
832
+ "dtype": "float16",
833
+ "format": "f32-to-bf16",
834
+ "nbytes": 1769472,
835
+ "byteOffset": 21905280
836
+ },
837
+ {
838
+ "name": "model.layers.2.mlp.gate_up_proj.weight",
839
+ "shape": [
840
+ 3072,
841
+ 576
842
+ ],
843
+ "dtype": "float16",
844
+ "format": "f32-to-bf16",
845
+ "nbytes": 3538944,
846
+ "byteOffset": 23674752
847
+ },
848
+ {
849
+ "name": "model.layers.2.post_attention_layernorm.weight",
850
+ "shape": [
851
+ 576
852
+ ],
853
+ "dtype": "float16",
854
+ "format": "f32-to-bf16",
855
+ "nbytes": 1152,
856
+ "byteOffset": 27213696
857
+ },
858
+ {
859
+ "name": "model.layers.2.self_attn.qkv_proj.weight",
860
+ "shape": [
861
+ 960,
862
+ 576
863
+ ],
864
+ "dtype": "float16",
865
+ "format": "f32-to-bf16",
866
+ "nbytes": 1105920,
867
+ "byteOffset": 27214848
868
+ },
869
+ {
870
+ "name": "model.layers.2.self_attn.o_proj.weight",
871
+ "shape": [
872
+ 576,
873
+ 576
874
+ ],
875
+ "dtype": "float16",
876
+ "format": "f32-to-bf16",
877
+ "nbytes": 663552,
878
+ "byteOffset": 28320768
879
+ },
880
+ {
881
+ "name": "model.layers.20.input_layernorm.weight",
882
+ "shape": [
883
+ 576
884
+ ],
885
+ "dtype": "float16",
886
+ "format": "f32-to-bf16",
887
+ "nbytes": 1152,
888
+ "byteOffset": 28984320
889
+ },
890
+ {
891
+ "name": "model.layers.20.mlp.down_proj.weight",
892
+ "shape": [
893
+ 576,
894
+ 1536
895
+ ],
896
+ "dtype": "float16",
897
+ "format": "f32-to-bf16",
898
+ "nbytes": 1769472,
899
+ "byteOffset": 28985472
900
+ }
901
+ ],
902
+ "md5sum": "b6266df00b915d1f929c900454f314c6"
903
+ },
904
+ {
905
+ "dataPath": "params_shard_4.bin",
906
+ "format": "raw-shard",
907
+ "nbytes": 32966784,
908
+ "records": [
909
+ {
910
+ "name": "model.layers.20.mlp.gate_up_proj.weight",
911
+ "shape": [
912
+ 3072,
913
+ 576
914
+ ],
915
+ "dtype": "float16",
916
+ "format": "f32-to-bf16",
917
+ "nbytes": 3538944,
918
+ "byteOffset": 0
919
+ },
920
+ {
921
+ "name": "model.layers.20.post_attention_layernorm.weight",
922
+ "shape": [
923
+ 576
924
+ ],
925
+ "dtype": "float16",
926
+ "format": "f32-to-bf16",
927
+ "nbytes": 1152,
928
+ "byteOffset": 3538944
929
+ },
930
+ {
931
+ "name": "model.layers.20.self_attn.qkv_proj.weight",
932
+ "shape": [
933
+ 960,
934
+ 576
935
+ ],
936
+ "dtype": "float16",
937
+ "format": "f32-to-bf16",
938
+ "nbytes": 1105920,
939
+ "byteOffset": 3540096
940
+ },
941
+ {
942
+ "name": "model.layers.20.self_attn.o_proj.weight",
943
+ "shape": [
944
+ 576,
945
+ 576
946
+ ],
947
+ "dtype": "float16",
948
+ "format": "f32-to-bf16",
949
+ "nbytes": 663552,
950
+ "byteOffset": 4646016
951
+ },
952
+ {
953
+ "name": "model.layers.21.input_layernorm.weight",
954
+ "shape": [
955
+ 576
956
+ ],
957
+ "dtype": "float16",
958
+ "format": "f32-to-bf16",
959
+ "nbytes": 1152,
960
+ "byteOffset": 5309568
961
+ },
962
+ {
963
+ "name": "model.layers.21.mlp.down_proj.weight",
964
+ "shape": [
965
+ 576,
966
+ 1536
967
+ ],
968
+ "dtype": "float16",
969
+ "format": "f32-to-bf16",
970
+ "nbytes": 1769472,
971
+ "byteOffset": 5310720
972
+ },
973
+ {
974
+ "name": "model.layers.21.mlp.gate_up_proj.weight",
975
+ "shape": [
976
+ 3072,
977
+ 576
978
+ ],
979
+ "dtype": "float16",
980
+ "format": "f32-to-bf16",
981
+ "nbytes": 3538944,
982
+ "byteOffset": 7080192
983
+ },
984
+ {
985
+ "name": "model.layers.21.post_attention_layernorm.weight",
986
+ "shape": [
987
+ 576
988
+ ],
989
+ "dtype": "float16",
990
+ "format": "f32-to-bf16",
991
+ "nbytes": 1152,
992
+ "byteOffset": 10619136
993
+ },
994
+ {
995
+ "name": "model.layers.21.self_attn.qkv_proj.weight",
996
+ "shape": [
997
+ 960,
998
+ 576
999
+ ],
1000
+ "dtype": "float16",
1001
+ "format": "f32-to-bf16",
1002
+ "nbytes": 1105920,
1003
+ "byteOffset": 10620288
1004
+ },
1005
+ {
1006
+ "name": "model.layers.21.self_attn.o_proj.weight",
1007
+ "shape": [
1008
+ 576,
1009
+ 576
1010
+ ],
1011
+ "dtype": "float16",
1012
+ "format": "f32-to-bf16",
1013
+ "nbytes": 663552,
1014
+ "byteOffset": 11726208
1015
+ },
1016
+ {
1017
+ "name": "model.layers.22.input_layernorm.weight",
1018
+ "shape": [
1019
+ 576
1020
+ ],
1021
+ "dtype": "float16",
1022
+ "format": "f32-to-bf16",
1023
+ "nbytes": 1152,
1024
+ "byteOffset": 12389760
1025
+ },
1026
+ {
1027
+ "name": "model.layers.22.mlp.down_proj.weight",
1028
+ "shape": [
1029
+ 576,
1030
+ 1536
1031
+ ],
1032
+ "dtype": "float16",
1033
+ "format": "f32-to-bf16",
1034
+ "nbytes": 1769472,
1035
+ "byteOffset": 12390912
1036
+ },
1037
+ {
1038
+ "name": "model.layers.22.mlp.gate_up_proj.weight",
1039
+ "shape": [
1040
+ 3072,
1041
+ 576
1042
+ ],
1043
+ "dtype": "float16",
1044
+ "format": "f32-to-bf16",
1045
+ "nbytes": 3538944,
1046
+ "byteOffset": 14160384
1047
+ },
1048
+ {
1049
+ "name": "model.layers.22.post_attention_layernorm.weight",
1050
+ "shape": [
1051
+ 576
1052
+ ],
1053
+ "dtype": "float16",
1054
+ "format": "f32-to-bf16",
1055
+ "nbytes": 1152,
1056
+ "byteOffset": 17699328
1057
+ },
1058
+ {
1059
+ "name": "model.layers.22.self_attn.qkv_proj.weight",
1060
+ "shape": [
1061
+ 960,
1062
+ 576
1063
+ ],
1064
+ "dtype": "float16",
1065
+ "format": "f32-to-bf16",
1066
+ "nbytes": 1105920,
1067
+ "byteOffset": 17700480
1068
+ },
1069
+ {
1070
+ "name": "model.layers.22.self_attn.o_proj.weight",
1071
+ "shape": [
1072
+ 576,
1073
+ 576
1074
+ ],
1075
+ "dtype": "float16",
1076
+ "format": "f32-to-bf16",
1077
+ "nbytes": 663552,
1078
+ "byteOffset": 18806400
1079
+ },
1080
+ {
1081
+ "name": "model.layers.23.input_layernorm.weight",
1082
+ "shape": [
1083
+ 576
1084
+ ],
1085
+ "dtype": "float16",
1086
+ "format": "f32-to-bf16",
1087
+ "nbytes": 1152,
1088
+ "byteOffset": 19469952
1089
+ },
1090
+ {
1091
+ "name": "model.layers.23.mlp.down_proj.weight",
1092
+ "shape": [
1093
+ 576,
1094
+ 1536
1095
+ ],
1096
+ "dtype": "float16",
1097
+ "format": "f32-to-bf16",
1098
+ "nbytes": 1769472,
1099
+ "byteOffset": 19471104
+ },
+ {
+ "name": "model.layers.23.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 21240576
+ },
+ {
+ "name": "model.layers.23.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 24779520
+ },
+ {
+ "name": "model.layers.23.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 24780672
+ },
+ {
+ "name": "model.layers.23.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 25886592
+ },
+ {
+ "name": "model.layers.24.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 26550144
+ },
+ {
+ "name": "model.layers.24.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 26551296
+ },
+ {
+ "name": "model.layers.24.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 28320768
+ },
+ {
+ "name": "model.layers.24.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 31859712
+ },
+ {
+ "name": "model.layers.24.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 31860864
+ }
+ ],
+ "md5sum": "81d5f032a4bf4a189303caee4ef91ef7"
+ },
1200
+ {
+ "dataPath": "params_shard_5.bin",
+ "format": "raw-shard",
+ "nbytes": 30754944,
+ "records": [
+ {
+ "name": "model.layers.24.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.25.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 663552
+ },
+ {
+ "name": "model.layers.25.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 664704
+ },
+ {
+ "name": "model.layers.25.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 2434176
+ },
+ {
+ "name": "model.layers.25.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 5973120
+ },
+ {
+ "name": "model.layers.25.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 5974272
+ },
+ {
+ "name": "model.layers.25.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 7080192
+ },
+ {
+ "name": "model.layers.26.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 7743744
+ },
+ {
+ "name": "model.layers.26.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 7744896
+ },
+ {
+ "name": "model.layers.26.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 9514368
+ },
+ {
+ "name": "model.layers.26.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 13053312
+ },
+ {
+ "name": "model.layers.26.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 13054464
+ },
+ {
+ "name": "model.layers.26.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 14160384
+ },
+ {
+ "name": "model.layers.27.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 14823936
+ },
+ {
+ "name": "model.layers.27.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 14825088
+ },
+ {
+ "name": "model.layers.27.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 16594560
+ },
+ {
+ "name": "model.layers.27.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 20133504
+ },
+ {
+ "name": "model.layers.27.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 20134656
+ },
+ {
+ "name": "model.layers.27.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 21240576
+ },
+ {
+ "name": "model.layers.28.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 21904128
+ },
+ {
+ "name": "model.layers.28.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 21905280
+ },
+ {
+ "name": "model.layers.28.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 23674752
+ },
+ {
+ "name": "model.layers.28.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 27213696
+ },
+ {
+ "name": "model.layers.28.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 27214848
+ },
+ {
+ "name": "model.layers.28.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 28320768
+ },
+ {
+ "name": "model.layers.29.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 28984320
+ },
+ {
+ "name": "model.layers.29.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 28985472
+ }
+ ],
+ "md5sum": "9788bb8205c8099c90aefddc9d1ed9fd"
+ },
1496
+ {
+ "dataPath": "params_shard_6.bin",
+ "format": "raw-shard",
+ "nbytes": 32966784,
+ "records": [
+ {
+ "name": "model.layers.29.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.29.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 3538944
+ },
+ {
+ "name": "model.layers.29.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 3540096
+ },
+ {
+ "name": "model.layers.29.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 4646016
+ },
+ {
+ "name": "model.layers.3.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 5309568
+ },
+ {
+ "name": "model.layers.3.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 5310720
+ },
+ {
+ "name": "model.layers.3.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 7080192
+ },
+ {
+ "name": "model.layers.3.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 10619136
+ },
+ {
+ "name": "model.layers.3.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 10620288
+ },
+ {
+ "name": "model.layers.3.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 11726208
+ },
+ {
+ "name": "model.layers.4.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 12389760
+ },
+ {
+ "name": "model.layers.4.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 12390912
+ },
+ {
+ "name": "model.layers.4.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 14160384
+ },
+ {
+ "name": "model.layers.4.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 17699328
+ },
+ {
+ "name": "model.layers.4.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 17700480
+ },
+ {
+ "name": "model.layers.4.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 18806400
+ },
+ {
+ "name": "model.layers.5.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 19469952
+ },
+ {
+ "name": "model.layers.5.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 19471104
+ },
+ {
+ "name": "model.layers.5.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 21240576
+ },
+ {
+ "name": "model.layers.5.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 24779520
+ },
+ {
+ "name": "model.layers.5.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 24780672
+ },
+ {
+ "name": "model.layers.5.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 25886592
+ },
+ {
+ "name": "model.layers.6.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 26550144
+ },
+ {
+ "name": "model.layers.6.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 26551296
+ },
+ {
+ "name": "model.layers.6.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 28320768
+ },
+ {
+ "name": "model.layers.6.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 31859712
+ },
+ {
+ "name": "model.layers.6.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 31860864
+ }
+ ],
+ "md5sum": "ae5d723eb20eabf7d1c9ca6aa66bbbaa"
+ },
1792
+ {
+ "dataPath": "params_shard_7.bin",
+ "format": "raw-shard",
+ "nbytes": 21905280,
+ "records": [
+ {
+ "name": "model.layers.6.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.7.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 663552
+ },
+ {
+ "name": "model.layers.7.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 664704
+ },
+ {
+ "name": "model.layers.7.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 2434176
+ },
+ {
+ "name": "model.layers.7.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 5973120
+ },
+ {
+ "name": "model.layers.7.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 5974272
+ },
+ {
+ "name": "model.layers.7.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 7080192
+ },
+ {
+ "name": "model.layers.8.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 7743744
+ },
+ {
+ "name": "model.layers.8.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 7744896
+ },
+ {
+ "name": "model.layers.8.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 9514368
+ },
+ {
+ "name": "model.layers.8.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 13053312
+ },
+ {
+ "name": "model.layers.8.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 13054464
+ },
+ {
+ "name": "model.layers.8.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 14160384
+ },
+ {
+ "name": "model.layers.9.input_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 14823936
+ },
+ {
+ "name": "model.layers.9.mlp.down_proj.weight",
+ "shape": [
+ 576,
+ 1536
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1769472,
+ "byteOffset": 14825088
+ },
+ {
+ "name": "model.layers.9.mlp.gate_up_proj.weight",
+ "shape": [
+ 3072,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 3538944,
+ "byteOffset": 16594560
+ },
+ {
+ "name": "model.layers.9.post_attention_layernorm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 20133504
+ },
+ {
+ "name": "model.layers.9.self_attn.qkv_proj.weight",
+ "shape": [
+ 960,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1105920,
+ "byteOffset": 20134656
+ },
+ {
+ "name": "model.layers.9.self_attn.o_proj.weight",
+ "shape": [
+ 576,
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 663552,
+ "byteOffset": 21240576
+ },
+ {
+ "name": "model.norm.weight",
+ "shape": [
+ 576
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 1152,
+ "byteOffset": 21904128
+ }
+ ],
+ "md5sum": "f9dd83a7b44d6b67df5563c1694eebc8"
+ }
+ ]
+ }
params_shard_0.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:de9e29bf8e802ab8fc7f8c3170cf475201510cb7476ffbbb9061e345c5451244
+ size 56623104
params_shard_1.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc834ec80a468267069b04d41eeb011a283d5b9e5c226e448db112c6b697620d
+ size 30091392
params_shard_2.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88dd6061d7bcbba129011779b2172aaf716e1fe3e876b9e810d61466c3a157f1
+ size 32966784
params_shard_3.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:095f34e36991e182e34f8acce3a04864ab6783b30d4c6d0c1b9e81de85aa852c
+ size 30754944
params_shard_4.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:08c159c9a46e4d7718cd653270feb21fb2295cd2abf3afdde5fa4523f3fcc9de
+ size 32966784
params_shard_5.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:774d5aa721c4b14ad949bc6beefd6f0d0f91e9139d3730687dc08d56ecf9974d
+ size 30754944
params_shard_6.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c13fe96505722775eea2e7d0a731b311dde976b709a9ac34287b7546d29cd20a
+ size 32966784
params_shard_7.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:740919345e3a0ac0412f3c58c10783b08a3094a64920336a8f134eddf56ac199
+ size 21905280
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,154 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<repo_name>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "4": {
+ "content": "<reponame>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "5": {
+ "content": "<file_sep>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "6": {
+ "content": "<filename>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "7": {
+ "content": "<gh_stars>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "8": {
+ "content": "<issue_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "9": {
+ "content": "<issue_comment>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "10": {
+ "content": "<issue_closed>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "11": {
+ "content": "<jupyter_start>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "12": {
+ "content": "<jupyter_text>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "13": {
+ "content": "<jupyter_code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "14": {
+ "content": "<jupyter_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "15": {
+ "content": "<jupyter_script>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "16": {
+ "content": "<empty_output>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>"
+ ],
+ "bos_token": "<|im_start|>",
+ "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "model_max_length": 2048,
+ "pad_token": "<|im_end|>",
+ "tokenizer_class": "GPT2Tokenizer",
+ "unk_token": "<|endoftext|>",
+ "vocab_size": 49152
+ }
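The `chat_template` above is a Jinja template producing the ChatML format (matching the `chatml_nosystem` conversation template in mlc-chat-config.json). As a sanity check, here is a minimal pure-Python rendering of the same logic; the function name is ours, not part of this repo:

```python
def render_chatml(messages, add_generation_prompt=True):
    """Mirror the Jinja chat_template: each message becomes
    "<|im_start|>{role}\n{content}<|im_end|>\n", optionally followed
    by an opening assistant header to prompt generation."""
    out = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out
```

For example, a single user message "Hi" renders as `<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\n`, and generation stops at the `<|im_end|>` eos token.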
vocab.json ADDED
The diff for this file is too large to render. See raw diff
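The shard manifest in this commit maps every weight tensor to a byte range (`byteOffset`/`nbytes`) inside one of the `params_shard_*.bin` files. A minimal sketch of slicing one record back out, with numpy (the helper name is ours; we return raw 16-bit words, since every record here is 2 bytes per element, and leave the float16-vs-bfloat16 interpretation of the `"f32-to-bf16"` format to the loader):

```python
import numpy as np

def read_record(repo_dir, data_path, record):
    """Read one tensor's bytes from a raw-shard file using its manifest record.

    `record` is one entry of a shard's "records" list: it gives the tensor's
    byteOffset and nbytes inside the file named by `data_path`. The bytes
    are returned as raw 16-bit words reshaped to record["shape"].
    """
    with open(f"{repo_dir}/{data_path}", "rb") as f:
        f.seek(record["byteOffset"])
        raw = f.read(record["nbytes"])
    words = np.frombuffer(raw, dtype=np.uint16)
    return words.reshape(record["shape"])
```

For instance, `model.norm.weight` would be read as 1152 bytes at offset 21904128 of `params_shard_7.bin`, giving a `(576,)` array; note that `nbytes` always equals the product of `shape` times 2 bytes per element.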