mofosyne committed
Commit 7fb699e
1 Parent(s): 19c0773

readme update

Files changed (1)
  1. README.md +67 -57
README.md CHANGED
@@ -14,6 +14,8 @@ tags:
 - Model creator: [Maykeye](https://huggingface.co/Maykeye)
 - Original model: [TinyLLama-v0](https://huggingface.co/Maykeye/TinyLLama-v0)
 
+If interested in the internal content of this model you can check [Tinyllama-4.6M-v0.0-F16.dump.md](./Tinyllama-4.6M-v0.0-F16.dump.md) included in this repo.
+
 ## Description
 
 * This repo is targeted towards:
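The dump file referenced in the added line can be regenerated from the GGUF itself. A minimal sketch, assuming the gguf-py scripts that ship with llama.cpp (the script name and the --markdown flag vary across checkouts):

```sh
# Hypothetical paths; adjust to your llama.cpp checkout.
python llama.cpp/gguf-py/scripts/gguf_dump.py --markdown \
    maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf \
    > Tinyllama-4.6M-v0.0-F16.dump.md
```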
@@ -73,26 +75,12 @@ make: Nothing to be done for 'all'.
 make: Nothing to be done for 'all'.
 ~/huggingface/TinyLLama-v0-5M-F16-llamafile
 == What is our llamafile name going to be? ==
-We will be aiming to generate Tinyllama-5M-v0.2-F16.llamafile
+maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf
+We will be aiming to generate Tinyllama-4.6M-v0.0-F16.llamafile
 == Convert from safetensor to gguf ==
 INFO:hf-to-gguf:Loading model: maykeye_tinyllama
-INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
-INFO:gguf.gguf_writer:gguf: Will write to maykeye_tinyllama/Tinyllama-5M-v0.2-F16.gguf
 INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
-INFO:hf-to-gguf:Set meta model
-INFO:hf-to-gguf:Set model parameters
-INFO:hf-to-gguf:gguf: context length = 2048
-INFO:hf-to-gguf:gguf: embedding length = 64
-INFO:hf-to-gguf:gguf: feed forward length = 256
-INFO:hf-to-gguf:gguf: head count = 16
-INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
-INFO:hf-to-gguf:gguf: file type = 1
-INFO:hf-to-gguf:Set model tokenizer
-INFO:gguf.vocab:Setting special token type bos to 1
-INFO:gguf.vocab:Setting special token type eos to 2
-INFO:gguf.vocab:Setting special token type unk to 0
-INFO:gguf.vocab:Setting special token type pad to 0
-INFO:hf-to-gguf:Exporting model to 'maykeye_tinyllama/Tinyllama-5M-v0.2-F16.gguf'
+INFO:hf-to-gguf:Exporting model...
 INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
 INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {64, 32000}
 INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {64, 32000}
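The hf-to-gguf lines in this hunk are output from llama.cpp's HF-to-GGUF converter. A minimal sketch of the kind of invocation that would produce them, assuming a llama.cpp checkout (the script is named convert-hf-to-gguf.py in older trees and convert_hf_to_gguf.py in newer ones):

```sh
# Sketch only: directory layout assumed from the log above.
python llama.cpp/convert_hf_to_gguf.py maykeye_tinyllama \
    --outfile maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf \
    --outtype f16   # "file type = 1" in the log denotes F16
```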
@@ -169,44 +157,64 @@ INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {64,
 INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
 INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
 INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {64}
-Writing: 100%|████████████████████| 9.24M/9.24M [00:00<00:00, 139Mbyte/s]
-INFO:hf-to-gguf:Model successfully exported to 'maykeye_tinyllama/Tinyllama-5M-v0.2-F16.gguf'
+INFO:hf-to-gguf:Set meta model
+INFO:hf-to-gguf:Set model parameters
+INFO:hf-to-gguf:gguf: context length = 2048
+INFO:hf-to-gguf:gguf: embedding length = 64
+INFO:hf-to-gguf:gguf: feed forward length = 256
+INFO:hf-to-gguf:gguf: head count = 16
+INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
+INFO:hf-to-gguf:gguf: file type = 1
+INFO:hf-to-gguf:Set model tokenizer
+INFO:gguf.vocab:Setting special token type bos to 1
+INFO:gguf.vocab:Setting special token type eos to 2
+INFO:gguf.vocab:Setting special token type unk to 0
+INFO:gguf.vocab:Setting special token type pad to 0
+INFO:hf-to-gguf:Set model quantization version
+INFO:gguf.gguf_writer:Writing the following files:
+INFO:gguf.gguf_writer:maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf: n_tensors = 75, total_size = 9.2M
+Writing: 100%|████████████████████| 9.24M/9.24M [00:00<00:00, 83.7Mbyte/s]
+INFO:hf-to-gguf:Model successfully exported to maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf
 == Generating Llamafile ==
-== Test Output ==
+== Test Output ./Tinyllama-4.6M-v0.0-F16.llamafile ==
 note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
-main: llamafile version 0.8.6
-main: seed = 1717436617
-llama_model_loader: loaded meta data with 29 key-value pairs and 75 tensors from Tinyllama-5M-v0.2-F16.gguf (version GGUF V3 (latest))
+main: llamafile version 0.8.9
+main: seed = 1721461448
+llama_model_loader: loaded meta data with 33 key-value pairs and 75 tensors from Tinyllama-4.6M-v0.0-F16.gguf (version GGUF V3 (latest))
 llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
 llama_model_loader: - kv 0: general.architecture str = llama
-llama_model_loader: - kv 1: general.name str = TinyLLama
-llama_model_loader: - kv 2: general.author str = mofosyne
-llama_model_loader: - kv 3: general.version str = v0.2
-llama_model_loader: - kv 4: general.url str = https://huggingface.co/mofosyne/TinyL...
+llama_model_loader: - kv 1: general.type str = model
+llama_model_loader: - kv 2: general.name str = TinyLLama
+llama_model_loader: - kv 3: general.author str = Maykeye
+llama_model_loader: - kv 4: general.version str = v0.0
 llama_model_loader: - kv 5: general.description str = This gguf is ported from a first vers...
-llama_model_loader: - kv 6: general.license str = apache-2.0
-llama_model_loader: - kv 7: general.source.url str = https://huggingface.co/Maykeye/TinyLL...
-llama_model_loader: - kv 8: general.source.huggingface.repository str = Maykeye/TinyLLama-v0
-llama_model_loader: - kv 9: general.parameter_weight_class str = 5M
-llama_model_loader: - kv 10: llama.block_count u32 = 8
-llama_model_loader: - kv 11: llama.context_length u32 = 2048
-llama_model_loader: - kv 12: llama.embedding_length u32 = 64
-llama_model_loader: - kv 13: llama.feed_forward_length u32 = 256
-llama_model_loader: - kv 14: llama.attention.head_count u32 = 16
-llama_model_loader: - kv 15: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
-llama_model_loader: - kv 16: general.file_type u32 = 1
-llama_model_loader: - kv 17: llama.vocab_size u32 = 32000
-llama_model_loader: - kv 18: llama.rope.dimension_count u32 = 4
-llama_model_loader: - kv 19: tokenizer.ggml.model str = llama
-llama_model_loader: - kv 20: tokenizer.ggml.pre str = default
-llama_model_loader: - kv 21: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
-llama_model_loader: - kv 22: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
-llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
-llama_model_loader: - kv 24: tokenizer.ggml.bos_token_id u32 = 1
-llama_model_loader: - kv 25: tokenizer.ggml.eos_token_id u32 = 2
-llama_model_loader: - kv 26: tokenizer.ggml.unknown_token_id u32 = 0
-llama_model_loader: - kv 27: tokenizer.ggml.padding_token_id u32 = 0
-llama_model_loader: - kv 28: general.quantization_version u32 = 2
+llama_model_loader: - kv 6: general.quantized_by str = Mofosyne
+llama_model_loader: - kv 7: general.size_label str = 4.6M
+llama_model_loader: - kv 8: general.license str = apache-2.0
+llama_model_loader: - kv 9: general.url str = https://huggingface.co/mofosyne/TinyL...
+llama_model_loader: - kv 10: general.source.url str = https://huggingface.co/Maykeye/TinyLL...
+llama_model_loader: - kv 11: general.tags arr[str,5] = ["text generation", "transformer", "l...
+llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
+llama_model_loader: - kv 13: general.datasets arr[str,2] = ["https://huggingface.co/datasets/ron...
+llama_model_loader: - kv 14: llama.block_count u32 = 8
+llama_model_loader: - kv 15: llama.context_length u32 = 2048
+llama_model_loader: - kv 16: llama.embedding_length u32 = 64
+llama_model_loader: - kv 17: llama.feed_forward_length u32 = 256
+llama_model_loader: - kv 18: llama.attention.head_count u32 = 16
+llama_model_loader: - kv 19: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
+llama_model_loader: - kv 20: general.file_type u32 = 1
+llama_model_loader: - kv 21: llama.vocab_size u32 = 32000
+llama_model_loader: - kv 22: llama.rope.dimension_count u32 = 4
+llama_model_loader: - kv 23: tokenizer.ggml.model str = llama
+llama_model_loader: - kv 24: tokenizer.ggml.pre str = default
+llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
+llama_model_loader: - kv 26: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
+llama_model_loader: - kv 27: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
+llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 1
+llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 2
+llama_model_loader: - kv 30: tokenizer.ggml.unknown_token_id u32 = 0
+llama_model_loader: - kv 31: tokenizer.ggml.padding_token_id u32 = 0
+llama_model_loader: - kv 32: general.quantization_version u32 = 2
 llama_model_loader: - type f32: 17 tensors
 llama_model_loader: - type f16: 58 tensors
 llm_load_vocab: special tokens definition check successful ( 259/32000 ).
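The key-value pairs dumped above are ordinary GGUF metadata and can be inspected without llamafile. A minimal sketch using the gguf Python package (pip install gguf); the GGUFReader API here is assumed from gguf-py:

```sh
python3 - <<'EOF'
from gguf import GGUFReader

reader = GGUFReader("maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf")
for name in reader.fields:             # the 33 key-value pairs listed above
    print(name)
print(len(reader.tensors), "tensors")  # expect 75
EOF
```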
@@ -221,6 +229,7 @@ llm_load_print_meta: n_head = 16
 llm_load_print_meta: n_head_kv = 16
 llm_load_print_meta: n_layer = 8
 llm_load_print_meta: n_rot = 4
+llm_load_print_meta: n_swa = 0
 llm_load_print_meta: n_embd_head_k = 4
 llm_load_print_meta: n_embd_head_v = 4
 llm_load_print_meta: n_gqa = 1
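The repeated 4s in this hunk are one derived quantity, not independent settings: the per-head dimension is the embedding length divided by the head count, and RoPE here spans the whole head:

```sh
# head dim = embedding length / head count = 64 / 16 = 4,
# matching n_rot, n_embd_head_k, n_embd_head_v and
# llama.rope.dimension_count above.
echo $((64 / 16))
```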
@@ -282,14 +291,15 @@ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
 generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1
 
 
-hello world the gruff man said yes. She was very happy. The man waved goodbye to the little boy and said the little boy. It was the best day ever.
-The little boy was so excited. He took off his special favorite toy and a beautiful dress. He gave it to the little boy and said "thank you" to the little girl. He said "Thank you for being so clever. The man and the little boy both smiled. [end of text]
+hello world the gruff man said no. The man was very sad and wanted to see what was wrong. He asked the man if they could do it. But they did not like his way to the park.
+One day, the man decided to go in and he took off his own new home. He gave the bird a little bit of his friend. He said he had to find a way to hide it in his woods. The man was very happy, but he knew he needed to make it in the yard.
+The man was very sad and he could not find the bird. He didn't want to get to the park and his friend was very sad. They could not find the bird and his friend. But the man was too sad. He had no friends and no friends. [end of text]
 
 
-llama_print_timings: load time = 9.88 ms
-llama_print_timings: sample time = 3.83 ms / 89 runs ( 0.04 ms per token, 23249.74 tokens per second)
-llama_print_timings: prompt eval time = 1.61 ms / 8 tokens ( 0.20 ms per token, 4968.94 tokens per second)
-llama_print_timings: eval time = 214.13 ms / 88 runs ( 2.43 ms per token, 410.96 tokens per second)
-llama_print_timings: total time = 237.74 ms / 96 tokens
+llama_print_timings: load time = 10.26 ms
+llama_print_timings: sample time = 6.03 ms / 156 runs ( 0.04 ms per token, 25879.23 tokens per second)
+llama_print_timings: prompt eval time = 2.16 ms / 8 tokens ( 0.27 ms per token, 3696.86 tokens per second)
+llama_print_timings: eval time = 748.08 ms / 155 runs ( 4.83 ms per token, 207.20 tokens per second)
+llama_print_timings: total time = 800.80 ms / 163 tokens
 Log end
 ```
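The "== Generating Llamafile ==" step prints nothing in either version of the log. For reference, the usual llamafile recipe embeds the GGUF (plus a .args file of default flags) into a copy of the llamafile launcher using the project's own zipalign tool (not the Android tool of the same name); the file names below are assumed from this repo's outputs:

```sh
cp "$(command -v llamafile)" Tinyllama-4.6M-v0.0-F16.llamafile
printf -- '-m\nTinyllama-4.6M-v0.0-F16.gguf\n' > .args
zipalign -j0 Tinyllama-4.6M-v0.0-F16.llamafile \
    maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf .args
```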
 
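As a quick sanity check, the eval rate in the new timing summary follows directly from the reported runs and wall time:

```sh
# 155 eval runs in 748.08 ms, per the updated llama_print_timings
awk 'BEGIN { printf "%.2f tokens per second\n", 155 / 0.74808 }'   # ~207.20
```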