readme update
README.md CHANGED
@@ -14,6 +14,8 @@ tags:
 - Model creator: [Maykeye](https://huggingface.co/Maykeye)
 - Original model: [TinyLLama-v0](https://huggingface.co/Maykeye/TinyLLama-v0)
 
+If interested in the internal content of this model you can check [Tinyllama-4.6M-v0.0-F16.dump.md](./Tinyllama-4.6M-v0.0-F16.dump.md) included in this repo.
+
 ## Description
 
 * This repo is targeted towards:
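The dump file referenced above can be regenerated from the GGUF itself. A minimal sketch, assuming a llama.cpp checkout whose gguf-py scripts include gguf_dump.py (the script path and the --markdown flag vary across llama.cpp versions, so treat this as illustrative):

```sh
# Dump GGUF metadata and tensor info as markdown.
# Assumed paths: llama.cpp checked out alongside, GGUF in maykeye_tinyllama/.
python llama.cpp/gguf-py/scripts/gguf_dump.py --markdown \
    maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf \
    > Tinyllama-4.6M-v0.0-F16.dump.md
```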
@@ -73,26 +75,12 @@ make: Nothing to be done for 'all'.
 make: Nothing to be done for 'all'.
 ~/huggingface/TinyLLama-v0-5M-F16-llamafile
 == What is our llamafile name going to be? ==
-
+maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf
+We will be aiming to generate Tinyllama-4.6M-v0.0-F16.llamafile
 == Convert from safetensor to gguf ==
 INFO:hf-to-gguf:Loading model: maykeye_tinyllama
-INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
-INFO:gguf.gguf_writer:gguf: Will write to maykeye_tinyllama/Tinyllama-5M-v0.2-F16.gguf
 INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
-INFO:hf-to-gguf:
-INFO:hf-to-gguf:Set model parameters
-INFO:hf-to-gguf:gguf: context length = 2048
-INFO:hf-to-gguf:gguf: embedding length = 64
-INFO:hf-to-gguf:gguf: feed forward length = 256
-INFO:hf-to-gguf:gguf: head count = 16
-INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
-INFO:hf-to-gguf:gguf: file type = 1
-INFO:hf-to-gguf:Set model tokenizer
-INFO:gguf.vocab:Setting special token type bos to 1
-INFO:gguf.vocab:Setting special token type eos to 2
-INFO:gguf.vocab:Setting special token type unk to 0
-INFO:gguf.vocab:Setting special token type pad to 0
-INFO:hf-to-gguf:Exporting model to 'maykeye_tinyllama/Tinyllama-5M-v0.2-F16.gguf'
+INFO:hf-to-gguf:Exporting model...
 INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
 INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {64, 32000}
 INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {64, 32000}
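The hf-to-gguf log lines in this hunk come from llama.cpp's safetensors-to-GGUF converter. A sketch of the invocation that would produce output like the above, assuming a llama.cpp checkout (the converter has been named both convert-hf-to-gguf.py and convert_hf_to_gguf.py depending on the version):

```sh
# Convert the HF safetensors checkpoint to a single F16 GGUF file.
python llama.cpp/convert_hf_to_gguf.py maykeye_tinyllama \
    --outfile maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf \
    --outtype f16
```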
@@ -169,44 +157,64 @@ INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {64,
 INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
 INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
 INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {64}
-
-INFO:hf-to-gguf:
+INFO:hf-to-gguf:Set meta model
+INFO:hf-to-gguf:Set model parameters
+INFO:hf-to-gguf:gguf: context length = 2048
+INFO:hf-to-gguf:gguf: embedding length = 64
+INFO:hf-to-gguf:gguf: feed forward length = 256
+INFO:hf-to-gguf:gguf: head count = 16
+INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
+INFO:hf-to-gguf:gguf: file type = 1
+INFO:hf-to-gguf:Set model tokenizer
+INFO:gguf.vocab:Setting special token type bos to 1
+INFO:gguf.vocab:Setting special token type eos to 2
+INFO:gguf.vocab:Setting special token type unk to 0
+INFO:gguf.vocab:Setting special token type pad to 0
+INFO:hf-to-gguf:Set model quantization version
+INFO:gguf.gguf_writer:Writing the following files:
+INFO:gguf.gguf_writer:maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf: n_tensors = 75, total_size = 9.2M
+Writing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.24M/9.24M [00:00<00:00, 83.7Mbyte/s]
+INFO:hf-to-gguf:Model successfully exported to maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf
 == Generating Llamafile ==
-== Test Output ==
+== Test Output ./Tinyllama-4.6M-v0.0-F16.llamafile ==
 note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
-main: llamafile version 0.8.
-main: seed =
-llama_model_loader: loaded meta data with
+main: llamafile version 0.8.9
+main: seed = 1721461448
+llama_model_loader: loaded meta data with 33 key-value pairs and 75 tensors from Tinyllama-4.6M-v0.0-F16.gguf (version GGUF V3 (latest))
 llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
 llama_model_loader: - kv 0: general.architecture str = llama
-llama_model_loader: - kv 1: general.
-llama_model_loader: - kv 2:
-llama_model_loader: - kv 3:
-llama_model_loader: - kv 4:
+llama_model_loader: - kv 1: general.type str = model
+llama_model_loader: - kv 2: general.name str = TinyLLama
+llama_model_loader: - kv 3: general.author str = Maykeye
+llama_model_loader: - kv 4: general.version str = v0.0
 llama_model_loader: - kv 5: general.description str = This gguf is ported from a first vers...
-llama_model_loader: - kv 6:
-llama_model_loader: - kv 7: general.
-llama_model_loader: - kv 8:
-llama_model_loader: - kv 9:
-llama_model_loader: - kv 10:
-llama_model_loader: - kv 11:
-llama_model_loader: - kv 12:
-llama_model_loader: - kv 13:
-llama_model_loader: - kv 14:
-llama_model_loader: - kv 15:
-llama_model_loader: - kv 16:
-llama_model_loader: - kv 17:
-llama_model_loader: - kv 18: llama.
-llama_model_loader: - kv 19:
-llama_model_loader: - kv 20:
-llama_model_loader: - kv 21:
-llama_model_loader: - kv 22:
-llama_model_loader: - kv 23:
-llama_model_loader: - kv 24:
-llama_model_loader: - kv 25:
-llama_model_loader: - kv 26:
-llama_model_loader: - kv 27:
-llama_model_loader: - kv 28:
+llama_model_loader: - kv 6: general.quantized_by str = Mofosyne
+llama_model_loader: - kv 7: general.size_label str = 4.6M
+llama_model_loader: - kv 8: general.license str = apache-2.0
+llama_model_loader: - kv 9: general.url str = https://huggingface.co/mofosyne/TinyL...
+llama_model_loader: - kv 10: general.source.url str = https://huggingface.co/Maykeye/TinyLL...
+llama_model_loader: - kv 11: general.tags arr[str,5] = ["text generation", "transformer", "l...
+llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
+llama_model_loader: - kv 13: general.datasets arr[str,2] = ["https://huggingface.co/datasets/ron...
+llama_model_loader: - kv 14: llama.block_count u32 = 8
+llama_model_loader: - kv 15: llama.context_length u32 = 2048
+llama_model_loader: - kv 16: llama.embedding_length u32 = 64
+llama_model_loader: - kv 17: llama.feed_forward_length u32 = 256
+llama_model_loader: - kv 18: llama.attention.head_count u32 = 16
+llama_model_loader: - kv 19: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
+llama_model_loader: - kv 20: general.file_type u32 = 1
+llama_model_loader: - kv 21: llama.vocab_size u32 = 32000
+llama_model_loader: - kv 22: llama.rope.dimension_count u32 = 4
+llama_model_loader: - kv 23: tokenizer.ggml.model str = llama
+llama_model_loader: - kv 24: tokenizer.ggml.pre str = default
+llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
+llama_model_loader: - kv 26: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
+llama_model_loader: - kv 27: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
+llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 1
+llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 2
+llama_model_loader: - kv 30: tokenizer.ggml.unknown_token_id u32 = 0
+llama_model_loader: - kv 31: tokenizer.ggml.padding_token_id u32 = 0
+llama_model_loader: - kv 32: general.quantization_version u32 = 2
 llama_model_loader: - type f32: 17 tensors
 llama_model_loader: - type f16: 58 tensors
 llm_load_vocab: special tokens definition check successful ( 259/32000 ).
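The "== Generating Llamafile ==" step itself is not expanded in the log. A sketch of the usual packaging flow from the llamafile project's documentation, assuming the llamafile and zipalign binaries are installed (flag semantics per the llamafile 0.8.x docs; illustrative, not this repo's exact script):

```sh
# A llamafile is the llamafile runtime binary with the GGUF weights
# (and a default-arguments file) appended as a zip payload.
cp "$(command -v llamafile)" Tinyllama-4.6M-v0.0-F16.llamafile
cp maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf .
# .args holds one default CLI argument per line; the trailing "..." line
# lets arguments supplied at run time pass through.
printf -- '-m\nTinyllama-4.6M-v0.0-F16.gguf\n...\n' > .args
# -j0 stores the payload uncompressed so the weights can be mapped in place.
zipalign -j0 Tinyllama-4.6M-v0.0-F16.llamafile \
    Tinyllama-4.6M-v0.0-F16.gguf .args
```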
@@ -221,6 +229,7 @@ llm_load_print_meta: n_head = 16
 llm_load_print_meta: n_head_kv = 16
 llm_load_print_meta: n_layer = 8
 llm_load_print_meta: n_rot = 4
+llm_load_print_meta: n_swa = 0
 llm_load_print_meta: n_embd_head_k = 4
 llm_load_print_meta: n_embd_head_v = 4
 llm_load_print_meta: n_gqa = 1
@@ -282,14 +291,15 @@ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
 generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1
 
 
-hello world the gruff man said
-
+hello world the gruff man said no. The man was very sad and wanted to see what was wrong. He asked the man if they could do it. But they did not like his way to the park.
+One day, the man decided to go in and he took off his own new home. He gave the bird a little bit of his friend. He said he had to find a way to hide it in his woods. The man was very happy, but he knew he needed to make it in the yard.
+The man was very sad and he could not find the bird. He didn't want to get to the park and his friend was very sad. They could not find the bird and his friend. But the man was too sad. He had no friends and no friends. [end of text]
 
 
-llama_print_timings: load time =
-llama_print_timings: sample time =
-llama_print_timings: prompt eval time =
-llama_print_timings: eval time =
-llama_print_timings: total time =
+llama_print_timings: load time = 10.26 ms
+llama_print_timings: sample time = 6.03 ms / 156 runs ( 0.04 ms per token, 25879.23 tokens per second)
+llama_print_timings: prompt eval time = 2.16 ms / 8 tokens ( 0.27 ms per token, 3696.86 tokens per second)
+llama_print_timings: eval time = 748.08 ms / 155 runs ( 4.83 ms per token, 207.20 tokens per second)
+llama_print_timings: total time = 800.80 ms / 163 tokens
 Log end
 ```
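The test transcript above can be reproduced by executing the llamafile directly; a sketch, assuming the prompt was the leading "hello world the gruff man said" text (-p and -s are the standard llama.cpp prompt and seed flags, which llamafile accepts; add -ngl 9999 for GPU offload, as the log notes):

```sh
# Make the packaged model executable and rerun the test generation.
chmod +x Tinyllama-4.6M-v0.0-F16.llamafile
./Tinyllama-4.6M-v0.0-F16.llamafile \
    -p "hello world the gruff man said" \
    -s 1721461448   # seed taken from the log above, for a comparable run
```

At roughly 207 tokens per second of CPU eval in the timings above, this 4.6M-parameter model finishes the whole run in under a second.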