update readme
README.md CHANGED
@@ -17,7 +17,7 @@ tags:
-  - People who just want to quickly try out the llamafile technology by running `./
@@ -41,13 +41,13 @@ Anyway, this conversion to [llamafile](https://github.com/Mozilla-Ocho/llamafile
-chmod +x
-./
-./
@@ -65,140 +65,148 @@ llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023. It uses C
-$ llamafile-creation.sh
-llamafile-creation.sh: command not found
-mofosyne@mofosyne-Z97MX-Gaming-5:~/huggingface/TinyLLama-v0-llamafile$ ./llamafile-creation.sh
-~/huggingface/TinyLLama-v0-llamafile/llamafile ~/huggingface/TinyLLama-v0-llamafile
-~/huggingface/TinyLLama-v0-llamafile
-We will be aiming to generate
-INFO:convert:Loaded vocab file PosixPath('maykeye_tinyllama/tokenizer.model'), type 'spm'
-INFO:convert:Vocab info: <SentencePieceVocab with 32000 base tokens and 0 added tokens>
-INFO:convert:Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 1, 'eos': 2, 'unk': 0, 'pad': 0}, add special tokens unset>
-INFO:convert:Writing maykeye_tinyllama/TinyLLama-v0-5M-F16.gguf, format 1
-WARNING:convert:Ignoring added_tokens.json since model matches vocab size without it.
-INFO: […remaining deleted INFO: lines truncated…]
-main: llamafile version 0.8.
-main: seed =
-llama_model_loader: loaded meta data with
-llama_model_loader: - kv 3: general.version str = v0
-llama_model_loader: - kv 6: […further deleted kv lines truncated…]
@@ -264,7 +272,7 @@ llama_new_context_with_model: CPU compute buffer size = 62.75 MiB
-system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
@@ -274,21 +282,14 @@ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
-The
-The man said, "I am scared of a big surprise. I will help you."
-The man looked at the boy and said, "I can help you. I can make the little boy's wings. The man makes the girl laugh. She was so kind and happy.
-The boy said, "You are too mean to me. You can't give out the problem."
-The girl said, "I will help you!"
-The man stopped and said, "I can help you. I'm sorry for a little girl, but you must tell the boy to be careful. Do you want to be kind."
-The boy smiled and said, "Yes, I want to help you. Let's go into the pond and have fun!"
-The boy and the man went to the lake to the pond. They had a great time and the man was able to help.</s> [end of text]
-llama_print_timings: load time =
-llama_print_timings: sample time =
-llama_print_timings: prompt eval time =
-llama_print_timings: eval time =
-llama_print_timings: total time =

## Description

* This repo is targeted towards:
  - People who just want to quickly try out the llamafile technology by running `./Tinyllama-5M-v0.2-F16.llamafile --cli -p "hello world"`, as this llamafile is only 17.6 MB in size!
  - Developers who would like a quick demo of the steps to convert an existing model from safetensors format to a GGUF and package it into a llamafile for easy distribution (just run `llamafile-creation.sh` to retrace the steps; a rough sketch of the pipeline is shown below).
  - Researchers who are curious about the state of AI technology in terms of shrinking AI models, as the original model comes from a replication attempt of a research paper.
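For developers retracing those steps, the pipeline that `llamafile-creation.sh` automates looks roughly like the sketch below. This is a minimal illustration under stated assumptions, not the script itself: the converter script name and flags, the placeholder engine path, and the `zipalign` packaging step are taken from upstream llama.cpp and llamafile documentation rather than from this repo.

```bash
# Rough sketch of the safetensors -> GGUF -> llamafile pipeline.
# The authoritative steps are in llamafile-creation.sh; script names, paths and
# flags here are assumptions based on upstream llama.cpp / llamafile docs.

# 1. Fetch the original safetensors model.
git clone https://huggingface.co/Maykeye/TinyLLama-v0 maykeye_tinyllama

# 2. Convert the safetensors weights to an F16 GGUF with llama.cpp's converter.
python convert-hf-to-gguf.py maykeye_tinyllama \
  --outfile maykeye_tinyllama/Tinyllama-5M-v0.2-F16.gguf --outtype f16

# 3. Package the GGUF plus default arguments into a self-contained llamafile
#    using the zipalign tool that ships with the llamafile project.
cp /path/to/llamafile-engine Tinyllama-5M-v0.2-F16.llamafile
echo "-m Tinyllama-5M-v0.2-F16.gguf" > .args
zipalign -j0 Tinyllama-5M-v0.2-F16.llamafile \
  maykeye_tinyllama/Tinyllama-5M-v0.2-F16.gguf .args
chmod +x Tinyllama-5M-v0.2-F16.llamafile
```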

```bash
# if not already usable
chmod +x Tinyllama-5M-v0.2-F16.llamafile

# To start the llamafile in web server mode just call this directly
./Tinyllama-5M-v0.2-F16.llamafile

# To start the llamafile in command-line mode use this command
./Tinyllama-5M-v0.2-F16.llamafile --cli -p "A dog and a cat"
```
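When started in web server mode, the llamafile serves a local chat UI and an HTTP API. As a quick sketch of talking to it from another terminal, assuming the default listen address and port (127.0.0.1:8080) and the llama.cpp-style `/completion` endpoint exposed by the embedded server:

```bash
# Query the llamafile's built-in web server over HTTP.
# Assumes default llamafile/llama.cpp server settings (port 8080, /completion
# endpoint); adjust if your setup differs.
curl http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A dog and a cat", "n_predict": 64}'
```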

## About llamafile

For the most current replication steps, refer to the bash script `llamafile-creation.sh` in this repo.

```
$ ./llamafile-creation.sh
== Prep Enviroment ==
== Build and prep the llamafile engine execuable ==
~/huggingface/TinyLLama-v0-5M-F16-llamafile/llamafile ~/huggingface/TinyLLama-v0-5M-F16-llamafile
make: Nothing to be done for 'all'.
make: Nothing to be done for 'all'.
~/huggingface/TinyLLama-v0-5M-F16-llamafile
== What is our llamafile name going to be? ==
We will be aiming to generate Tinyllama-5M-v0.2-F16.llamafile
== Convert from safetensor to gguf ==
INFO:hf-to-gguf:Loading model: maykeye_tinyllama
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:gguf.gguf_writer:gguf: Will write to maykeye_tinyllama/Tinyllama-5M-v0.2-F16.gguf
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 2048
INFO:hf-to-gguf:gguf: embedding length = 64
INFO:hf-to-gguf:gguf: feed forward length = 256
INFO:hf-to-gguf:gguf: head count = 16
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 2
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 0
INFO:hf-to-gguf:Exporting model to 'maykeye_tinyllama/Tinyllama-5M-v0.2-F16.gguf'
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {64, 32000}
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {64, 32000}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {64}
Writing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 9.24M/9.24M [00:00<00:00, 139Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to 'maykeye_tinyllama/Tinyllama-5M-v0.2-F16.gguf'
== Generating Llamafile ==
== Test Output ==
note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
main: llamafile version 0.8.6
main: seed = 1717436617
llama_model_loader: loaded meta data with 29 key-value pairs and 75 tensors from Tinyllama-5M-v0.2-F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = TinyLLama
llama_model_loader: - kv 2: general.author str = mofosyne
llama_model_loader: - kv 3: general.version str = v0.2
llama_model_loader: - kv 4: general.url str = https://huggingface.co/mofosyne/TinyL...
llama_model_loader: - kv 5: general.description str = This gguf is ported from a first vers...
llama_model_loader: - kv 6: general.license str = apache-2.0
llama_model_loader: - kv 7: general.source.url str = https://huggingface.co/Maykeye/TinyLL...
llama_model_loader: - kv 8: general.source.huggingface.repository str = Maykeye/TinyLLama-v0
llama_model_loader: - kv 9: general.parameter_weight_class str = 5M
llama_model_loader: - kv 10: llama.block_count u32 = 8
llama_model_loader: - kv 11: llama.context_length u32 = 2048
llama_model_loader: - kv 12: llama.embedding_length u32 = 64
llama_model_loader: - kv 13: llama.feed_forward_length u32 = 256
llama_model_loader: - kv 14: llama.attention.head_count u32 = 16
llama_model_loader: - kv 15: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 16: general.file_type u32 = 1
llama_model_loader: - kv 17: llama.vocab_size u32 = 32000
llama_model_loader: - kv 18: llama.rope.dimension_count u32 = 4
llama_model_loader: - kv 19: tokenizer.ggml.model str = llama
llama_model_loader: - kv 20: tokenizer.ggml.pre str = default
llama_model_loader: - kv 21: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 22: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 24: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 25: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 26: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 27: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 28: general.quantization_version u32 = 2
llama_model_loader: - type f32: 17 tensors
llama_model_loader: - type f16: 58 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
[...]
llama_new_context_with_model: graph nodes = 262
llama_new_context_with_model: graph splits = 1

system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
[...]
generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1


hello world the gruff man said yes. She was very happy. The man waved goodbye to the little boy and said the little boy. It was the best day ever.
The little boy was so excited. He took off his special favorite toy and a beautiful dress. He gave it to the little boy and said "thank you" to the little girl. He said "Thank you for being so clever. The man and the little boy both smiled. [end of text]


llama_print_timings: load time = 9.88 ms
llama_print_timings: sample time = 3.83 ms / 89 runs ( 0.04 ms per token, 23249.74 tokens per second)
llama_print_timings: prompt eval time = 1.61 ms / 8 tokens ( 0.20 ms per token, 4968.94 tokens per second)
llama_print_timings: eval time = 214.13 ms / 88 runs ( 2.43 ms per token, 410.96 tokens per second)
llama_print_timings: total time = 237.74 ms / 96 tokens
Log end
```
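As a rough sanity check, the tensor shapes and timings in the log above are internally consistent. The short shell sketch below (illustrative only, not part of `llamafile-creation.sh`) reproduces the "9.24M" written by the exporter, the "5M" parameter weight class, and the reported eval speed.

```bash
# Back-of-the-envelope check of the numbers in the log above
# (illustrative only, not part of llamafile-creation.sh).

# Parameter count implied by the logged tensor shapes:
#   token_embd + output : 2 * 32000 * 64
#   each of 8 blocks    : 4 attention mats (64x64) + 3 FFN mats (64x256) + 2 norms (64)
#   final output_norm   : 64
embed=$((2 * 32000 * 64))
per_block=$((4 * 64 * 64 + 3 * 64 * 256 + 2 * 64))
total=$((embed + 8 * per_block + 64))
echo "parameters : $total"            # ~4.6M, i.e. the "5M" parameter_weight_class
echo "f16 bytes  : $((total * 2))"    # ~9.24 MB, matching "Writing: ... 9.24M/9.24M"

# Decode speed from llama_print_timings: 88 tokens generated in 214.13 ms
awk 'BEGIN { printf "eval speed : %.1f tokens/s\n", 88 / 0.21413 }'   # ~411, cf. the eval time line
```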