readme update
README.md CHANGED
@@ -62,4 +62,233 @@ llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023. It uses C

## Replication Steps

-For the most current replication steps, refer to the bash script `llamafile-creation.sh` in this repo
+For the most current replication steps, refer to the bash script `llamafile-creation.sh` in this repo.
+
+```
+$ ./llamafile-creation.sh
+== Prep Enviroment ==
+== Build and prep the llamafile engine execuable ==
+~/huggingface/TinyLLama-v0-llamafile/llamafile ~/huggingface/TinyLLama-v0-llamafile
+make: Nothing to be done for 'all'.
+make: Nothing to be done for 'all'.
+~/huggingface/TinyLLama-v0-llamafile
+== What is our llamafile name going to be? ==
+We will be aiming to generate TinyLLama-v0-5M-F16.llamafile
+== Convert from safetensor to gguf ==
+INFO:convert:Loading model file maykeye_tinyllama/model.safetensors
+INFO:convert:model parameters count : 4621392 (5M)
+INFO:convert:params = Params(n_vocab=32000, n_embd=64, n_layer=8, n_ctx=2048, n_ff=256, n_head=16, n_head_kv=16, n_experts=None, n_experts_used=None, f_norm_eps=1e-06, rope_scaling_type=None, f_rope_freq_base=None, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('maykeye_tinyllama'))
+INFO:convert:Loaded vocab file PosixPath('maykeye_tinyllama/tokenizer.model'), type 'spm'
+INFO:convert:Vocab info: <SentencePieceVocab with 32000 base tokens and 0 added tokens>
+INFO:convert:Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 1, 'eos': 2, 'unk': 0, 'pad': 0}, add special tokens unset>
+INFO:convert:Writing maykeye_tinyllama/TinyLLama-v0-5M-F16.gguf, format 1
+WARNING:convert:Ignoring added_tokens.json since model matches vocab size without it.
+INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
+INFO:gguf.vocab:Setting special token type bos to 1
+INFO:gguf.vocab:Setting special token type eos to 2
+INFO:gguf.vocab:Setting special token type unk to 0
+INFO:gguf.vocab:Setting special token type pad to 0
+INFO:convert:[ 1/75] Writing tensor output.weight | size 32000 x 64 | type F16 | T+ 0
+INFO:convert:[ 2/75] Writing tensor token_embd.weight | size 32000 x 64 | type F16 | T+ 0
+INFO:convert:[ 3/75] Writing tensor blk.0.attn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[ 4/75] Writing tensor blk.0.ffn_down.weight | size 64 x 256 | type F16 | T+ 0
+INFO:convert:[ 5/75] Writing tensor blk.0.ffn_gate.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[ 6/75] Writing tensor blk.0.ffn_up.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[ 7/75] Writing tensor blk.0.ffn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[ 8/75] Writing tensor blk.0.attn_k.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[ 9/75] Writing tensor blk.0.attn_output.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[10/75] Writing tensor blk.0.attn_q.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[11/75] Writing tensor blk.0.attn_v.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[12/75] Writing tensor blk.1.attn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[13/75] Writing tensor blk.1.ffn_down.weight | size 64 x 256 | type F16 | T+ 0
+INFO:convert:[14/75] Writing tensor blk.1.ffn_gate.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[15/75] Writing tensor blk.1.ffn_up.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[16/75] Writing tensor blk.1.ffn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[17/75] Writing tensor blk.1.attn_k.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[18/75] Writing tensor blk.1.attn_output.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[19/75] Writing tensor blk.1.attn_q.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[20/75] Writing tensor blk.1.attn_v.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[21/75] Writing tensor blk.2.attn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[22/75] Writing tensor blk.2.ffn_down.weight | size 64 x 256 | type F16 | T+ 0
+INFO:convert:[23/75] Writing tensor blk.2.ffn_gate.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[24/75] Writing tensor blk.2.ffn_up.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[25/75] Writing tensor blk.2.ffn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[26/75] Writing tensor blk.2.attn_k.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[27/75] Writing tensor blk.2.attn_output.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[28/75] Writing tensor blk.2.attn_q.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[29/75] Writing tensor blk.2.attn_v.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[30/75] Writing tensor blk.3.attn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[31/75] Writing tensor blk.3.ffn_down.weight | size 64 x 256 | type F16 | T+ 0
+INFO:convert:[32/75] Writing tensor blk.3.ffn_gate.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[33/75] Writing tensor blk.3.ffn_up.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[34/75] Writing tensor blk.3.ffn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[35/75] Writing tensor blk.3.attn_k.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[36/75] Writing tensor blk.3.attn_output.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[37/75] Writing tensor blk.3.attn_q.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[38/75] Writing tensor blk.3.attn_v.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[39/75] Writing tensor blk.4.attn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[40/75] Writing tensor blk.4.ffn_down.weight | size 64 x 256 | type F16 | T+ 0
+INFO:convert:[41/75] Writing tensor blk.4.ffn_gate.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[42/75] Writing tensor blk.4.ffn_up.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[43/75] Writing tensor blk.4.ffn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[44/75] Writing tensor blk.4.attn_k.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[45/75] Writing tensor blk.4.attn_output.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[46/75] Writing tensor blk.4.attn_q.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[47/75] Writing tensor blk.4.attn_v.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[48/75] Writing tensor blk.5.attn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[49/75] Writing tensor blk.5.ffn_down.weight | size 64 x 256 | type F16 | T+ 0
+INFO:convert:[50/75] Writing tensor blk.5.ffn_gate.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[51/75] Writing tensor blk.5.ffn_up.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[52/75] Writing tensor blk.5.ffn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[53/75] Writing tensor blk.5.attn_k.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[54/75] Writing tensor blk.5.attn_output.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[55/75] Writing tensor blk.5.attn_q.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[56/75] Writing tensor blk.5.attn_v.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[57/75] Writing tensor blk.6.attn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[58/75] Writing tensor blk.6.ffn_down.weight | size 64 x 256 | type F16 | T+ 0
+INFO:convert:[59/75] Writing tensor blk.6.ffn_gate.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[60/75] Writing tensor blk.6.ffn_up.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[61/75] Writing tensor blk.6.ffn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[62/75] Writing tensor blk.6.attn_k.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[63/75] Writing tensor blk.6.attn_output.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[64/75] Writing tensor blk.6.attn_q.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[65/75] Writing tensor blk.6.attn_v.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[66/75] Writing tensor blk.7.attn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[67/75] Writing tensor blk.7.ffn_down.weight | size 64 x 256 | type F16 | T+ 0
+INFO:convert:[68/75] Writing tensor blk.7.ffn_gate.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[69/75] Writing tensor blk.7.ffn_up.weight | size 256 x 64 | type F16 | T+ 0
+INFO:convert:[70/75] Writing tensor blk.7.ffn_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:[71/75] Writing tensor blk.7.attn_k.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[72/75] Writing tensor blk.7.attn_output.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[73/75] Writing tensor blk.7.attn_q.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[74/75] Writing tensor blk.7.attn_v.weight | size 64 x 64 | type F16 | T+ 0
+INFO:convert:[75/75] Writing tensor output_norm.weight | size 64 | type F32 | T+ 0
+INFO:convert:Wrote maykeye_tinyllama/TinyLLama-v0-5M-F16.gguf
+== Generating Llamafile ==
+== Test Output ==
+note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
+main: llamafile version 0.8.4
+main: seed = 1715571182
+llama_model_loader: loaded meta data with 26 key-value pairs and 75 tensors from TinyLLama-v0-5M-F16.gguf (version GGUF V3 (latest))
+llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
+llama_model_loader: - kv 0: general.architecture str = llama
+llama_model_loader: - kv 1: general.name str = TinyLLama
+llama_model_loader: - kv 2: general.author str = mofosyne
+llama_model_loader: - kv 3: general.version str = v0
+llama_model_loader: - kv 4: general.url str = https://huggingface.co/mofosyne/TinyL...
+llama_model_loader: - kv 5: general.description str = This gguf is ported from a first vers...
+llama_model_loader: - kv 6: general.source.url str = https://huggingface.co/Maykeye/TinyLL...
+llama_model_loader: - kv 7: general.source.huggingface.repository str = https://huggingface.co/Maykeye/TinyLL...
+llama_model_loader: - kv 8: llama.vocab_size u32 = 32000
+llama_model_loader: - kv 9: llama.context_length u32 = 2048
+llama_model_loader: - kv 10: llama.embedding_length u32 = 64
+llama_model_loader: - kv 11: llama.block_count u32 = 8
+llama_model_loader: - kv 12: llama.feed_forward_length u32 = 256
+llama_model_loader: - kv 13: llama.rope.dimension_count u32 = 4
+llama_model_loader: - kv 14: llama.attention.head_count u32 = 16
+llama_model_loader: - kv 15: llama.attention.head_count_kv u32 = 16
+llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
+llama_model_loader: - kv 17: general.file_type u32 = 1
+llama_model_loader: - kv 18: tokenizer.ggml.model str = llama
+llama_model_loader: - kv 19: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
+llama_model_loader: - kv 20: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
+llama_model_loader: - kv 21: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
+llama_model_loader: - kv 22: tokenizer.ggml.bos_token_id u32 = 1
+llama_model_loader: - kv 23: tokenizer.ggml.eos_token_id u32 = 2
+llama_model_loader: - kv 24: tokenizer.ggml.unknown_token_id u32 = 0
+llama_model_loader: - kv 25: tokenizer.ggml.padding_token_id u32 = 0
+llama_model_loader: - type f32: 17 tensors
+llama_model_loader: - type f16: 58 tensors
+llm_load_vocab: special tokens definition check successful ( 259/32000 ).
+llm_load_print_meta: format = GGUF V3 (latest)
+llm_load_print_meta: arch = llama
+llm_load_print_meta: vocab type = SPM
+llm_load_print_meta: n_vocab = 32000
+llm_load_print_meta: n_merges = 0
+llm_load_print_meta: n_ctx_train = 2048
+llm_load_print_meta: n_embd = 64
+llm_load_print_meta: n_head = 16
+llm_load_print_meta: n_head_kv = 16
+llm_load_print_meta: n_layer = 8
+llm_load_print_meta: n_rot = 4
+llm_load_print_meta: n_embd_head_k = 4
+llm_load_print_meta: n_embd_head_v = 4
+llm_load_print_meta: n_gqa = 1
+llm_load_print_meta: n_embd_k_gqa = 64
+llm_load_print_meta: n_embd_v_gqa = 64
+llm_load_print_meta: f_norm_eps = 0.0e+00
+llm_load_print_meta: f_norm_rms_eps = 1.0e-06
+llm_load_print_meta: f_clamp_kqv = 0.0e+00
+llm_load_print_meta: f_max_alibi_bias = 0.0e+00
+llm_load_print_meta: f_logit_scale = 0.0e+00
+llm_load_print_meta: n_ff = 256
+llm_load_print_meta: n_expert = 0
+llm_load_print_meta: n_expert_used = 0
+llm_load_print_meta: causal attn = 1
+llm_load_print_meta: pooling type = 0
+llm_load_print_meta: rope type = 0
+llm_load_print_meta: rope scaling = linear
+llm_load_print_meta: freq_base_train = 10000.0
+llm_load_print_meta: freq_scale_train = 1
+llm_load_print_meta: n_yarn_orig_ctx = 2048
+llm_load_print_meta: rope_finetuned = unknown
+llm_load_print_meta: ssm_d_conv = 0
+llm_load_print_meta: ssm_d_inner = 0
+llm_load_print_meta: ssm_d_state = 0
+llm_load_print_meta: ssm_dt_rank = 0
+llm_load_print_meta: model type = ?B
+llm_load_print_meta: model ftype = F16
+llm_load_print_meta: model params = 4.62 M
+llm_load_print_meta: model size = 8.82 MiB (16.00 BPW)
+llm_load_print_meta: general.name = TinyLLama
+llm_load_print_meta: BOS token = 1 '<s>'
+llm_load_print_meta: EOS token = 2 '</s>'
+llm_load_print_meta: UNK token = 0 '<unk>'
+llm_load_print_meta: PAD token = 0 '<unk>'
+llm_load_print_meta: LF token = 13 '<0x0A>'
+llm_load_tensors: ggml ctx size = 0.04 MiB
+llm_load_tensors: CPU buffer size = 8.82 MiB
+..............
+llama_new_context_with_model: n_ctx = 512
+llama_new_context_with_model: n_batch = 512
+llama_new_context_with_model: n_ubatch = 512
+llama_new_context_with_model: flash_attn = 0
+llama_new_context_with_model: freq_base = 10000.0
+llama_new_context_with_model: freq_scale = 1
+llama_kv_cache_init: CPU KV buffer size = 1.00 MiB
+llama_new_context_with_model: KV self size = 1.00 MiB, K (f16): 0.50 MiB, V (f16): 0.50 MiB
+llama_new_context_with_model: CPU output buffer size = 0.12 MiB
+llama_new_context_with_model: CPU compute buffer size = 62.75 MiB
+llama_new_context_with_model: graph nodes = 262
+llama_new_context_with_model: graph splits = 1
+
+system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
+sampling:
+repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
+top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
+mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
+sampling order:
+CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
+generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1
+
+
+<s> hello world the gruff man said no. The man was very sad and he was too scared to come back.
+The man asked the man, "Why are you sad?"
+The man said, "I am scared of a big surprise. I will help you."
+The man looked at the boy and said, "I can help you. I can make the little boy's wings. The man makes the girl laugh. She was so kind and happy.
+The boy said, "You are too mean to me. You can't give out the problem."
+The girl said, "I will help you!"
+The man stopped and said, "I can help you. I'm sorry for a little girl, but you must tell the boy to be careful. Do you want to be kind."
+The boy smiled and said, "Yes, I want to help you. Let's go into the pond and have fun!"
+The boy and the man went to the lake to the pond. They had a great time and the man was able to help.</s> [end of text]
+
+
+llama_print_timings: load time = 7.35 ms
+llama_print_timings: sample time = 8.40 ms / 218 runs ( 0.04 ms per token, 25958.56 tokens per second)
+llama_print_timings: prompt eval time = 2.90 ms / 8 tokens ( 0.36 ms per token, 2760.52 tokens per second)
+llama_print_timings: eval time = 372.10 ms / 217 runs ( 1.71 ms per token, 583.18 tokens per second)
+llama_print_timings: total time = 427.19 ms / 225 tokens
+Log end
+```
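
The transcript above walks through four stages: build the llamafile engine, convert the safetensors model to GGUF, package the llamafile, and test it. As a rough orientation aid, here is a minimal sketch of what a script like `llamafile-creation.sh` might look like, reconstructed only from the banner lines and log output. The `convert.py` path, the runtime and `zipalign` locations, the `.args` contents, and the test prompt are assumptions, so treat the actual `llamafile-creation.sh` in this repo as authoritative.

```bash
#!/bin/bash
# Hypothetical sketch of the replication flow shown in the transcript above.
# Paths, flags, and file names are assumptions; defer to llamafile-creation.sh.
set -euo pipefail

MODEL_DIR="maykeye_tinyllama"   # upstream safetensors checkout (name taken from the log)
NAME="TinyLLama-v0-5M-F16"      # target name announced in the log

# == Build and prep the llamafile engine executable ==
# The log shows `make` being run inside a llamafile/ checkout.
(cd llamafile && make)

# == Convert from safetensors to GGUF ==
# The INFO:convert / INFO:gguf.* lines match llama.cpp's convert.py; the exact
# script location and any metadata flags the author used are not shown here.
python3 llama.cpp/convert.py "${MODEL_DIR}" \
    --outtype f16 \
    --outfile "${MODEL_DIR}/${NAME}.gguf"

# == Generating Llamafile ==
# llamafile's documented packaging approach embeds the GGUF plus a .args file
# of default arguments into a copy of the llamafile runtime via zipalign.
cp llamafile/o/llamafile "${NAME}.llamafile"   # runtime/zipalign paths are assumptions
cat > .args <<EOF
-m
${NAME}.gguf
EOF
llamafile/o/zipalign -j0 "${NAME}.llamafile" "${MODEL_DIR}/${NAME}.gguf" .args

# == Test Output ==
# Prompt inferred from the generated text in the transcript.
./"${NAME}.llamafile" -p "hello world the gruff man said"
```

The real script may differ in any of these details; for example, it likely passes extra model metadata to the converter, which would explain the `general.author`, `general.url`, and `general.description` keys visible in the loader log.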
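
Separately, the `== Test Output ==` section notes that GPU offloading needs `-ngl 9999`. A small usage sketch for the finished artifact follows; the prompt and token count here are illustrative, not taken from the script.

```bash
# The .llamafile is a self-contained executable; mark it executable once.
chmod +x TinyLLama-v0-5M-F16.llamafile

# CPU-only completion with an explicit prompt and a capped number of tokens.
./TinyLLama-v0-5M-F16.llamafile -p "hello world the gruff man said" -n 128

# With an AMD or NVIDIA GPU, add -ngl 9999 to enable offloading, per the note in the log.
./TinyLLama-v0-5M-F16.llamafile -ngl 9999 -p "hello world the gruff man said" -n 128
```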