jartine committed
Commit 0a6857b
1 Parent(s): 60026ed

Update README.md

Files changed (1)
  1. README.md +166 -34
README.md CHANGED
@@ -30,8 +30,8 @@ Running the following on a desktop OS will launch a tab in your web
browser with a chatbot interface.

```
- chmod +x llm-compiler-13b-ftd.F16.llamafile
- ./llm-compiler-13b-ftd.F16.llamafile --help
+ chmod +x llm-compiler-13b-ftd.Q6_K.llamafile
+ ./llm-compiler-13b-ftd.Q6_K.llamafile --help
```

This model has a max context window size of 16k tokens. The `.args` file
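The `.args` sentence above is truncated by the diff context. As background, a llamafile reads its default command-line arguments from an embedded `.args` file, one argument per line, with `...` marking where runtime arguments are spliced in. A minimal sketch of what such a file could contain for this model (the exact contents here are an assumption, not copied from this repo):

```
-m
llm-compiler-13b-ftd.Q6_K.gguf
-c
16384
...
```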
@@ -90,7 +90,7 @@ code=$(cat /tmp/hiho.s)
instruction_count=$(obj /tmp/hiho.o | grep -P '\t.*\t' | wc -l)
binary_size=$(size /tmp/hiho.o | tail -n1 | awk '{print $1}')

- ./llm-compiler-13b-ftd.F16.llamafile \
+ ./llm-compiler-13b-ftd.Q6_K.llamafile \
-p "[INST] Optimize the following assembly to minimize code size:
<code>${code}</code>
The input code has instruction count ${instruction_count} and binary size ${binary_size} bytes.[/INST]"
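The script above feeds `/tmp/hiho.s` to the model. The C source of this xdigit() example isn't visible in the diff window; judging from the assembly listings below, it is presumably equivalent to the following sketch (the name `hiho` comes from the listings, but the body is a reconstruction, so the actual README source may differ):

```c
/* Reconstructed xdigit()-style predicate: nonzero if c is a hex digit. */
int hiho(int c) {
  return (c >= '0' && c <= '9') ||
         (c >= 'A' && c <= 'F') ||
         (c >= 'a' && c <= 'f');
}
```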
@@ -101,22 +101,6 @@ output produced by the LLM will be ~20 lines and 100% correct if tested.
This code will be fast and its size should only be 3 bytes larger than what
`gcc -Os` is able to produce.

- ```asm
- hiho: movl %edi, %ecx
- movl %ecx, %edi
- addl $-48, %edi
- cmpl $10, %edi
- setb %al
- andl $-33, %ecx
- addl $-65, %ecx
- cmpl $6, %ecx
- setb %cl
- orb %cl, %al
- andb $1, %al
- movzbl %al, %eax
- retq
- ```
-
### C -> Assembly

LLM Compiler also understands how to read and write LLVM IR code. It can
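The hunk header above says the output will be "100% correct if tested." One way to run that test, as a sketch: save the model's assembly to a hypothetical `hiho_opt.s`, build it with a small harness via `cc test_hiho.c hiho_opt.s -o test_hiho`, and compare against a reference predicate over every byte value (file names and harness are assumptions, not from the README):

```c
/* test_hiho.c — exhaustive check of the LLM-optimized hiho()
   against a straightforward reference implementation. */
#include <stdio.h>

int hiho(int c);  /* provided by the assembled LLM output */

static int ref(int c) {
  return (c >= '0' && c <= '9') ||
         (c >= 'A' && c <= 'F') ||
         (c >= 'a' && c <= 'f');
}

int main(void) {
  for (int c = 0; c < 256; c++) {
    if (!hiho(c) != !ref(c)) {
      printf("mismatch at c=%d\n", c);
      return 1;
    }
  }
  printf("all 256 inputs match\n");
  return 0;
}
```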
@@ -161,28 +145,176 @@ AMD64.

## About Quantization Formats

- There are many quantization formats to choose from. The thing that most
- impacts your choice is if it'll fit in RAM. For example, if you have
- 64gb of RAM then you should be able to comfortably use a 32gb model on
- CPU without potentially causing issues for other programs. With GPU you
- can normally use something much closer.
-
- The largest quants like F16 are oftentimes fastest for prompt processing
- but they're also slowest for text generation.
-
- The smaller quants not only make it possible to run the model on systems
- with less RAM, but also help CPU text generation go much faster. `Q6_K`
- is the highest quality one. Beyond that, all the way down to Q2, the LLM
- will start to hallucinate more.
-
- Other good classic GGML quants are `Q5_0` and `Q8_0` which may work
- better than K quants depending on the model. Other ones like `Q4_0` are
- considered legacy quants, even though they have a simpler approachable
- definition and therefore wider third party tooling support.
+ The best quantization format for this model appears to be Q6\_0: it
+ produces results consistent with full-quality BF16/F16 weights while
+ going much faster, for both prefill and prediction, by virtue of
+ consuming significantly less memory.
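For rough scale (approximate figures, not measured from this repo): at ~6.5 bits per weight, a 13B-parameter model needs about 13e9 × 6.5 / 8 ≈ 10.6 GB for its weights, versus roughly 26 GB at 16 bits for F16/BF16, i.e. around 2.4× less memory to stream per generated token.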
+
+ ### Q6\_0, F16, BF16
+
+ The output for the first xdigit() assembly optimization example with
+ Q6\_0, F16, or BF16 weights is:
+
+ ```asm
+ .globl hiho
+ hiho: movl %edi, %ecx
+ movl %ecx, %edi
+ addl $-48, %edi
+ cmpl $10, %edi
+ setb %al
+ andl $-33, %ecx
+ addl $-65, %ecx
+ cmpl $6, %ecx
+ setb %cl
+ orb %cl, %al
+ andb $1, %al
+ movzbl %al, %eax
+ retq
+ ```
+
+ This implementation is **CORRECT**. This implementation is **GOOD**.

NOTE: BF16 is currently only supported on CPU. It's the best quant for
prompt processing on Zen4.

+ ### Q6\_K
+
+ The output for the first xdigit() assembly optimization example with
+ Q6\_K weights is:
+
+ ```asm
+ .globl hiho
+ hiho:
+ movl %edi, %ecx
+ movl %ecx, %edi
+ addl $-48, %edi
+ cmpl $10, %edi
+ setb %al
+ andl $-33, %ecx
+ addl $-65, %ecx
+ cmpl $6, %ecx
+ setb %cl
+ orb %cl, %al
+ andb $1, %al
+ movzbl %al, %eax
+ retq
+ ```
+
+ This implementation is **CORRECT** and high quality.
+
+ ### Q5\_0
+
+ The output for the first xdigit() assembly optimization example with
+ Q5\_0 weights is:
+
+ ```asm
+ .globl hiho
+ hiho: movl %edi, %ecx
+ movl %ecx, -8(%rsp)
+ addl $-48, %ecx
+ movb $1, %al
+ cmpl $10, %ecx
+ movb %al, -1(%rsp)
+ jb .LBB0_3
+ addl $-65, %ecx
+ movb $1, %al
+ cmpl $6, %ecx
+ jb .LBB0_3
+ addl $-97, %eax
+ cmpl $6, %eax
+ setb %al
+ .LBB0_3:
+ andb $1, %al
+ movzbl %al, %eax
+ retq
+ ```
+
+ This implementation is **INCORRECT**. Even if it did work, it has
+ unneeded stack spillage and branches. Therefore this quantization format
+ was rejected and isn't included in this repository.
+
+ ### Q5\_K\_M
+
+ The output for the first xdigit() assembly optimization example with
+ Q5\_K\_M weights is:
+
+ ```asm
+ .globl hiho
+ hiho: movl $48, %eax
+ cmpl %edi, %eax
+ setle %cl
+ cmpl $57, %edi
+ setle %al
+ andb %al, %cl
+ movb $1, %al
+ testb $1, %cl
+ jne .LBB0_4
+ movl $65, %eax
+ cmpl %ecx, %eax
+ setle %cl
+ cmpl $70, -4(%rsp)
+ setle %al
+ andb %al, %cl
+ movb $1, %al
+ testb $1, %cl
+ jne .LBB0_4
+ xorl %eax, %eax
+ movl $97, %ecx
+ cmpl %edx, %ecx
+ jg .LBB0_4
+ cmpl $102, %eax
+ setle %al
+ .LBB0_4:
+ andb $1, %al
+ movzbl %al, %eax
+ retq
+ ```
+
+ This implementation is **INCORRECT**. Even if it did work, it has
+ unneeded stack spillage and branches. Therefore this quantization format
+ was rejected and isn't included in this repository.
+
+ ### Q8\_0
+
+ The output for the first xdigit() assembly optimization example with
+ Q8\_0 weights is:
+
+ ```asm
+ .globl hiho
+ hiho: movl $48, %eax
+ cmpl %edi, %eax
+ setle %cl
+ cmpl $57, %edi
+ setle %al
+ andb %al, %cl
+ movb $1, %al
+ testb $1, %cl
+ jne .LBB0_4
+ movl $65, %eax
+ cmpl %ecx, %eax
+ setle %cl
+ cmpl $70, -4(%rsp)
+ setle %al
+ andb %al, %cl
+ movb $1, %al
+ testb $1, %cl
+ jne .LBB0_4
+ xorl %eax, %eax
+ movl $97, %ecx
+ cmpl %edx, %ecx
+ jg .LBB0_4
+ cmpl $102, %eax
+ setle %al
+ .LBB0_4:
+ andb $1, %al
+ movzbl %al, %eax
+ retq
+ ```
+
+ This implementation is **INCORRECT**. Even if it did work, it has
+ unneeded stack spillage and branches. Therefore this quantization format
+ was rejected and isn't included in this repository.
+
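A concrete way to see why the rejected listings fail, using the exhaustive test sketched earlier: in the Q5\_0 output, `%ecx` already holds `c-48` by the time `addl $-65, %ecx` executes, so the second range check effectively tests `c-113` rather than `c-'A'`, and the lowercase branch compares `%eax`, which at that point holds the `1` staged for the early-return paths, not the character. Any one of these would make the harness report a mismatch.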
  ---
  # Introducing Meta Large Language Model Compiler (LLM Compiler), a state-of-the-art LLM for compiler optimization