jartine committed
Commit 0a6857b
1 Parent(s): 60026ed

Update README.md

Files changed (1)
  1. README.md +166 -34
README.md CHANGED
@@ -30,8 +30,8 @@ Running the following on a desktop OS will launch a tab in your web
browser with a chatbot interface.

```
- chmod +x llm-compiler-13b-ftd.F16.llamafile
- ./llm-compiler-13b-ftd.F16.llamafile --help
+ chmod +x llm-compiler-13b-ftd.Q6_K.llamafile
+ ./llm-compiler-13b-ftd.Q6_K.llamafile --help
```

This model has a max context window size of 16k tokens. The `.args` file
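The `.args` sentence above is truncated by the diff context. As background, a llamafile reads its default command-line arguments from an embedded `.args` file, one argument per line, with `...` marking where runtime arguments are spliced in. A minimal sketch of what such a file could contain for this model (the exact contents here are an assumption, not copied from this repo):

```
-m
llm-compiler-13b-ftd.Q6_K.gguf
-c
16384
...
```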
@@ -90,7 +90,7 @@ code=$(cat /tmp/hiho.s)
instruction_count=$(obj /tmp/hiho.o | grep -P '\t.*\t' | wc -l)
binary_size=$(size /tmp/hiho.o | tail -n1 | awk '{print $1}')

- ./llm-compiler-13b-ftd.F16.llamafile \
+ ./llm-compiler-13b-ftd.Q6_K.llamafile \
-p "[INST] Optimize the following assembly to minimize code size:
<code>${code}</code>
The input code has instruction count ${instruction_count} and binary size ${binary_size} bytes.[/INST]"
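The script above feeds `/tmp/hiho.s` to the model. The C source of this xdigit() example isn't visible in the diff window; judging from the assembly listings below, it is presumably equivalent to the following sketch (the name `hiho` comes from the listings, but the body is a reconstruction, so the actual README source may differ):

```c
/* Reconstructed xdigit()-style predicate: nonzero if c is a hex digit. */
int hiho(int c) {
  return (c >= '0' && c <= '9') ||
         (c >= 'A' && c <= 'F') ||
         (c >= 'a' && c <= 'f');
}
```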
@@ -101,22 +101,6 @@ output produced by the LLM will be ~20 lines and 100% correct if tested.
This code will be fast and its size should only be 3 bytes larger than what
`gcc -Os` is able to produce.

- ```asm
- hiho: movl %edi, %ecx
- movl %ecx, %edi
- addl $-48, %edi
- cmpl $10, %edi
- setb %al
- andl $-33, %ecx
- addl $-65, %ecx
- cmpl $6, %ecx
- setb %cl
- orb %cl, %al
- andb $1, %al
- movzbl %al, %eax
- retq
- ```
-
### C -> Assembly

LLM Compiler also understands how to read and write LLVM IR code. It can
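The hunk header above says the output will be "100% correct if tested." One way to run that test, as a sketch: save the model's assembly to a hypothetical `hiho_opt.s`, build it with a small harness via `cc test_hiho.c hiho_opt.s -o test_hiho`, and compare against a reference predicate over every byte value (file names and harness are assumptions, not from the README):

```c
/* test_hiho.c — exhaustive check of the LLM-optimized hiho()
   against a straightforward reference implementation. */
#include <stdio.h>

int hiho(int c);  /* provided by the assembled LLM output */

static int ref(int c) {
  return (c >= '0' && c <= '9') ||
         (c >= 'A' && c <= 'F') ||
         (c >= 'a' && c <= 'f');
}

int main(void) {
  for (int c = 0; c < 256; c++) {
    if (!hiho(c) != !ref(c)) {
      printf("mismatch at c=%d\n", c);
      return 1;
    }
  }
  printf("all 256 inputs match\n");
  return 0;
}
```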
@@ -161,28 +145,176 @@ AMD64.

## About Quantization Formats

- There are many quantization formats to choose from. The thing that most
- impacts your choice is if it'll fit in RAM. For example, if you have
- 64gb of RAM then you should be able to comfortably use a 32gb model on
- CPU without potentially causing issues for other programs. With GPU you
- can normally use something much closer.
-
- The largest quants like F16 are oftentimes fastest for prompt processing
- but they're also slowest for text generation.
-
- The smaller quants not only make it possible to run the model on systems
- with less RAM, but also help CPU text generation go much faster. `Q6_K`
- is the highest quality one. Beyond that, all the way down to Q2, the LLM
- will start to hallucinate more.
-
- Other good classic GGML quants are `Q5_0` and `Q8_0` which may work
- better than K quants depending on the model. Other ones like `Q4_0` are
- considered legacy quants, even though they have a simpler approachable
- definition and therefore wider third party tooling support.
+ The best quantization format for this model appears to be Q6\_0: it
+ produces results consistent with full-quality BF16/F16 weights while
+ going much faster, for both prefill and prediction, by virtue of
+ consuming significantly less memory.
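For rough scale (approximate figures, not measured from this repo): at ~6.5 bits per weight, a 13B-parameter model needs about 13e9 × 6.5 / 8 ≈ 10.6 GB for its weights, versus roughly 26 GB at 16 bits for F16/BF16, i.e. around 2.4× less memory to stream per generated token.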
+
+ ### Q6\_0, F16, BF16
+
+ The output for the first xdigit() assembly optimization example with
+ Q6\_0, F16, or BF16 weights is:
+
+ ```asm
+ .globl hiho
+ hiho: movl %edi, %ecx
+ movl %ecx, %edi
+ addl $-48, %edi
+ cmpl $10, %edi
+ setb %al
+ andl $-33, %ecx
+ addl $-65, %ecx
+ cmpl $6, %ecx
+ setb %cl
+ orb %cl, %al
+ andb $1, %al
+ movzbl %al, %eax
+ retq
+ ```
+
+ This implementation is **CORRECT**. This implementation is **GOOD**.

NOTE: BF16 is currently only supported on CPU. It's the best quant for
prompt processing on Zen4.

+ ### Q6\_K
+
+ The output for the first xdigit() assembly optimization example with
+ Q6\_K weights is:
+
+ ```asm
+ .globl hiho
+ hiho:
+ movl %edi, %ecx
+ movl %ecx, %edi
+ addl $-48, %edi
+ cmpl $10, %edi
+ setb %al
+ andl $-33, %ecx
+ addl $-65, %ecx
+ cmpl $6, %ecx
+ setb %cl
+ orb %cl, %al
+ andb $1, %al
+ movzbl %al, %eax
+ retq
+ ```
+
+ This implementation is **CORRECT** and high quality.
+
+ ### Q5\_0
+
+ The output for the first xdigit() assembly optimization example with
+ Q5\_0 weights is:
+
+ ```asm
+ .globl hiho
+ hiho: movl %edi, %ecx
+ movl %ecx, -8(%rsp)
+ addl $-48, %ecx
+ movb $1, %al
+ cmpl $10, %ecx
+ movb %al, -1(%rsp)
+ jb .LBB0_3
+ addl $-65, %ecx
+ movb $1, %al
+ cmpl $6, %ecx
+ jb .LBB0_3
+ addl $-97, %eax
+ cmpl $6, %eax
+ setb %al
+ .LBB0_3:
+ andb $1, %al
+ movzbl %al, %eax
+ retq
+ ```
+
+ This implementation is **INCORRECT**. Even if it did work, it has
+ unneeded stack spillage and branches. Therefore this quantization format
+ was rejected and isn't included in this repository.
+
+ ### Q5\_K\_M
+
+ The output for the first xdigit() assembly optimization example with
+ Q5\_K\_M weights is:
+
+ ```asm
+ .globl hiho
+ hiho: movl $48, %eax
+ cmpl %edi, %eax
+ setle %cl
+ cmpl $57, %edi
+ setle %al
+ andb %al, %cl
+ movb $1, %al
+ testb $1, %cl
+ jne .LBB0_4
+ movl $65, %eax
+ cmpl %ecx, %eax
+ setle %cl
+ cmpl $70, -4(%rsp)
+ setle %al
+ andb %al, %cl
+ movb $1, %al
+ testb $1, %cl
+ jne .LBB0_4
+ xorl %eax, %eax
+ movl $97, %ecx
+ cmpl %edx, %ecx
+ jg .LBB0_4
+ cmpl $102, %eax
+ setle %al
+ .LBB0_4:
+ andb $1, %al
+ movzbl %al, %eax
+ retq
+ ```
+
+ This implementation is **INCORRECT**. Even if it did work, it has
+ unneeded stack spillage and branches. Therefore this quantization format
+ was rejected and isn't included in this repository.
+
+ ### Q8\_0
+
+ The output for the first xdigit() assembly optimization example with
+ Q8\_0 weights is:
+
+ ```asm
+ .globl hiho
+ hiho: movl $48, %eax
+ cmpl %edi, %eax
+ setle %cl
+ cmpl $57, %edi
+ setle %al
+ andb %al, %cl
+ movb $1, %al
+ testb $1, %cl
+ jne .LBB0_4
+ movl $65, %eax
+ cmpl %ecx, %eax
+ setle %cl
+ cmpl $70, -4(%rsp)
+ setle %al
+ andb %al, %cl
+ movb $1, %al
+ testb $1, %cl
+ jne .LBB0_4
+ xorl %eax, %eax
+ movl $97, %ecx
+ cmpl %edx, %ecx
+ jg .LBB0_4
+ cmpl $102, %eax
+ setle %al
+ .LBB0_4:
+ andb $1, %al
+ movzbl %al, %eax
+ retq
+ ```
+
+ This implementation is **INCORRECT**. Even if it did work, it has
+ unneeded stack spillage and branches. Therefore this quantization format
+ was rejected and isn't included in this repository.
+
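A concrete way to see why the rejected listings fail, using the exhaustive test sketched earlier: in the Q5\_0 output, `%ecx` already holds `c-48` by the time `addl $-65, %ecx` executes, so the second range check effectively tests `c-113` rather than `c-'A'`, and the lowercase branch compares `%eax`, which at that point holds the `1` staged for the early-return paths, not the character. Any one of these would make the harness report a mismatch.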
  ---
  # Introducing Meta Large Language Model Compiler (LLM Compiler), a state-of-the-art LLM for compiler optimization