Update README.md
README.md
CHANGED
@@ -57,7 +57,8 @@ Your choice of quantization format depends on three things:
 3. llamafiles bigger than 4.30 GB are hard to run on Windows (see [gotchas](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas))

 Good quants for writing (eval speed) are Q5_K_M, and Q4_0. Text
-generation is bounded by memory speed, so smaller quants help
+generation is bounded by memory speed, so smaller quants help, but they
+also cause the LLM to hallucinate more.

 Good quants for reading (prompt eval speed) are BF16, F16, Q4_0, and
 Q8_0 (ordered from fastest to slowest). Prompt evaluation is bounded by
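The edit above rests on the claim that text generation is memory-bandwidth bound: each generated token streams roughly the whole weight file from RAM once, so the ceiling on tokens/s is about bandwidth divided by model size. A minimal sketch of that arithmetic follows; the quantized file sizes (for an 8B-parameter model) and the 50 GB/s bandwidth figure are illustrative assumptions, not measurements.

```python
# Back-of-the-envelope estimate of generation speed for a quantized model.
# Assumption: every generated token reads (roughly) all of the weights once,
# so tokens/s is capped by memory bandwidth / model size.
# All numbers below are illustrative, not measured.

def max_tokens_per_second(model_size_gb: float, mem_bandwidth_gb_s: float) -> float:
    """Upper bound on text-generation speed when memory-bandwidth bound."""
    return mem_bandwidth_gb_s / model_size_gb

# Approximate file sizes for an 8B-parameter model at a few quant levels (assumed).
QUANT_SIZES_GB = {"F16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_0": 4.7}
RAM_BANDWIDTH_GB_S = 50.0  # assumed dual-channel DDR4-class bandwidth

for quant, size in QUANT_SIZES_GB.items():
    print(f"{quant:7s} ~{max_tokens_per_second(size, RAM_BANDWIDTH_GB_S):4.1f} tok/s ceiling")
```

Under these assumptions the Q4_0 file has roughly a 3-4x higher generation ceiling than the F16 file on the same machine, which is the trade-off (speed versus fidelity) the new wording calls out.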