jartine committed on
Commit 65b0802 (1 parent: dfd4efd)

Update README.md

Files changed (1): README.md (+2 -1)
```diff
@@ -57,7 +57,8 @@ Your choice of quantization format depends on three things:
 3. llamafiles bigger than 4.30 GB are hard to run on Windows (see [gotchas](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas))
 
 Good quants for writing (eval speed) are Q5_K_M, and Q4_0. Text
-generation is bounded by memory speed, so smaller quants help.
+generation is bounded by memory speed, so smaller quants help, but they
+also cause the LLM to hallucinate more.
 
 Good quants for reading (prompt eval speed) are BF16, F16, Q4_0, and
 Q8_0 (ordered from fastest to slowest). Prompt evaluation is bounded by
```
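Some back-of-the-envelope arithmetic makes the README's point concrete: text generation is bounded by memory speed, so smaller quants help. This is a rough sketch, not llamafile code.

It rests on these assumptions:

- The model is dense, so every generated token streams all of its weights from memory once.
- KV-cache traffic and file metadata are ignored.
- The model size (7B parameters) and memory bandwidth (100 GB/s) are hypothetical.

The Q4_0 and Q8_0 figures follow from their GGML block layout: 32 weights plus one fp16 scale per block. Real files may keep a few tensors at higher precision. Q5_K_M is left out because its mixed layout has no single bits-per-weight value.

```python
# Approximate bits per weight for a few GGML formats (assumption: every tensor
# uses the same format). Q4_0 and Q8_0 store 32 weights plus one fp16 scale per block:
#   Q4_0: (32 * 4 + 16) / 32 = 4.5 bits per weight
#   Q8_0: (32 * 8 + 16) / 32 = 8.5 bits per weight
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_0": 4.5,
}


def weight_bytes(n_params: float, quant: str) -> float:
    """Approximate size of the weights in bytes (excludes metadata and runtime)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8


def max_tokens_per_second(n_params: float, quant: str, bandwidth: float) -> float:
    """Upper bound on generation speed if each token reads every weight once.

    bandwidth is the memory bandwidth in bytes per second.
    """
    return bandwidth / weight_bytes(n_params, quant)


if __name__ == "__main__":
    n_params = 7e9    # hypothetical 7B-parameter model
    bandwidth = 100e9  # hypothetical 100 GB/s of memory bandwidth
    for quant in BITS_PER_WEIGHT:
        size_gb = weight_bytes(n_params, quant) / 1e9
        tps = max_tokens_per_second(n_params, quant, bandwidth)
        print(f"{quant:5s} {size_gb:6.2f} GB  <= {tps:5.1f} tok/s")
```

Under these assumptions, a 7B model in Q4_0 needs about 3.94 GB for its weights, which stays under the 4.30 GB Windows threshold mentioned above. In Q8_0 it needs about 7.44 GB, which does not. The Q4_0 generation ceiling is roughly 16 / 4.5 ≈ 3.6 times that of F16.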