Quantized version temporarily unavailable
We saw some performance issues with the quantized version and have taken it down temporarily while we investigate.
Any ETA on this? :)
It turned out we needed to submit a PR to llama.cpp to add support for our tokenizer. We submitted it today, so hopefully this can be fixed soon:
https://github.com/ggerganov/llama.cpp/pull/7713/files
Once the PR is merged we should be able to upload a new version.
Sweet, looking forward to that!
ggerganov approved it
Progress on this?
Should be coming back ~today!
We have a little more testing to do, but it looks good for tomorrow.
It's uploaded; please let us know if you run into any trouble. Make sure you're using a current version of llama.cpp, though!