shaowenchen/vicuna-33b-v1.3-gguf

Provided files

Usage with a local GGUF model file:

docker run --rm -it -p 8000:8000 -v /path/to/models:/models -e MODEL=/models/gguf-model-name.gguf hubimage/llama-cpp-python:latest

| Name | Quant method | Compressed Size |
| --- | --- | --- |
| `shaowenchen/vicuna-33b-v1.3-gguf:Q2_K` | Q2_K | 12.78 GB |
| `shaowenchen/vicuna-33b-v1.3-gguf:Q3_K` | Q3_K | 14.81 GB |
| `shaowenchen/vicuna-33b-v1.3-gguf:Q4_K` | Q4_K | 18.24 GB |
| `shaowenchen/vicuna-33b-v1.3-gguf:Q5_K` | Q5_K | 21.72 GB |
| `shaowenchen/vicuna-33b-v1.3-gguf:Q6_K` | Q6_K | 25.05 GB |
| `shaowenchen/vicuna-33b-v1.3-gguf:Q8_0` | Q8_0 | 31.34 GB |
| `shaowenchen/vicuna-33b-v1.3-gguf:full` | full | 56.07 GB |

Usage with one of the prebuilt image tags listed above:

docker run --rm -p 8000:8000 shaowenchen/vicuna-33b-v1.3-gguf:Q2_K

Once the container is running, open http://localhost:8000/docs to view the Swagger UI.
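
Once the server is up, it can also be called from a script. The following is a minimal sketch, assuming the container runs the llama-cpp-python server with its OpenAI-style `/v1/completions` route on the published port 8000. The route, the payload fields, and the helper names `build_completion_request` and `complete` are assumptions for illustration; check the Swagger UI for the routes your image actually exposes.

```python
import json
import urllib.request

# Assumption: the port published by `docker run -p 8000:8000`.
BASE_URL = "http://localhost:8000"


def build_completion_request(prompt, max_tokens=64, base_url=BASE_URL):
    """Build a POST request for the (assumed) OpenAI-style completions route."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Requires a running container; a 33B model may take a while to answer.
    print(complete("Q: What is the capital of France? A:"))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.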