Awesome model!!!
Excellent! Thanks for reporting. That's a good test!
May I know the memory requirement to run this model on GPU?
I assume a minimum of 17 GB of video memory.
Currently I'm on 32 GB RAM with a GTX 1080 Ti (11 GB VRAM).
When I try to load this model in text-generation-webui, it fills the RAM (with a bit of memory swapping) and then crashes.
How much RAM is required to run this, and is it possible to run in CPU-only mode? (Selecting CPU under the model settings in text-generation-webui didn't help.)
The following model works fine on GPU:
TheBloke_stable-vicuna-13B-GPTQ
I'm getting about 8 tokens/sec.
I tried the same question as in this thread.
Thanks
You need to choose a GGML version to run on CPU; GPTQ is GPU-only. This model requires at least 18 GB of VRAM to load, and usage grows to around 21 GB after several chats, so I suggest using a GPU with at least 24 GB of VRAM.
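If you grab a GGML build, a CPU-only launch would look roughly like this (a minimal sketch; the model filename is a hypothetical placeholder, and the flags assume a recent text-generation-webui checkout):

```sh
# CPU-only inference with a GGML model in text-generation-webui.
# The model filename below is a hypothetical placeholder.
python server.py --cpu --model wizard-vicuna-13B.ggml.q4_0.bin --threads 8
```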
You need to confine the usage of VRAM and leave some VRAM for chat to get rid of "CUDA: OUT OF MEMEROY". In oob-webui that's --gpu-memory 8 (8 is an example) This will decrease the generating speed, but decrease the demand of VRAM.
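As a concrete sketch (the model directory name is a hypothetical placeholder):

```sh
# Cap the model's GPU allocation at roughly 8 GiB; weights that don't fit
# stay in system RAM, which is slower but avoids CUDA OOM errors.
# The model name below is a hypothetical placeholder.
python server.py --model TheBloke_some-13B-GPTQ --gpu-memory 8
```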