Set PYTORCH_CUDA_ALLOC_CONF=backend:cudaMallocAsync
Sorry for the simple question, but how did you change the environment variable in TabbyAPI? I edited the end of my start.sh so it ends with
export PYTORCH_CUDA_ALLOC_CONF=backend:cudaMallocAsync
python start.py "$@"
and it seems to break the autosplit functionality set in config.yaml.
Hey, no worries.
I ran into that same issue a while back. It's actually a bug introduced in exllamav2 0.2.3. Here's the issue tracking it: https://github.com/turboderp/exllamav2/issues/647
It's already been fixed on the dev branch and should be in the next release. In the meantime, I rolled back to a TabbyAPI commit (56ce82e) that still used exllamav2 0.2.2, and that worked fine for me.
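In case it helps, here's a rough sketch of the rollback (the directory and venv names are from my setup, so adjust them to wherever your TabbyAPI checkout lives):
cd tabbyAPI                      # your TabbyAPI checkout
git checkout 56ce82e             # a commit still pinned to exllamav2 0.2.2
source venv/bin/activate         # only if you manage TabbyAPI's venv yourself
pip install exllamav2==0.2.2     # pin the matching wheel; the old requirements should pull this in anyway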
It works as intended now - appreciate it!
Would it be possible to make a request? I have heard very good things about a recently released Llama 3.1 model (https://huggingface.co/MikeRoz/ArliAI_Llama-3.1-70B-ArliAI-RPMax-v1.2-4.5bpw-h6-exl2), but I'm having a little trouble optimizing the exact size for 48 GB of VRAM. The 4.5 bpw quant runs at 65K context, but 32K is my intended use case, and at 5 bpw it's slightly too large to fit. Is that a setup issue? What is your experience with Llama 3.1 quants for 48 GB at 32K?
Glad to hear it's working for you now!
I'm not taking requests at the moment, but I can share my experience. Llama 3.1 70B models can just about handle 5 bpw with a 32K context on 48 GB of VRAM, but that assumes the VRAM is completely empty (mine is, since my monitor is plugged into the APU) and that you set PYTORCH_CUDA_ALLOC_CONF=backend:cudaMallocAsync (without it, TabbyAPI gives me OOM errors).
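For a rough sense of why it's so tight, some back-of-the-envelope numbers (approximate; exact figures depend on your cache mode and overhead):
Weights: ~70B parameters × 5 bits ÷ 8 ≈ 44 GB.
KV cache: 2 (K+V) × 80 layers × 8 KV heads × 128 head dim × 2 bytes ≈ 320 KB per token at FP16, so a 32K context is ≈ 10 GB at FP16, or roughly 2.5 GB with a Q4 cache.
With a quantized cache that's ~46-47 GB before activation and allocator overhead, which is right at the edge of 48 GB; with an FP16 cache it doesn't fit at all. It's also why anything else touching the card tips it into OOM.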
I downloaded the 5bpw version of the model you mentioned (https://huggingface.co/MikeRoz/ArliAI_Llama-3.1-70B-ArliAI-RPMax-v1.2-5.0bpw-h6-exl2) and it loaded fine for me at 32K.
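For reference, here's a minimal config.yaml sketch for that kind of load (key names are from TabbyAPI's sample config, so double-check them against your own file; model_name is just the folder the quant was saved under, and Q4 is one way to keep the KV cache small):
model:
  model_name: ArliAI_Llama-3.1-70B-ArliAI-RPMax-v1.2-5.0bpw-h6-exl2
  max_seq_len: 32768        # 32K context
  gpu_split_auto: true      # let the loader split the model across your GPUs
  cache_mode: Q4            # quantized KV cache; FP16 won't fit at this size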
I’ve taken some screenshots to show how it fits:
TLDR: It's most likely a setup or config issue.
Thanks for the troubleshooting and your continued work in this rather particular niche. Looking forward to future releases!