Error: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED
Hello,
We are running the code:
import torch
from transformers import pipeline, AutoModelForCausalLM
print('got here')
generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
print('got here2')
generate_text("Who is Nic Chaillan?")
print('got here3')
On an Azure NV48s v3 (24 GPU vCPUs, 224 GiB memory)
We get the error:
got here
got here2
Traceback (most recent call last):
File "/datadrive/dolly-v2-12b/test.py", line 8, in
generate_text("Who is Nic Chaillan?")
File "/usr/local/lib/python3.9/dist-packages/transformers/pipelines/base.py", line 1074, in call
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "/usr/local/lib/python3.9/dist-packages/transformers/pipelines/base.py", line 1081, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
File "/usr/local/lib/python3.9/dist-packages/transformers/pipelines/base.py", line 990, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "/home/nicos/.cache/huggingface/modules/transformers_modules/databricks/dolly-v2-12b/f8adc425f3ce69a26d57c89c1b69429a74e2ec0e/instruct_pipeline.py", line 103, in _forward
generated_sequence = self.model.generate(
File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py", line 1571, in generate
return self.sample(
File "/usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py", line 2534, in sample
outputs = self(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 654, in forward
outputs = self.gpt_neox(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 546, in forward
outputs = layer(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 319, in forward
attention_layer_outputs = self.attention(
File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 153, in forward
attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 233, in _attn
attn_output = torch.matmul(attn_weights, value)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
Any clue what to do to fix this?
This means you don't have all the NVIDIA libraries installed; here it is complaining about cuBLAS. You can see what has to be added to a standard Databricks runtime, for example, here: https://github.com/databrickslabs/dolly/blob/master/train_dolly.py#L27 That might be a clue.
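If it helps to narrow this down, here is a quick sanity check (just a sketch assuming a reasonably recent PyTorch build; none of this is from the Dolly repo) that prints the CUDA stack PyTorch actually sees and whether the GPU reports bfloat16 support:
import torch
# Report what PyTorch was built against and what the visible GPU offers.
print(torch.__version__)                    # PyTorch build
print(torch.version.cuda)                   # CUDA toolkit the wheel was built with
print(torch.cuda.is_available())            # is a driver/runtime visible at all?
print(torch.cuda.get_device_name(0))        # which GPU
print(torch.cuda.get_device_capability(0))  # compute capability of that GPU
print(torch.cuda.is_bf16_supported())       # bfloat16 support (recent PyTorch versions)
If the last two lines point to an older GPU or report no bfloat16 support, that is consistent with the bf16 cuBLAS call in the traceback being rejected.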
I have the same error. Any luck on solving this?
I think this can also surface as an out-of-memory error. It's more helpful if people say how they are running this and whether they have ruled out what's in the previous comments!
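One cheap way to tell the two apart (a sketch; it assumes a CUDA GPU is visible to PyTorch): a tiny bfloat16 batched matmul takes almost no memory but goes through the same strided-batched GEMM path, so if it raises the same cuBLAS error, the problem is dtype/library support rather than memory.
import torch
# Tiny bf16 batched matmul: negligible memory, same cuBLAS batched-GEMM code path.
a = torch.randn(2, 8, 16, dtype=torch.bfloat16, device="cuda")
b = torch.randn(2, 16, 8, dtype=torch.bfloat16, device="cuda")
print(torch.matmul(a, b).shape)  # same CUBLAS error here -> not an out-of-memory problem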
My Code:
import torch
from transformers import pipeline
generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map='auto')
edu_prompt = "Extract the universities from the following text: My name is Hamza and I have a bachelor's degree from the university of toronto and a master's degree from the university of waterloo."
edu = generate_text(edu_prompt)
12 GB GPU
torch 1.13.1 with CUDA 11.7
I don't think a 6 GB model should give me an "out of memory" error.
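A quick way to check is to see how much of the card the model actually occupies right after loading (a sketch; run it in the same Python session as the pipeline above):
import torch
# Rough memory accounting once the model is on the GPU.
print(torch.cuda.memory_allocated(0) / 1e9, "GB allocated")
print(torch.cuda.memory_reserved(0) / 1e9, "GB reserved by the caching allocator")
print(torch.cuda.get_device_properties(0).total_memory / 1e9, "GB total on the device")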
Yeah, that's not it, but do you have cuBLAS installed? See above.
Hi.
I have the same problem on an Ubuntu 20.04 server with plenty of memory. Have you had any success fixing this error?
/Tomas
Do you have the right cuBLAS installed? Which library version, and against which CUDA version?
Which version should I have? I have CUDA 11.7.
This is all covered in the provided training scripts.
https://github.com/databrickslabs/dolly/blob/master/train_dolly.py#L53
Sorry, as a new user I could not reply any more last week. This problem is not solved. I created a Dockerfile with the correct cuBLAS version, but it still fails with the same error. The files are as follows:
------ Dockerfile ------
FROM pytorch/pytorch:1.11.0-cuda11.3-cudnn8-devel
WORKDIR /app/dolly
RUN apt-get update && apt-get upgrade -y
ADD https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusparse-dev-11-3_11.5.0.58-1_amd64.deb /tmp
RUN dpkg -i /tmp/libcusparse-dev-11-3_11.5.0.58-1_amd64.deb
ADD https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcublas-11-3_11.5.1.109-1_amd64.deb /tmp
ADD https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcublas-dev-11-3_11.5.1.109-1_amd64.deb /tmp
RUN dpkg -i /tmp/libcublas-11-3_11.5.1.109-1_amd64.deb
RUN dpkg -i /tmp/libcublas-dev-11-3_11.5.1.109-1_amd64.deb
ADD https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusolver-dev-11-3_11.1.2.109-1_amd64.deb /tmp
RUN dpkg -i /tmp/libcusolver-dev-11-3_11.1.2.109-1_amd64.deb
ADD https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcurand-dev-11-3_10.2.4.109-1_amd64.deb /tmp
RUN dpkg -i /tmp/libcurand-dev-11-3_10.2.4.109-1_amd64.deb
RUN pip install "accelerate>=0.12.0" "transformers[torch]==4.25.1"
RUN pip install ipython
ADD https://huggingface.co/databricks/dolly-v2-3b/raw/main/instruct_pipeline.py .
COPY ./init_dolly.py .
CMD DISABLE_ADDMM_CUDA_LT=1 ipython -i init_dolly.py
------ init_dolly.py ------
import torch
from instruct_pipeline import InstructionTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b", device_map="auto", torch_dtype=torch.bfloat16)
generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
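The pipeline is then called interactively from the IPython shell the container drops into, e.g. (this call is not in the file above, just an illustration):
res = generate_text("Who is Nic Chaillan?")
print(res)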
What hardware? This would only run on an A100 as you've written it.
OK, then that is why it doesn't work. How do I change the hardware being used?
I suspect OOM or something, but what error are you getting? Maybe this should be a separate thread with more info.
You control the hardware by, well, choosing where you run it?
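If moving to different hardware isn't an option, one thing worth trying (a sketch, not something from the Dolly docs, and whether it works depends entirely on your GPU) is loading in float16 instead of bfloat16, since the failing cuBLAS call in the traceback is specifically a bf16 GEMM:
import torch
from transformers import pipeline
# Same pipeline as above, but float16 instead of bfloat16; many pre-Ampere GPUs
# accept fp16 GEMMs even when bf16 GEMMs are rejected by cuBLAS.
generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.float16, trust_remote_code=True, device_map="auto")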