RuntimeError: CUDA error: invalid configuration argument - how to tackle that?
#20 by guentert - opened
I am trying to get Mixtral-8x7B-Instruct-v0.1-GPTQ running in an Ubuntu 22.04 container prebuilt for CUDA/cuDNN 12, attached to an NVIDIA RTX 6000 Ada Generation GPU.
When using the initialization routine

import os
import torch
from auto_gptq import exllama_set_max_input_length
from transformers import MixtralForCausalLM, AutoTokenizer, pipeline

model_id = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"
revision = "main"  # the main revision is currently 4-bit

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision, use_fast=True)
model = MixtralForCausalLM.from_pretrained(
    model_id,
    revision=revision,
    device_map="auto",
    use_safetensors=True,
    trust_remote_code=False,
)
I get the warning
Some weights of the model checkpoint at TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ were not used when initializing MixtralForCausalLM
followed by 896 layer names, though nvidia-smi tells me that 20 GB have been uploaded to the card.
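My guess is that instantiating MixtralForCausalLM directly bypasses the GPTQ weight mapping that transformers performs when it reads the repo's quantization_config, which would explain the long unused-weight list. A minimal sketch of the alternative load path I am considering (untested on my side; AutoModelForCausalLM and its parameters are the standard transformers API, the wrapper function name is mine):

```python
def load_gptq_model(model_id, revision="main"):
    """Sketch: let AutoModelForCausalLM dispatch the concrete model class
    and pick up the GPTQ quantization_config stored in the repo, instead
    of constructing MixtralForCausalLM by hand.

    Assumes transformers with optimum/auto-gptq support is installed.
    """
    from transformers import AutoModelForCausalLM  # local import: heavy dependency

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        revision=revision,
        device_map="auto",
        trust_remote_code=False,
    )
```

If that load path consumed all the checkpoint weights, the warning (and possibly the downstream error) should disappear.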
When I try to run a simple pipeline on the model:

pipe = pipeline(
    "text-generation",
    tokenizer=tokenizer,
    model=model,
    max_new_tokens=512,
)

os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # removing this line does not change the outcome
result = pipe("[INST] write a poem [/INST]")
I get the following error:
RuntimeError: CUDA error: invalid configuration argument
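Possibly relevant: I import exllama_set_max_input_length at the top but never actually call it. If the exllama kernels' fixed-size input buffer is what the invalid-configuration error is complaining about, I assume the remedy would look roughly like this (the 4096 bound and the helper name are my own guesses, not tested):

```python
def raise_exllama_buffer(model, max_input_length=4096):
    """Sketch: resize the exllama backend's pre-allocated input buffer.

    The exllama kernels allocate buffers for a fixed maximum input length;
    exceeding it (a long prompt, or batch size times sequence length) can
    reportedly surface as 'CUDA error: invalid configuration argument'.
    The 4096 value is an assumption -- it should cover the longest prompt.
    """
    from auto_gptq import exllama_set_max_input_length  # local import

    return exllama_set_max_input_length(model, max_input_length=max_input_length)
```

If that is the right track, it would be called once, right after from_pretrained and before building the pipeline.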
How should I proceed to tackle the problem?
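One thing I already suspect about my own snippet: CUDA_LAUNCH_BLOCKING is only read when the CUDA context is created, so setting it after torch has been imported and the model loaded probably has no effect. If I understand correctly, it has to happen first, roughly:

```python
import os

# Export CUDA_LAUNCH_BLOCKING before anything touches the GPU: the CUDA
# runtime reads it when the context is created, so setting it after model
# loading (as in my snippet above) is most likely too late to produce
# synchronous, accurately attributed error reports.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ...only now import torch, load the model, and run the pipeline...
```

With that ordering I would at least expect the stack trace to point at the actual failing kernel launch.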
Thanks in advance
Guenter