Inquiry about Generation Speed

#17
by Boyue27 - opened

I've been experiencing issues with generation speed recently and was wondering if anyone else has run into something similar. Generation seems slower than usual. Here is my setup:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "mixtral"
# max_memory_mapping is defined earlier in my script
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", max_memory=max_memory_mapping
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# ... prompt tokenization elided ...

output_ids = model.generate(
    input_ids=input_ids.cuda(),
    do_sample=True,
    temperature=0.4,
    top_k=50,
    max_new_tokens=300,
)

Same here, the generation is very slow.

If the model is offloaded to the CPU, then of course it's going to be slow :/ The model itself did not change, unless you are computing the loss (which was not working on parallel devices). Also make sure output_router_logits is set to False in the config.
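As a rough sketch of what that config check could look like at load time (assuming the Mixtral config exposes output_router_logits, and reusing the model_path from the snippet above):

from transformers import AutoConfig, AutoModelForCausalLM

model_path = "mixtral"
config = AutoConfig.from_pretrained(model_path)
# Disable router logits so no auxiliary loss is computed during generation
config.output_router_logits = False

model = AutoModelForCausalLM.from_pretrained(
    model_path, config=config, device_map="auto"
)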

@Boyue27 your model is most likely offloaded to CPU or disk, as Arthur said. You need to load the model in half precision or 4-bit precision to make sure it fits on your GPU device:

For float16:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "mixtral"
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", max_memory=max_memory_mapping, torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

For 4-bit precision (after installing bitsandbytes with pip install bitsandbytes):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "mixtral"
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", max_memory=max_memory_mapping, load_in_4bit=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
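To double-check that nothing was offloaded after loading with device_map="auto", you can also inspect the device map built by accelerate (a small sketch; hf_device_map is set on the model when device_map is used):

# See where each module ended up (GPU index, "cpu", or "disk")
print(model.hf_device_map)

# Anything placed on CPU or disk will slow generation down considerably
offloaded = {name: dev for name, dev in model.hf_device_map.items() if dev in ("cpu", "disk")}
if offloaded:
    print("Offloaded modules:", offloaded)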

@ybelkada Thank you for your help. I have tested your code and it fixed the problem.

@ArthurZ Thank you for your help; the solution is working great for me.
