Error using example code in model card
Good morning,
I am writing to ask how to overcome an issue that pops up when I try to run the pipeline described in the model card:
from transformers import AutoTokenizer
import transformers
import torch

model = "OpenAssistant/falcon-7b-sft-mix-2000"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

input_text = "<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>"
sequences = pipeline(
    input_text,
    max_length=500,
    do_sample=True,
    return_full_text=False,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
Running this triggers the following error:

How did you manage to solve this issue and run the pipeline successfully?
Is everything updated? Especially check that transformers is updated.
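For reference, here is a quick way to check what is actually installed; the exact minimum version is not stated in the model card, so treat the upgrade line as a suggestion rather than a requirement:

# Print the installed library versions to rule out an outdated environment.
import transformers
import torch

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)

# If transformers is old, upgrading from a shell usually looks like:
#   pip install --upgrade transformers accelerate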
Yes, everything was installed from scratch for a hackathon project, using Python 3.11.
@Metal3d This is strange. It seems to work fine for me with transformers 4.31.0. Are you sure you copied the sample code right?
The first guy was also using a Flask server, so his code couldn't be verified.
If you really want to use the model, try loading it without the pipeline:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OpenAssistant/falcon-7b-sft-mix-2000"

# Left-pad so generation lines up correctly when inputs are padded.
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='left')
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
).half()

tokens = tokenizer.encode("Hello, my dog is cute", return_tensors="pt", padding=True, truncation=True).to(model.device)
gen = model.generate(
    tokens,
    do_sample=True,
    max_length=100,
    top_p=0.95,
    top_k=50,
    num_return_sequences=1,
    no_repeat_ngram_size=2,
    early_stopping=True,
)
print(tokenizer.decode(gen[0], skip_special_tokens=True))
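One note on the snippet above: the model card's prompt format (<|prompter|> ... <|endoftext|><|assistant|>) is not applied there, so wrapping the input accordingly may give more sensible output. A minimal sketch, reusing the tokenizer and model objects defined above:

# Wrap a plain question in the OpenAssistant prompt format from the model card.
question = "What is a meme, and what's the history behind this word?"
prompt = f"<|prompter|>{question}<|endoftext|><|assistant|>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=True, max_new_tokens=200, top_k=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))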
@ThatOneShortGuy Can I run this model on CPU?
Yes, you theoretically can, but it is extremely slow.
Practically speaking, no.
It's been 16 minutes and not a single token has been generated on my 10600K. It may be worth noting that it is single-threaded, but 🤷♂️
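If you still want to try CPU-only inference anyway, a minimal sketch could look like the one below. It assumes a machine with a lot of RAM (roughly 15 GB in bfloat16, about double that in float32) and, as noted above, generation will be extremely slow:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OpenAssistant/falcon-7b-sft-mix-2000"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# No device_map here, so the whole model is loaded on the CPU.
# bfloat16 keeps the memory footprint down; switch to float32 if your
# CPU/PyTorch build does not handle bfloat16 well (roughly doubles RAM use).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
# Expect a very long wait per generated token on a desktop CPU.
out = model.generate(**inputs, do_sample=True, max_new_tokens=20, top_k=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))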
@Metal3d This is strange. It seems to work fine for me with transformers 4.31.0. Are you sure you copied the sample code right?
Yes, everything was set up following the instructions given. I will give 4.31.0 a try, but since one model works and another does not, I believe the problem must be in the instructions or in the model itself.