Error using example code in model card

by Eduardo-AC - opened

Good morning,

I am writing to ask how to overcome the issue popping up while trying to run the pipeline described in the model card.

from transformers import AutoTokenizer
import transformers
import torch

model = "OpenAssistant/falcon-7b-sft-mix-2000"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    # arguments cut off in the original post
)

input_text = "<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>"

sequences = pipeline(
    # arguments cut off in the original post
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

This triggers the following error:

[Image: Screenshot 2023-07-20 at 10.47.22.png]

How did you manage to solve this issue to run the pipeline successfully?

Is everything updated? Especially check that transformers is updated.
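
A quick way to answer that is to print the versions installed in the active environment. This is only a minimal sketch using the standard library; the helper name `installed_version` is made up for illustration, and the package list is an assumption based on the imports in the snippet above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages imported by the snippet in the first post (assumed list).
for name in ("transformers", "torch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If the reported versions differ from what you expected, the script is probably running in a different environment than the one you installed into.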

Yes, everything was installed from scratch for a hackathon project using Python 3.11.

Exactly the same error for me

@Metal3d This is strange. It seems to work fine for me with transformers 4.31.0. Are you sure you copied the sample code right?

The first poster was also using a Flask server, so his code couldn't be verified.

If you really want to use the model, try loading it without the pipeline:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OpenAssistant/falcon-7b-sft-mix-2000"

# The keyword is padding_side; a misspelled keyword would not set left padding.
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='left')
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # remaining arguments cut off in the original post
)

tokens = tokenizer.encode("Hello, my dog is cute", return_tensors="pt", padding=True, truncation=True).to(model.device)
gen = model.generate(tokens, do_sample=True, max_length=100, top_p=0.95, top_k=50, num_return_sequences=1, no_repeat_ngram_size=2, early_stopping=True)
print(tokenizer.decode(gen[0], skip_special_tokens=True))
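
Note that the snippet above feeds the model a plain sentence, while the example in the first post wraps the question in `<|prompter|>`, `<|endoftext|>` and `<|assistant|>` markers. A small sketch of building that prompt string, assuming the format shown in the first post is the one the model expects; `build_prompt` is a hypothetical helper, not part of transformers:

```python
def build_prompt(user_message):
    """Wrap a user message in the prompt markers used in the first post.

    Hypothetical helper: the marker layout is copied from the example
    in this thread, not taken from any library API.
    """
    return f"<|prompter|>{user_message}<|endoftext|><|assistant|>"


prompt = build_prompt("What is a meme, and what's the history behind this word?")
print(prompt)
```

The resulting string can be passed to `tokenizer.encode` in place of "Hello, my dog is cute".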

@ThatOneShortGuy can I run this model on CPU ?

Yes, you theoretically can, but it is extremely slow.

Practically speaking, no.

It's been 16 minutes and not a single token has been generated on my 10600k. It may be worth noting that it is single threaded, but 🤷‍♂️
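
If you want to try CPU anyway, it may help to check which device is available and let PyTorch use more than one thread. A rough sketch, not verified against this model: `pick_device_and_threads` is a made-up helper, and using every logical core reported by `os.cpu_count()` is an assumption rather than a tuned setting.

```python
import os


def pick_device_and_threads():
    """Return (device, thread_count) for running inference.

    Hypothetical helper: falls back to CPU and os.cpu_count() when
    PyTorch is not installed.
    """
    cores = os.cpu_count() or 1
    try:
        import torch
    except ImportError:
        return "cpu", cores

    device = "cuda" if torch.cuda.is_available() else "cpu"
    if device == "cpu":
        # Allow intra-op parallelism across all logical cores (assumed setting).
        torch.set_num_threads(cores)
    return device, torch.get_num_threads()


device, threads = pick_device_and_threads()
print(f"device={device}, threads={threads}")
```

Even with more threads, a 7B-parameter model is likely to stay slow on CPU, so treat this as a diagnostic rather than a fix.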

Yes, I followed the instructions exactly as given. I will give 4.31.0 a try, but since one model works and another does not, I believe the problem must be in the instructions or in the model itself.
