ValueError: Tokenizer class CohereTokenizer does not exist or is not currently imported.
Running this example script gives the error above:
# pip install 'git+https://github.com/huggingface/transformers.git' bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
# Format message with the command-r-plus chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)
gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
Try
pip install 'git+https://github.com/huggingface/transformers.git' bitsandbytes accelerate
to install the latest transformers. The model uses a new tokenizer class called CohereTokenizer, which was added in a newer version of transformers.
Which transformers version exactly? I followed this instruction and ended up with transformers==4.40.0.dev0, but I still can't import CohereTokenizer.
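For anyone hitting the same question, a quick way to see what the installed build actually exposes is to probe for both class names. This is only a rough sketch: the helper name `cohere_tokenizer_support` is made up for illustration, and it only reports what it finds in whatever transformers version is installed.

```python
import importlib.util


def cohere_tokenizer_support():
    """Report the installed transformers version and which Cohere tokenizer classes it exposes."""
    report = {"version": None, "CohereTokenizer": False, "CohereTokenizerFast": False}
    # Bail out early if transformers is not installed at all.
    if importlib.util.find_spec("transformers") is None:
        return report

    import transformers

    report["version"] = transformers.__version__
    for name in ("CohereTokenizer", "CohereTokenizerFast"):
        try:
            # transformers loads its classes lazily, so the lookup itself can fail.
            report[name] = hasattr(transformers, name)
        except Exception:
            report[name] = False
    return report


print(cohere_tokenizer_support())
```

If `CohereTokenizer` comes back False while `CohereTokenizerFast` is True, the install is recent enough and the problem is the class name being requested.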
There is no CohereTokenizer, but there is CohereTokenizerFast.
Try changing tokenizer_class from "CohereTokenizer" to "CohereTokenizerFast" in tokenizer_config.json.
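If you have a local copy of the model files, the edit above can be scripted. The following is a minimal sketch, not an official tool: the function name `use_fast_tokenizer_class` and the example path are hypothetical, and it assumes tokenizer_config.json is plain JSON with a top-level "tokenizer_class" key.

```python
import json
from pathlib import Path


def use_fast_tokenizer_class(config_path, old="CohereTokenizer", new="CohereTokenizerFast"):
    """Rewrite tokenizer_class in a local tokenizer_config.json; return True if it was changed."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Leave the file untouched unless it names the class we want to replace.
    if config.get("tokenizer_class") != old:
        return False
    config["tokenizer_class"] = new
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return True


# Example call (hypothetical path to a local model directory):
# use_fast_tokenizer_class("path/to/model/tokenizer_config.json")
```

The function returns False and writes nothing when the config already names a different class, so running it twice is harmless.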
Hi @xiangrong, can you try with AutoTokenizer? It should work because it is mapped to the correct tokenizer class.
@ahmetustun Thank you, it works.
@ahmetustun it's very weird that AutoTokenizer only works when use_fast is set to True, which is the AutoTokenizer default. If you manually set it to False, it throws this error:
In [6]: import transformers
In [7]: tokenizer = AutoTokenizer.from_pretrained('.models/ArabicLLM', use_fast=False)
ValueError Traceback (most recent call last)
Cell In[7], line 1
----> 1 tokenizer = AutoTokenizer.from_pretrained('.models/ArabicLLM', use_fast=False)
File ~/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:877, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
875 tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate)
876 if tokenizer_class is None:
--> 877 raise ValueError(
878 f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
879 )
880 return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
882 # Otherwise we have to be creative.
883 # if model is an encoder decoder, the encoder tokenizer class is used by default
ValueError: Tokenizer class CohereTokenizer does not exist or is not currently imported.