Error loading the tokenizer

#11 opened by yling1105

When I run the following code:

from transformers import AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)

I keep getting the error:

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 6952 column 3
Any idea?

As a workaround, I tried loading the slow LlamaTokenizer directly:

from transformers import LlamaForCausalLM, LlamaTokenizer
tokenizer = LlamaTokenizer.from_pretrained(model_id, legacy=True)

but I don't really know if it works...
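One way to sanity-check it (a minimal sketch, assuming the tokenizer files download correctly) is to round-trip a short string:

from transformers import LlamaTokenizer

model_id = "mistralai/Mistral-7B-v0.3"
tokenizer = LlamaTokenizer.from_pretrained(model_id, legacy=True)

# Round-trip a short string; the decoded text should match the input
text = "Hello, world!"
ids = tokenizer(text).input_ids
print(ids)
print(tokenizer.decode(ids, skip_special_tokens=True))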

Can you confirm you are running the latest versions of transformers and tokenizers?
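For reference, you can print the installed versions from Python, e.g.:

import transformers
import tokenizers

# The "untagged enum PyPreTokenizerTypeWrapper" error often means the installed
# tokenizers build is too old to parse the newer tokenizer.json format
print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)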

Loading with use_fast=False works for me; it falls back to the slow (Python-based) tokenizer instead of the fast Rust one.
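For example (a minimal sketch):

from transformers import AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.3"
# use_fast=False skips parsing tokenizer.json with the Rust backend and
# loads the slow SentencePiece-based tokenizer instead
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)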
