Error in loading tokenizer #11
by yling1105 - opened
When I run the following:

from transformers import AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)

I keep getting the error:

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 6952 column 3

Any ideas?
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained(model_id, legacy=True)

but I don't really know if it works...
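A quick round-trip encode/decode would show whether it actually loaded correctly (a minimal sketch; the sample text is arbitrary):

# Sanity check: encode a sentence, then decode the ids back to text.
text = "Hello, world!"
ids = tokenizer(text).input_ids
print(ids)
print(tokenizer.decode(ids))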
Can you confirm you are running the latest version of transformers/tokenizers?
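To check, something like this prints the installed versions (a minimal sketch); pip install -U transformers tokenizers upgrades both:

import tokenizers
import transformers

# The untagged-enum parse error typically comes from an older tokenizers
# release that cannot read the model's newer tokenizer.json format.
print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)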
Loading with use_fast=False works for me; it tells transformers to fall back to the slow tokenizer instead of the fast (Rust-based) one.
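For reference, a minimal sketch of that call, reusing the model_id from above:

from transformers import AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.3"
# use_fast=False selects the slow Python/SentencePiece tokenizer, which
# does not parse tokenizer.json and so sidesteps the enum error.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)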