Tokenizer broken after last commit

#53
by AD-530 - opened

Hi,

The latest commit 9a99991 (Rename tokenizer.model to tokenizer.model.v3) from user patrickvonplaten breaks loading the model with HF transformers, because the tokenizer file can no longer be found.
Stacktrace:

File "/home/user/mixtral/venv/lib/python3.10/site-packages/vllm/transformers_utils/tokenizer_group/init.py", line 29, in init_tokenizer_from_configs
return get_tokenizer_group(parallel_config.tokenizer_pool_config,
File "/home/user/mixtral/venv/lib/python3.10/site-packages/vllm/transformers_utils/tokenizer_group/init.py", line 50, in get_tokenizer_group
return tokenizer_cls.from_config(tokenizer_pool_config, **init_kwargs)
File "/home/user/mixtral/venv/lib/python3.10/site-packages/vllm/transformers_utils/tokenizer_group/tokenizer_group.py", line 29, in from_config
return cls(**init_kwargs)
File "/home/user/mixtral/venv/lib/python3.10/site-packages/vllm/transformers_utils/tokenizer_group/tokenizer_group.py", line 22, in init
self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
File "/home/user/mixtral/venv/lib/python3.10/site-packages/vllm/transformers_utils/tokenizer.py", line 95, in get_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
File "/home/user/mixtral/venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 897, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/user/mixtral/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2271, in from_pretrained
return cls._from_pretrained(
File "/home/user/mixtral/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2505, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/user/mixtral/venv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 157, in init
super().init(
File "/home/user/mixtral/venv/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 131, in init
slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
File "/home/user/mixtral/venv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 171, in init
self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
File "/home/user/mixtral/venv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 201, in get_spm_processor
with open(self.vocab_file, "rb") as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType
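
A possible stopgap until this is fixed is to pin the tokenizer to a revision from before the rename. The sketch below is illustrative only: the repo id is an assumption and the revision is a placeholder for the commit hash preceding 9a99991.

```python
from transformers import AutoTokenizer

# Possible stopgap: load the tokenizer from a revision that still contains
# tokenizer.model. Both values below are placeholders/assumptions; replace
# the repo id with the repository you are loading and the revision with the
# commit hash that precedes 9a99991.
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mixtral-8x22B-Instruct-v0.1",   # assumed repo id
    revision="<commit-hash-before-9a99991>",   # pin to a pre-rename revision
)
```

Recent vLLM versions also expose a tokenizer revision setting in their engine arguments, which could be pinned the same way, but check the version you have installed.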

Same here. Could you please fix this as quickly as possible? We are in the middle of a paper rebuttal.

Mistral AI org

Sorry about that! Just reverted the renaming.

patrickvonplaten changed discussion status to closed
