'QWenTokenizer' object has no attribute 'IMAGE_ST'
I get this error when trying to run the example code snippet:
Exception has occurred: AttributeError
'QWenTokenizer' object has no attribute 'IMAGE_ST'
File "/home/neman/.cache/huggingface/modules/transformers_modules/MMInstruction_Silkie/tokenization_qwen.py", line 223, in _add_tokens
if surface_form not in SPECIAL_TOKENS + self.IMAGE_ST:
File "/home/neman/.cache/huggingface/modules/transformers_modules/MMInstruction_Silkie/tokenization_qwen.py", line 120, in __init__
super().__init__(**kwargs)
File "/home/neman/PROGRAMMING/PYTHON/QwenVL/SilkieVL_test1.py", line 6, in
tokenizer = AutoTokenizer.from_pretrained(
AttributeError: 'QWenTokenizer' object has no attribute 'IMAGE_ST'
Hello @Neman, thank you for your interest in Silkie!
Regarding the issue you encountered, could you please provide more information about your environment so we can reproduce it? It is strange, since QWenTokenizer does have the attribute IMAGE_ST (see here). It might be related to dependencies. You can find the installation instructions for our environment in our GitHub repository.
I took a quick look at this, since I ran into the problem myself. The error is because the super().__init__(**kwargs) of QWenTokenizer calls QWenTokenizer._add_tokens(), which requires IMAGE_ST, but the super().__init__() call happens before IMAGE_ST is defined in the initializer, so the _add_tokens() call crashes with an error. Commenting out lines 223 and 224 seems to fix it as a quick hack, but hopefully someone with a better understanding can fix it more properly.
Hi @bob80333, I think you are right! After a deeper investigation, I found that the issue comes from the refactor of PreTrainedTokenizer in Transformers release 4.34. Since v4.34, self._add_tokens is called in the initializer of PreTrainedTokenizer. We followed Qwen-VL and used transformers < 4.34, which is why we did not encounter this issue. I suggest using the same version to reproduce our experiments, or customizing tokenization_qwen.py as suggested by @bob80333.
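For anyone who wants to check quickly whether their environment falls on the affected side of the 4.34 boundary mentioned above, here is a small sketch. The helper name calls_add_tokens_in_init is hypothetical, and it only compares the leading major.minor numbers of a version string; it does not inspect the installed library.

```python
import re


def calls_add_tokens_in_init(version: str) -> bool:
    """Return True if the version string is 4.34 or newer (major.minor only)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles e.g. 5.0 > 4.34 and 4.9 < 4.34 correctly.
    return (major, minor) >= (4, 34)


if __name__ == "__main__":
    for v in ("4.33.2", "4.34.0", "4.36.0.dev0"):
        print(v, calls_add_tokens_in_init(v))
```

In practice you would pass transformers.__version__ to it. If it returns True, either pin the library with pip install "transformers<4.34" or edit tokenization_qwen.py as discussed.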
Thank you both. Bob's suggestion solved it.
I ran a few tests with random photos and compared the results with Qwen-VL-Chat-Int4. It is a little bit better (more testing is needed). I should have compared against the non-quantized Qwen, but I wanted to report this anyway.