Update distilbert_japanese_tokenizer.py
#4 opened by liwii
As discussed in the community thread, the current tokenizer code does not work with transformers>=4.34 because of the tokenizer refactoring introduced in that version. Since that release, `PreTrainedTokenizer.__init__()` accesses `get_vocab()`, so `self.subword_tokenizer_type` needs to be initialized before `DistilBertJapaneseTokenizer` calls `super().__init__()`.
This issue was already fixed in transformers itself by commit 2da8853; this PR follows that change.
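To illustrate the initialization-order problem, here is a minimal, self-contained sketch (not the actual transformers code): the class and method names mirror the real ones, but the bodies are simplified stand-ins. The base constructor calls `get_vocab()`, which the subclass implements using an attribute set in its own `__init__`, so the attribute must be assigned before `super().__init__()` runs.

```python
class PreTrainedTokenizer:
    """Simplified stand-in for the transformers base class."""

    def __init__(self):
        # Since transformers 4.34, the base constructor accesses the vocab.
        self.vocab_size = len(self.get_vocab())

    def get_vocab(self):
        raise NotImplementedError


class BrokenTokenizer(PreTrainedTokenizer):
    """Pre-fix pattern: attribute set only after super().__init__()."""

    def __init__(self):
        super().__init__()  # calls get_vocab() before the attribute exists
        self.subword_tokenizer_type = "wordpiece"

    def get_vocab(self):
        # Raises AttributeError: subword_tokenizer_type is not set yet.
        assert self.subword_tokenizer_type == "wordpiece"
        return {"[UNK]": 0}


class FixedTokenizer(PreTrainedTokenizer):
    """Post-fix pattern: attribute set BEFORE super().__init__()."""

    def __init__(self):
        self.subword_tokenizer_type = "wordpiece"
        super().__init__()  # get_vocab() now sees the attribute

    def get_vocab(self):
        assert self.subword_tokenizer_type == "wordpiece"
        return {"[UNK]": 0}


if __name__ == "__main__":
    try:
        BrokenTokenizer()
    except AttributeError as exc:
        print("broken tokenizer fails:", exc)

    tok = FixedTokenizer()
    print("fixed tokenizer vocab size:", tok.vocab_size)
```

Running the sketch shows the broken ordering raising `AttributeError` while the fixed ordering constructs cleanly, which is the behavior difference this PR addresses.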
I confirmed that it works with my fork of line-corporation/line-distilbert-base-japanese.
Looks good to me! Thank you!
kajyuuen changed pull request status to merged