Tokenizer issue
#1 · opened by tomaarsen (HF staff)
Hello!
This is very cool - I'd love to run it myself too, but I run into an issue: word_ids() is not accessible on the non-fast LukeTokenizer. Did you get around this somehow?
- Tom Aarsen
Hello!
First of all, thanks for your library - it works great. Yeah, I am going to update the model card; I encountered this problem as well and used the RobertaTokenizer as an alternative. I still have to figure out how to use the LukeTokenizer, but I am working on it and will release a v2 soon.
from span_marker.tokenizer import SpanMarkerTokenizer  # import path assumed; adjust if needed

tokenizer = SpanMarkerTokenizer.from_pretrained("roberta-base", config=model.tokenizer.config)
model.set_tokenizer(tokenizer)
Let me know if this solved the problem.
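For anyone else hitting this, here is a slightly fuller sketch of the workaround above, end to end. The repo id is only a placeholder for whichever SpanMarker + LUKE checkpoint you are loading, and the SpanMarkerTokenizer import path is an assumption about the library layout, so adjust as needed.

# Sketch of the workaround: load the SpanMarker model, swap in a tokenizer
# built on roberta-base (the non-fast LukeTokenizer has no word_ids()),
# then run inference as usual. The repo id below is a placeholder.
from span_marker import SpanMarkerModel
from span_marker.tokenizer import SpanMarkerTokenizer  # assumed import path

model = SpanMarkerModel.from_pretrained("lambdavi/span-marker-luke-base")  # placeholder repo id

# Reuse the model's existing tokenizer config on top of roberta-base
tokenizer = SpanMarkerTokenizer.from_pretrained("roberta-base", config=model.tokenizer.config)
model.set_tokenizer(tokenizer)

entities = model.predict("Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris.")
print(entities)

Since roberta-base comes with a fast tokenizer, word_ids() is available again, which is what the library needs here.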
lambdavi changed discussion status to closed