Update tokenizer_config.json
#2, opened by saattrupdan
This removes the extra tokens whose indices exceed the vocab size. Only the pad token is actually used, so we follow the standard practice of using the EOS token as the PAD token.
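For reference, the config change has the same effect as the following snippet (a minimal sketch; "model-repo" is a placeholder for this repository's model id, not a value from the PR):

```python
from transformers import AutoTokenizer

# Load the tokenizer; "model-repo" is a hypothetical id for illustration.
tokenizer = AutoTokenizer.from_pretrained("model-repo")

# With no dedicated pad token, reuse the EOS token for padding so that
# padded positions map to an index inside the vocabulary.
tokenizer.pad_token = tokenizer.eos_token
```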
Further, we set the model_max_length, which was previously not set, and change the padding_side from 'right' to 'left', since the model is auto-regressive.
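The two settings can be exercised like this (again a sketch: "model-repo" is a placeholder and 2048 is an assumed context length, not the value set in the PR):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("model-repo")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.model_max_length = 2048  # assumed value for illustration
tokenizer.padding_side = "left"

# Pad a batch of prompts of different lengths.
batch = tokenizer(["short", "a much longer prompt"], padding=True, return_tensors="pt")
# With left padding, the final position of every row holds a real token,
# so an auto-regressive model generates from the prompt, not from padding.
```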
saattrupdan changed pull request status to closed