Update tokenizer_config.json
#2, opened by saattrupdan
This removes the extra tokens whose indices exceed the vocab size. Only the pad token is actually used, so we follow the standard practice of using the EOS token as the PAD token.
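For reference, the config change has the same effect as the following snippet (a minimal sketch; "model-repo" is a placeholder for this repository's model id, not a value from the PR):

```python
from transformers import AutoTokenizer

# Load the tokenizer; "model-repo" is a hypothetical id for illustration.
tokenizer = AutoTokenizer.from_pretrained("model-repo")

# With no dedicated pad token, reuse the EOS token for padding so that
# padded positions map to an index inside the vocabulary.
tokenizer.pad_token = tokenizer.eos_token
```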
Further, we set the model_max_length, which was previously not set, and change the padding_side from 'right' to 'left', since the model is auto-regressive.
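The two settings can be exercised like this (again a sketch: "model-repo" is a placeholder and 2048 is an assumed context length, not the value set in the PR):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("model-repo")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.model_max_length = 2048  # assumed value for illustration
tokenizer.padding_side = "left"

# Pad a batch of prompts of different lengths.
batch = tokenizer(["short", "a much longer prompt"], padding=True, return_tensors="pt")
# With left padding, the final position of every row holds a real token,
# so an auto-regressive model generates from the prompt, not from padding.
```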
saattrupdan changed pull request status to closed