Pad_token_id of MPT-7B

#49
by Trung-Dung - opened

I want to use MPT-7B with the text-generation pipeline. To do batch processing, I need to set pad_token_id. However, the tokenizer doesn't define pad, EOS, or BOS tokens. What value should I set in this case?

Hi @Trung-Dung, we use the GPT NeoX tokenizer, which should have an EOS token id. I think you can safely reuse the EOS token id as the PAD token id at inference time.
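A minimal sketch of what reusing the EOS id as the PAD id looks like when batching (the specific token ids here are illustrative; in practice they come from the tokenizer, e.g. `tokenizer.eos_token_id`, and with the transformers pipeline you would pass `pad_token_id=tokenizer.eos_token_id`):

```python
# Illustrative id; GPT NeoX's <|endoftext|> EOS is commonly id 0,
# but always read the real value from the tokenizer.
EOS_ID = 0
PAD_ID = EOS_ID  # reuse EOS as PAD at inference time

def pad_batch(sequences, pad_id, side="left"):
    """Pad variable-length token-id lists to equal length.
    Returns (input_ids, attention_mask) as nested lists; the mask
    marks pad positions with 0 so attention ignores them."""
    max_len = max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for s in sequences:
        pad = [pad_id] * (max_len - len(s))
        zeros = [0] * (max_len - len(s))
        if side == "left":
            input_ids.append(pad + s)
            attention_mask.append(zeros + [1] * len(s))
        else:
            input_ids.append(s + pad)
            attention_mask.append([1] * len(s) + zeros)
    return input_ids, attention_mask

batch = [[5, 6, 7], [8, 9]]
ids, mask = pad_batch(batch, PAD_ID)
print(ids)   # [[5, 6, 7], [0, 8, 9]]
print(mask)  # [[1, 1, 1], [0, 1, 1]]
```

Because the attention mask zeroes out the pad positions, it does not matter that the pad id collides with the EOS id: the model never attends to those slots.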

sam-mosaic changed discussion status to closed

As a follow-up to this discussion: when using EOS as the PAD token, is there any recommendation for the padding side?
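For decoder-only models like MPT, left padding is the usual choice for batched generation (`tokenizer.padding_side = "left"` in transformers), because new tokens are appended to the right end of each sequence; with right padding the pad ids would sit between a short prompt and its continuation. A small sketch of the difference:

```python
PAD_ID = 0  # reusing the EOS id as PAD, per the reply above

def left_pad(seq, length, pad_id=PAD_ID):
    # Pads on the left, keeping the prompt flush with the right edge.
    return [pad_id] * (length - len(seq)) + seq

def right_pad(seq, length, pad_id=PAD_ID):
    # Pads on the right, pushing pads between prompt and continuation.
    return seq + [pad_id] * (length - len(seq))

prompt = [8, 9]
new_tokens = [42, 43]  # hypothetical ids the model would generate next

# Left padding keeps prompt and continuation contiguous:
print(left_pad(prompt, 4) + new_tokens)   # [0, 0, 8, 9, 42, 43]

# Right padding leaves pad ids inside the sequence:
print(right_pad(prompt, 4) + new_tokens)  # [8, 9, 0, 0, 42, 43]
```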
