Please Consider Adding A Chat Template To The Model Tokenizer
See here: https://huggingface.co/docs/transformers/v4.35.1/en/chat_templating#introduction
As it's currently set up, if you do something like the following, it will use the wrong chat template:
```python
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="NousResearch/Nous-Capybara-34B", trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto")
conversation_history = [{"role": "user", "content": "Hello!"}]  # example messages
# With no chat_template set on the tokenizer, this falls back to a default
# template rather than the format the model was actually trained on
prompt = pipe.tokenizer.apply_chat_template(conversation_history, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt)
```
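For reference, here's a minimal sketch of what setting one could look like, assuming a Vicuna-style `USER:`/`ASSISTANT:` turn format (that format and the EOS handling are my assumptions; the maintainers would need to confirm the real training format):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Nous-Capybara-34B", trust_remote_code=True)

# Hypothetical Jinja template -- the actual turn separators and EOS handling
# must be confirmed by the model authors before publishing this.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}USER: {{ message['content'] }}\n"
    "{% else %}ASSISTANT: {{ message['content'] }}{{ eos_token }}\n{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}ASSISTANT:{% endif %}"
)

tokenizer.save_pretrained("./Nous-Capybara-34B")  # writes chat_template into tokenizer_config.json
```

Once saved into `tokenizer_config.json`, `apply_chat_template` picks it up automatically and the pipeline snippet above produces the intended prompt.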
Thank you! Will consider.
What is the proper chat format, anyway? The info on the model card is not helpful at all; what would help is the actual expected format, as a string, for a few example messages. It also seems off, as `</s>` is not part of the tokenizer's special tokens...
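For anyone reading along, a quick way to check this yourself (nothing model-specific assumed):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("NousResearch/Nous-Capybara-34B", trust_remote_code=True)
print(tok.special_tokens_map)  # shows whether </s> is registered as a special token
print(tok.eos_token)           # what the tokenizer actually uses as EOS
```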