philschmid HF staff committed on
Commit f6230c8
1 Parent(s): 45011b4

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -9,6 +9,9 @@ This repository includes a fast tokenizer for [google/gemma-7b](https://huggingf
 
 No new tokens were added during that process to ensure that the original model's embedding doesn't need to be modified.
 
+
+_Note: It is important to note that this tokenizer is not 100% ChatML compliant, since it seems [google/gemma-7b](https://huggingface.co/google/gemma-7b), always requires the original `<bos>` token to be part of the input. This means the chat template is `<bos>` + `chatml` + `<eos>`_
+
 ```python
 from transformers import AutoTokenizer
 
@@ -21,7 +24,7 @@ messages = [
 ]
 
 chatml = tokenizer.apply_chat_template(messages, add_generation_prompt=False, tokenize=False)
-
+print(chatml)
 # <bos><|im_start|>system
 # You are Gemma.<|im_end|>
 # <|im_start|>user
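
The note added in this commit describes the template as `<bos>` + ChatML + `<eos>`. The following is a minimal plain-Python sketch of that layout, not the tokenizer's actual template: the helper name `render_gemma_chatml`, the newline after each `<|im_end|>`, the position of `<eos>` at the very end, and the sample user message are all assumptions made for illustration.

```python
from typing import Dict, List

BOS = "<bos>"  # Gemma's original beginning-of-sequence token, per the note
EOS = "<eos>"  # assumed to close the whole rendered conversation


def render_gemma_chatml(messages: List[Dict[str, str]]) -> str:
    """Hypothetical helper: build `<bos>` + ChatML turns + `<eos>` as one string."""
    parts = [BOS]
    for message in messages:
        # One ChatML turn: start marker, role, newline, content, end marker.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    parts.append(EOS)
    return "".join(parts)


# Illustrative conversation; the user text is made up for this sketch.
messages = [
    {"role": "system", "content": "You are Gemma."},
    {"role": "user", "content": "Hello!"},
]

rendered = render_gemma_chatml(messages)
print(rendered)
```

For real use, `tokenizer.apply_chat_template` as shown in the README remains the source of truth; the sketch only makes the string layout claimed in the note concrete.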