In the tokenizer of the moss-moon-003-base model, the `eos token` is `<|endoftext|>`. When training an SFT model, this token needs to be set to the `` token.

## SFT stage

- ``: end of human
- ``: end of thoughts
- ``: end of commands
- ``: end of moss

## Notes

From the MOSS tokenizer:

```py
def convert_tokens_to_string(self, tokens):
    """Converts a sequence of tokens (string) in a single string."""
    # Concatenate the byte-level token strings into one string.
    text = "".join(tokens)
    # Map each character back to its raw byte, then decode the bytes as UTF-8.
    text = bytearray([self.byte_decoder[c] for c in text]).decode("utf-8", errors=self.errors)
    return text
```

## Troubleshooting
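
When decoded text comes out garbled, one thing worth checking is whether the token sequence was cut in the middle of a multi-byte UTF-8 character. Below is a minimal, standalone sketch of the byte-level decoding step used in `convert_tokens_to_string`. It assumes a GPT-2-style byte-to-unicode table and `errors="replace"`, neither of which is confirmed here for MOSS. `bytes_to_unicode` and `encode_to_byte_tokens` are illustrative helpers, not part of the MOSS API.

```python
def bytes_to_unicode():
    """GPT-2-style table mapping each byte (0-255) to a printable character (assumed here)."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes without a printable form are shifted above code point 255.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {ch: b for b, ch in BYTE_ENCODER.items()}


def encode_to_byte_tokens(text):
    """Hypothetical helper: one single-character token per UTF-8 byte (no BPE merges)."""
    return [BYTE_ENCODER[b] for b in text.encode("utf-8")]


def convert_tokens_to_string(tokens, errors="replace"):
    """Standalone version of the decoding step: characters -> bytes -> UTF-8 text."""
    text = "".join(tokens)
    return bytearray([BYTE_DECODER[c] for c in text]).decode("utf-8", errors=errors)


tokens = encode_to_byte_tokens("你好 MOSS")
full = convert_tokens_to_string(tokens)           # complete byte sequence
truncated = convert_tokens_to_string(tokens[:2])  # cut inside the first character
print(full)
print(repr(truncated))
```

If the full sequence round-trips but a truncated one contains U+FFFD (the replacement character), the garbling likely comes from where the token sequence was cut, for example during streaming generation, and not from the tokenizer itself.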