tokenizer.bos_token_id is None
#20 by yourui - opened
It looks like tokenizer.bos_token_id is None:
File "ptuning/main.py", line 219, in preprocess_function_train
context_length = input_ids.index(tokenizer.bos_token_id)
ValueError: None is not in list
yourui changed discussion title from `tokenizer.bos_token_id` is `None` to tokenizer.bos_token_id is None
Python 3.9.16 (main, Mar 8 2023, 04:29:44)
[Clang 14.0.6 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
Downloading tokenizer.model: 100%|████████████████████████████████████████████| 1.02M/1.02M [00:02<00:00, 355kB/s]
>>> print(tokenizer.bos_token_id)
None
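The traceback and the REPL output fit together: calling `list.index(None)` on a list of token ids raises a `ValueError`, because `None` is never one of the ids. The following is a minimal sketch of that failure and of an early check that reports it more clearly. The `fake_tokenizer` object, the id values, and the `locate_bos` helper are made up for illustration and are not part of `ptuning/main.py`.

```python
from types import SimpleNamespace

# Stand-in for the loaded tokenizer (assumption: only this attribute matters here).
fake_tokenizer = SimpleNamespace(bos_token_id=None)

# Made-up token ids; the actual values are irrelevant to the failure.
input_ids = [101, 102, 103]


def locate_bos(ids, tokenizer):
    """Hypothetical helper: return the BOS position, or fail early with a clear message."""
    bos_id = tokenizer.bos_token_id
    if bos_id is None:
        raise ValueError(
            "tokenizer.bos_token_id is None; this preprocessing step "
            "expects a tokenizer that defines a BOS token"
        )
    return ids.index(bos_id)


# Same call shape as line 219 of the traceback: looking up None in a list of ints.
try:
    input_ids.index(fake_tokenizer.bos_token_id)
    raw_error = None
except ValueError as err:
    raw_error = err

# The guarded version fails too, but names the actual cause.
try:
    locate_bos(input_ids, fake_tokenizer)
    guarded_error = None
except ValueError as err:
    guarded_error = err

print(type(raw_error).__name__, "|", guarded_error)
```

A check like this does not fix the mismatch; it only makes the cause visible before training starts.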
I have the same issue
I ran into the same problem.
Changing bos_token_id to eos_token_id resolves this problem.
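That change is problematic, haven't you noticed? After you make it, all of the ids become -100, so a model trained this way is broken. I don't know how to fix it yet, though.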
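One possible explanation for the -100 observation, as a sketch only. It assumes two things that are not shown in this thread: that the preprocessing masks every position before the located token with -100, and that EOS is appended at the end of the sequence. The `build_labels` helper and all id values are hypothetical. The value -100 itself is the default `ignore_index` of `torch.nn.CrossEntropyLoss`, so positions carrying it contribute nothing to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(input_ids, marker_id):
    """Assumed labelling scheme: mask every position before the first marker_id."""
    context_length = input_ids.index(marker_id)
    return [IGNORE_INDEX] * context_length + input_ids[context_length:]


# Made-up ids: prompt tokens, then answer tokens, then EOS as the last token.
PROMPT = [11, 12, 13]
ANSWER = [21, 22]
EOS_ID = 2
input_ids = PROMPT + ANSWER + [EOS_ID]

# Using EOS as the marker puts the boundary at the very end of the sequence,
# so the answer tokens are masked together with the prompt.
labels = build_labels(input_ids, EOS_ID)
supervised = [tok for tok in labels if tok != IGNORE_INDEX]

print(labels)
print(supervised)
```

Under these assumptions only the final EOS token is left as a training target, which would match the report that the ids effectively all turn into -100.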
Have you solved this problem? I have run into it too.
The real cause is that chat-glm2 has its own fine-tuning code repository on GitHub, and you are using the one for chat-glm1.
Got it. So I should just use the fine-tuning repository that matches chat-glm2, right? Is there a link? Thanks.
Found it, thanks.
yourui changed discussion status to closed