Errors when running sft and evaluate with the LLaMA-Factory project
Hello, I ran into problems when using the LLaMA-Factory project to run sft and evaluate on the model.
Error during sft:
"modeling_qwen_yarn.py", line 651, in forward
layernorm_input = attn_output + residual
~~~~~~~~~~~~^~~~~~~~~~
RuntimeError: The size of tensor a (1006) must match the size of tensor b (1008)
at non-singleton dimension 1
0%| | 0/1624 [00:16<?, ?it/s]
[E ProcessGroupNCCL.cpp:916] [Rank 0] NCCL watchdog thread terminated with exception: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
Error during evaluate:
"modeling_qwen_yarn.py", line 176, in pad_input
output[indices] = hidden_states
~~~~~~^^^^^^^^^
RuntimeError: shape mismatch: value tensor of shape [2768, 40, 128] cannot be broadcast to indexing result of shape [2679, 40, 128]
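A minimal sketch of the indexing pattern behind this error. The names and shapes here are assumptions for illustration, not the actual modeling_qwen_yarn.py code: in flash-attn-style unpad/pad helpers, `indices` marks the positions of real (non-padding) tokens, and the unpadded `hidden_states` must contain exactly one row per index.

```python
import torch

batch, seqlen, heads, dim = 2, 4, 1, 2
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]])  # 5 real tokens in total
# Flattened positions of the real tokens across the whole batch
indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()

# Unpadded hidden states: one row per real token -> shape [5, 1, 2]
hidden_states = torch.randn(indices.numel(), heads, dim)
output = torch.zeros(batch * seqlen, heads, dim)
output[indices] = hidden_states  # OK: 5 value rows for 5 index positions

# If input_ids were padded further than the attention mask reflects
# (e.g. padded to a multiple of 8 while the mask counts only real tokens),
# hidden_states would have more rows than there are indices, and this
# assignment raises the "shape mismatch ... cannot be broadcast" RuntimeError
# seen above (2768 value rows vs. 2679 indexed positions).
```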
Do not pad to max_length; pad only to the length of the longest sample in the current batch.
It turned out to be a padding problem:
LLaMA-Factory pads tokens to a multiple of 8.
Setting batch_size to 1 for evaluate works.
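The arithmetic matches the sft error above. A small sketch (the helper name is hypothetical, mirroring the `pad_to_multiple_of` behavior of Hugging Face data collators):

```python
def pad_to_multiple(length: int, multiple: int = 8) -> int:
    """Round a sequence length up to the next multiple.
    Hypothetical helper illustrating pad_to_multiple_of-style padding."""
    return ((length + multiple - 1) // multiple) * multiple

# The residual stream carries 1006 tokens, but the attention path sees
# inputs padded to a multiple of 8: 1006 rounds up to 1008, which is
# exactly the 1006-vs-1008 size mismatch in the sft traceback.
print(pad_to_multiple(1006))
```

With batch_size 1 the single sample is its own longest sample, so no extra padding is introduced and the shapes agree, which is why it works around the evaluate error.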