Errors when running sft and evaluate with the LLaMA-Factory project
Hello, I ran into problems when using the LLaMA-Factory project to run sft and evaluate on the model.
Error during sft:
"modeling_qwen_yarn.py", line 651, in forward
layernorm_input = attn_output + residual
~~~~~~~~~~~~^~~~~~~~~~
RuntimeError: The size of tensor a (1006) must match the size of tensor b (1008)
at non-singleton dimension 1
0%| | 0/1624 [00:16<?, ?it/s]
[E ProcessGroupNCCL.cpp:916] [Rank 0] NCCL watchdog thread terminated with exception: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
Error during evaluate:
"modeling_qwen_yarn.py", line 176, in pad_input
output[indices] = hidden_states
~~~~~~^^^^^^^^^
RuntimeError: shape mismatch: value tensor of shape [2768, 40, 128] cannot be broadcast to indexing result of shape [2679, 40, 128]
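A minimal sketch of the indexing pattern behind this error. The names and shapes here are assumptions for illustration, not the actual modeling_qwen_yarn.py code: in flash-attn-style unpad/pad helpers, `indices` marks the positions of real (non-padding) tokens, and the unpadded `hidden_states` must contain exactly one row per index.

```python
import torch

batch, seqlen, heads, dim = 2, 4, 1, 2
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [1, 1, 0, 0]])  # 5 real tokens in total
# Flattened positions of the real tokens across the whole batch
indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()

# Unpadded hidden states: one row per real token -> shape [5, 1, 2]
hidden_states = torch.randn(indices.numel(), heads, dim)
output = torch.zeros(batch * seqlen, heads, dim)
output[indices] = hidden_states  # OK: 5 value rows for 5 index positions

# If input_ids were padded further than the attention mask reflects
# (e.g. padded to a multiple of 8 while the mask counts only real tokens),
# hidden_states would have more rows than there are indices, and this
# assignment raises the "shape mismatch ... cannot be broadcast" RuntimeError
# seen above (2768 value rows vs. 2679 indexed positions).
```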
Do not pad to max_length; pad only to the length of the longest sample in the current batch.
It turned out to be a padding problem:
LLaMA-Factory pads tokens to a multiple of 8.
Setting batch_size to 1 for evaluate works.
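The arithmetic matches the sft error above. A small sketch (the helper name is hypothetical, mirroring the `pad_to_multiple_of` behavior of Hugging Face data collators):

```python
def pad_to_multiple(length: int, multiple: int = 8) -> int:
    """Round a sequence length up to the next multiple.
    Hypothetical helper illustrating pad_to_multiple_of-style padding."""
    return ((length + multiple - 1) // multiple) * multiple

# The residual stream carries 1006 tokens, but the attention path sees
# inputs padded to a multiple of 8: 1006 rounds up to 1008, which is
# exactly the 1006-vs-1008 size mismatch in the sft traceback.
print(pad_to_multiple(1006))
```

With batch_size 1 the single sample is its own longest sample, so no extra padding is introduced and the shapes agree, which is why it works around the evaluate error.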