A few questions
1. In the LongBench results, the other four models are evaluated zero-shot, right? This model has been through SFT, so isn't the comparison a bit unfair? Shouldn't the comparison use the model from step 2, after the continued pretraining?
2. Can I continue LoRA training on top of this model?
3. Can the weights from the step-3 fine-tuning be pulled out separately, i.e., the step-2 pretrained model plus the fine-tuning weights?
1. All models in the comparison are SFT'd chat models, and most of the test samples do not exceed their context lengths.
2. Yes, you can continue with LoRA (a sketch follows below).
3. There are separate weights, but the continued pretraining in the current version is actually not sufficient; the result is almost identical to the original Qwen-14b. The instruction fine-tuning matters more.
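For questions 2 and 3, here is a minimal sketch of continuing LoRA training on top of the released checkpoint and keeping the adapter weights separate from the base model, using the Hugging Face peft library. The checkpoint path, target module names, and hyperparameters below are placeholders, not values from this repo:

```python
# LoRA-continuation sketch with Hugging Face peft; paths, target modules and
# hyperparameters are placeholders, not this repo's recommended settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "path/to/long-context-qwen-14b",   # hypothetical local path to the released checkpoint
    trust_remote_code=True,
)

lora_cfg = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],         # Qwen-style attention projection; adjust to the real module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()

# ... run the SFT loop here (e.g. via swift or a plain transformers Trainer) ...

# The adapter can be saved on its own, so the step-2 base model stays untouched
# and the fine-tuning delta lives entirely in the adapter directory.
model.save_pretrained("output/lora-adapter")

# Optionally fold the adapter back into the base weights for deployment.
merged = model.merge_and_unload()
merged.save_pretrained("output/merged-model")
```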
Thanks for the reply.
I fine-tuned with swift (https://github.com/modelscope/swift) and got the error below. Could you help take a look?
0%| | 0/2160 [00:00<?, ?it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`
...
Traceback (most recent call last):
File "/mnt/zkzhu/zzk/llm_sft/zzk_swift_llm/llm_sft.py", line 10, in
output = sft_main()
File "/mnt/zkzhu/zzk/llm_sft/swift-main/swift/utils/run_utils.py", line 27, in x_main
return llm_x(args, **kwargs)
File "/mnt/zkzhu/zzk/llm_sft/swift-main/swift/llm/sft.py", line 304, in llm_sft
trainer.train(training_args.resume_from_checkpoint)
File "/mnt/zkzhu/miniconda3/envs/pt2.0/lib/python3.9/site-packages/transformers/trainer.py", line 1555, in train
return inner_training_loop(
File "/mnt/zkzhu/miniconda3/envs/pt2.0/lib/python3.9/site-packages/transformers/trainer.py", line 1860, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/mnt/zkzhu/zzk/llm_sft/swift-main/swift/trainers/trainers.py", line 48, in training_step
training_output = super().training_step(*args, **kwargs)
File "/mnt/zkzhu/miniconda3/envs/pt2.0/lib/python3.9/site-packages/transformers/trainer.py", line 2725, in training_step
loss = self.compute_loss(model, inputs)
File "/mnt/zkzhu/zzk/llm_sft/swift-main/swift/trainers/trainers.py", line 185, in compute_loss
preds = outputs.logits.argmax(dim=2)[..., :-1]
AttributeError: 'NoneType' object has no attribute 'argmax'
0%| | 0/2160 [00:01<?, ?it/s]
Found the problem. In the model's source code, lines 1159-1160 need to be commented out:

if self.training:
    lm_logits = None
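Those two lines make the model return logits=None while it is in training mode, which is exactly what breaks the `outputs.logits.argmax(...)` call in swift's compute_loss. If you would rather not patch the modeling file, a None-guard on the swift side should have the same effect; a rough sketch follows, where only the failing argmax line is taken from the traceback above and the rest is paraphrased, not swift's actual code:

```python
from transformers import Trainer

class GuardedTrainer(Trainer):
    # Hypothetical override; only the failing argmax line comes from the
    # traceback, everything else is paraphrased context, not swift's code.
    def compute_loss(self, model, inputs, return_outputs=False):
        loss, outputs = super().compute_loss(model, inputs, return_outputs=True)
        if outputs.logits is not None:  # long-context model returns no logits in training mode
            preds = outputs.logits.argmax(dim=2)[..., :-1]
            # ... token-accuracy bookkeeping would continue here ...
        return (loss, outputs) if return_outputs else loss
```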