Do the weights differ from the original weights?
#33
opened by HennyWong
I tried changing the code to load the model with torch, then compared the weights, and they seem to differ from the native weights?
@HennyWong Hi, I don't quite follow what you mean here. Are you saying the weight parameters differ? During development we set up a parameter comparison: for the same numeric input, the output of every transformer layer matched the torch version layer by layer, which implies the parameters should be aligned. Could you paste the snippet you changed?
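For reference, a per-layer capture on the torch side can be set up with forward hooks, along these lines (a minimal sketch; the hook-based approach and the sample prompt are illustrative, not the exact comparison script we used, and `ckpt_path` is assumed to point at the HF checkpoint):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
torch_model = AutoModel.from_pretrained(ckpt_path, trust_remote_code=True).half().cuda().eval()

# Record each transformer layer's hidden-state output with a forward hook.
layer_outputs = []
hooks = [
    layer.register_forward_hook(
        lambda module, args, output: layer_outputs.append(
            (output[0] if isinstance(output, tuple) else output).detach().float().cpu()
        )
    )
    for layer in torch_model.transformer.layers
]

input_ids = tokenizer("hello", return_tensors="pt").input_ids.cuda()
with torch.no_grad():
    torch_model(input_ids)
for h in hooks:
    h.remove()

# layer_outputs[i] can now be compared elementwise against the i-th layer's
# output from the other implementation for the same input_ids.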
Hello, thanks for your reply. I mainly made the snippet changes below; I'm not sure whether I've missed anything?
from transformers import AutoModel
torch_model = AutoModel.from_pretrained(ckpt_path, trust_remote_code=True).half().cuda()
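# Collect the per-layer parameters below in the order the loader expects.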
w.extend([load_to_torch(torch_model.transformer.layers[i].input_layernorm.weight, is_load(i))
          for i in range(self.layer_num)])
w.extend([load_to_torch(torch_model.transformer.layers[i].input_layernorm.bias, is_load(i))
          for i in range(self.layer_num)])
w.extend([load_to_torch(torch_model.transformer.layers[i].attention.query_key_value.weight, is_load(i))
          for i in range(self.layer_num)])
w.extend([load_to_torch(torch_model.transformer.layers[i].attention.query_key_value.bias, is_load(i))
          for i in range(self.layer_num)])
w.extend([load_to_torch(torch_model.transformer.layers[i].attention.dense.weight, is_load(i))
          for i in range(self.layer_num)])
w.extend([load_to_torch(torch_model.transformer.layers[i].attention.dense.bias, is_load(i))
          for i in range(self.layer_num)])
w.extend([load_to_torch(torch_model.transformer.layers[i].post_attention_layernorm.weight, is_load(i))
          for i in range(self.layer_num)])
w.extend([load_to_torch(torch_model.transformer.layers[i].post_attention_layernorm.bias, is_load(i))
          for i in range(self.layer_num)])
w.extend([load_to_torch(torch_model.transformer.layers[i].mlp.dense_h_to_4h.weight, is_load(i))
          for i in range(self.layer_num)])
w.extend([load_to_torch(torch_model.transformer.layers[i].mlp.dense_h_to_4h.bias, is_load(i))
          for i in range(self.layer_num)])
w.extend([load_to_torch(torch_model.transformer.layers[i].mlp.dense_4h_to_h.weight, is_load(i))
          for i in range(self.layer_num)])
w.extend([load_to_torch(torch_model.transformer.layers[i].mlp.dense_4h_to_h.bias, is_load(i))
          for i in range(self.layer_num)])
if self.has_pre_decoder_layernorm:
    w.append(load_to_torch(torch_model.transformer.pre_decoder_layernorm.weight, True))
    w.append(load_to_torch(torch_model.transformer.pre_decoder_layernorm.bias, True))
if self.has_post_decoder_layernorm:
    w.append(load_to_torch(torch_model.transformer.final_layernorm.weight, True))
    w.append(load_to_torch(torch_model.transformer.final_layernorm.bias, True))
if self.has_positional_encoding:
    wpe = load_to_torch("model.wpe", True).reshape(-1, self.global_hidden_units)
    assert self.max_seq_len <= wpe.size(0), (
        f"max_seq_len ({self.max_seq_len}) must not exceed "
        f"the value of maximum sequence length during training ({wpe.size(0)})."
    )
    w.append(wpe)
w.append(load_to_torch(torch_model.transformer.word_embeddings.weight, True))
Finally, I compared the embedding-layer weights, and they seem slightly different from the version downloaded from HF. Could you help take a look?
The changes are mainly to lines 257-311 of model.py.
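As a quick sanity check on the embedding difference: if the HF checkpoint stores the weights in fp32, the `.half()` call above rounds every value to fp16, so small elementwise gaps against the downloaded weights are expected and are not a loading bug. A minimal sketch to quantify the gap (the tolerance is illustrative; `ckpt_path` is assumed to be the same checkpoint as above):

import torch
from transformers import AutoModel

# Load the embedding twice: once as stored, once after the .half() cast.
ref = AutoModel.from_pretrained(ckpt_path, trust_remote_code=True)
converted = AutoModel.from_pretrained(ckpt_path, trust_remote_code=True).half()

ref_w = ref.transformer.word_embeddings.weight.detach().float()
half_w = converted.transformer.word_embeddings.weight.detach().float()

# Any gap at fp16 resolution comes from the precision cast, not the loader.
print("max abs diff:", (ref_w - half_w).abs().max().item())
print("allclose at fp16 tolerance:", torch.allclose(ref_w, half_w, atol=1e-3))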