studio-ousia/mluke-base


ryo0634 committed on Jun 16, 2023

Commit a3bacf3 (1 parent: cc5e9e8)

Update README.md

Files changed (1)

README.md +15 -0

@@ -45,6 +45,21 @@ This is the mLUKE base model with 12 hidden layers, 768 hidden size. The total n
 of parameters in this model is 585M (278M for the word embeddings and encoder, 307M for the entity embeddings).
 The model was initialized with the weights of XLM-RoBERTa (base) and trained using the December 2020 version of Wikipedia in 24 languages.
+## Note
+When you load the model with `AutoModel.from_pretrained` using the default configuration, you will see the following warning:
+```
+Some weights of the model checkpoint at studio-ousia/mluke-base-lite were not used when initializing LukeModel: [
+'luke.encoder.layer.0.attention.self.w2e_query.weight', 'luke.encoder.layer.0.attention.self.w2e_query.bias',
+'luke.encoder.layer.0.attention.self.e2w_query.weight', 'luke.encoder.layer.0.attention.self.e2w_query.bias',
+'luke.encoder.layer.0.attention.self.e2e_query.weight', 'luke.encoder.layer.0.attention.self.e2e_query.bias',
+...]
+```
+These are the weights for entity-aware attention (as described in [the LUKE paper](https://arxiv.org/abs/2010.01057)).
+This is expected: `use_entity_aware_attention` is set to `false` by default, but the pretrained checkpoint still contains these weights so that they are loaded into the model if you enable `use_entity_aware_attention`.
 ### Citation
 If you find mLUKE useful for your work, please cite the following paper:
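The note added in this commit says the warning lists only entity-aware attention weights and that they are used once `use_entity_aware_attention` is enabled. The Python sketch below illustrates both points under stated assumptions. `unexpected_unused_weights` is a hypothetical helper. It assumes every unused key follows the `w2e_query` / `e2w_query` / `e2e_query` naming visible in the warning, because the full list is elided there. `load_with_entity_aware_attention` is also hypothetical. It relies on the `transformers` behavior where extra `from_pretrained` keyword arguments that match config attributes override the checkpoint's `config.json`. It needs `transformers` installed and network access, so it is only defined here, not called.

```python
import re
from typing import Iterable, List

# Names of the entity-aware attention query projections, following the
# pattern shown in the warning (optionally prefixed with "luke.").
# Assumption: all elided entries in the warning follow this same pattern.
_ENTITY_AWARE_QUERY = re.compile(
    r"^(?:luke\.)?encoder\.layer\.\d+\.attention\.self\."
    r"(?:w2e|e2w|e2e)_query\.(?:weight|bias)$"
)


def unexpected_unused_weights(unused: Iterable[str]) -> List[str]:
    """Hypothetical helper: return unused weight names that are NOT
    entity-aware attention weights.

    An empty result means the warning only concerns the entity-aware
    attention weights, which is expected when use_entity_aware_attention
    is false.
    """
    return [name for name in unused if not _ENTITY_AWARE_QUERY.match(name)]


def load_with_entity_aware_attention(name: str = "studio-ousia/mluke-base"):
    """Hypothetical helper: load the model with entity-aware attention on.

    Requires the transformers library and network access to the Hub.
    """
    # Imported lazily so the helper above works without transformers.
    from transformers import AutoModel

    # Keyword arguments matching config attributes override the values in
    # the checkpoint's config.json, so the w2e/e2w/e2e weights get loaded.
    return AutoModel.from_pretrained(name, use_entity_aware_attention=True)


if __name__ == "__main__":
    reported = [
        "luke.encoder.layer.0.attention.self.w2e_query.weight",
        "luke.encoder.layer.0.attention.self.e2w_query.bias",
        "luke.encoder.layer.0.attention.self.e2e_query.weight",
        # Made-up entry that is not an entity-aware attention weight:
        "luke.encoder.layer.0.attention.self.query.weight",
    ]
    # prints ['luke.encoder.layer.0.attention.self.query.weight']
    print(unexpected_unused_weights(reported))
```

Keeping the check separate from the loader lets you inspect a warning's key list without downloading the 585M-parameter checkpoint. If the returned list is non-empty, the checkpoint holds unused weights beyond those the note describes.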