[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]