distily_multi_attn_experiment_ortho / benchmarks.shelve.bak
End of training
8e782e8 verified
248 Bytes
'teacher', (0, 26029753)
'attn_layer_mapper=layer-2, attn_loss_fn=raw_mse, attn_projector=orthogonal, attn_weight=25.0', (26030080, 26029753)
'attn_layer_mapper=all, attn_loss_fn=cos, attn_projector=orthogonal, attn_weight=5', (52060160, 26029753)
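
Each line above is a Python literal of the form 'key', (offset, size). This looks like the directory index that Python's dbm.dumb backend writes alongside a shelve file, though the source does not confirm this. Under that assumption, a minimal sketch for reading the entries could look like the following. The helper names parse_index and parse_run_key are hypothetical, and the key=value splitting is only a guess at how the run keys are structured.

```python
import ast

# The three index lines, copied verbatim from the file above.
INDEX_TEXT = """\
'teacher', (0, 26029753)
'attn_layer_mapper=layer-2, attn_loss_fn=raw_mse, attn_projector=orthogonal, attn_weight=25.0', (26030080, 26029753)
'attn_layer_mapper=all, attn_loss_fn=cos, attn_projector=orthogonal, attn_weight=5', (52060160, 26029753)
"""


def parse_index(text):
    """Map each key to its (offset, size) pair (hypothetical helper)."""
    index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is a bare tuple literal: 'key', (offset, size)
        key, (offset, size) = ast.literal_eval(line)
        index[key] = (offset, size)
    return index


def parse_run_key(key):
    """Split 'a=1, b=2' into a dict of strings (assumed key layout)."""
    if "=" not in key:
        return {}  # e.g. the plain 'teacher' entry
    return dict(part.split("=", 1) for part in key.split(", "))


index = parse_index(INDEX_TEXT)
for key, (offset, size) in index.items():
    print(offset, size, parse_run_key(key) or key)
```

If the assumption holds, each (offset, size) pair locates one pickled record inside the companion data file, which is not shown here. All three offsets are multiples of 512, which would fit a block-aligned layout.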