---
datasets:
- anli
- zen-E/ANLI-simcse-roberta-large-embeddings-pca-256
language:
- en
metrics:
- spearmanr
- pearsonr
library_name: transformers
---

This model was trained by knowledge distillation from `princeton-nlp/unsup-simcse-roberta-large` (the teacher) into `zen-E/bert-mini-sentence-distil-unsupervised` (the student) on the ANLI dataset.

The model can be loaded for inference with the Transformers `AutoModel` API.
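
As a minimal sketch (the repository id, pooling choice, and example sentences below are assumptions, not part of the original card):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "<this-repo-id>"  # placeholder: replace with this model's Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = ["A man is playing a guitar.", "Someone plays an instrument."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pooling is an assumption; the training code below uses the encoder's pooled output.
embeddings = outputs.pooler_output
score = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(float(score))
```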

The model achieves a Pearson correlation of 0.836 and a Spearman correlation of 0.840 on the STS-B test set.
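
For reference, both correlations can be computed with `scipy.stats` between the model's cosine similarities and the gold STS-B scores (the arrays below are illustrative placeholders, not real predictions):

```python
from scipy.stats import pearsonr, spearmanr

predicted_sims = [0.91, 0.30, 0.75, 0.12]  # model cosine similarities (illustrative)
gold_scores = [4.8, 1.5, 3.9, 0.6]         # STS-B gold scores (illustrative)

print(pearsonr(predicted_sims, gold_scores)[0])
print(spearmanr(predicted_sims, gold_scores)[0])
```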

For more training detail, the training config and the PyTorch forward function are given below. The teacher's features are first projected down to 256 dimensions (matching the student's output size) with the PCA object from `zen-E/ANLI-simcse-roberta-large-embeddings-pca-256`, which can be loaded as follows:

```python
import joblib

# Load the fitted PCA object and project the teacher embeddings
# (e.g. a (num_sentences, 1024) array from the RoBERTa-large teacher) to 256 dims.
pca = joblib.load('ANLI-simcse-roberta-large-embeddings-pca-256/pca_model.sav')
features_256 = pca.transform(features)  # `features` holds the raw teacher embeddings
```
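
For context, a PCA object like this could have been produced with scikit-learn along these lines (a sketch under assumptions; the actual fitting code is not part of this card):

```python
import joblib
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the teacher embeddings of the ANLI sentences,
# i.e. unsup-simcse-roberta-large outputs (1024-dimensional).
teacher_embeddings = np.random.randn(10_000, 1024).astype("float32")

pca = PCA(n_components=256)
pca.fit(teacher_embeddings)
joblib.dump(pca, "pca_model.sav")
```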

The training config:

```python
# training hyperparameters
config = {
    'epoch': 10,
    'learning_rate': 5e-5,
    'batch_size': 512,
    'temperature': 0.05,
}
```
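
A minimal sketch of how these hyperparameters could drive the distillation loop (the optimizer choice, `student` model, and `dataloader` are assumptions; `config['temperature']` presumably sets `self.temperature` in the forward function below):

```python
import torch

optimizer = torch.optim.AdamW(student.parameters(), lr=config['learning_rate'])

for epoch in range(config['epoch']):
    for batch in dataloader:  # batches of config['batch_size'] ANLI triples
        loss, kd_cos_loss, kd_mse_loss = student.forward_cos_mse_kd(*batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```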

The PyTorch forward function:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cosine_sim(a, b):
    # Helper assumed by the snippet (not shown in the original card):
    # row-wise cosine similarity returned as a (batch, 1) column so that
    # torch.cat(..., dim=-1) below yields (batch, 2) similarity logits.
    return F.cosine_similarity(a, b, dim=-1).unsqueeze(-1)

def forward_cos_mse_kd(self, sentence1s, sentence2s, sentence3s,
                       teacher_sentence1_embs, teacher_sentence2_embs,
                       teacher_sentence3_embs):
    """Forward function for the ANLI dataset."""
    # Student pooled outputs; assumes the encoder returns
    # (sequence_output, pooled_output), e.g. with return_dict=False.
    _, o1 = self.bert(**sentence1s)
    _, o2 = self.bert(**sentence2s)
    _, o3 = self.bert(**sentence3s)

    # Student's cosine similarity between sentence pairs.
    cos_o1_o2 = cosine_sim(o1, o2)
    cos_o1_o3 = cosine_sim(o1, o3)

    # Teacher's cosine similarity between the same pairs.
    cos_o1_o2_t = cosine_sim(teacher_sentence1_embs, teacher_sentence2_embs)
    cos_o1_o3_t = cosine_sim(teacher_sentence1_embs, teacher_sentence3_embs)

    cos_sim = torch.cat((cos_o1_o2, cos_o1_o3), dim=-1)
    cos_sim_t = torch.cat((cos_o1_o2_t, cos_o1_o3_t), dim=-1)

    # KL divergence between temperature-scaled student and teacher distributions.
    soft_teacher_probs = F.softmax(cos_sim_t / self.temperature, dim=1)
    kd_cos_loss = F.kl_div(F.log_softmax(cos_sim / self.temperature, dim=1),
                           soft_teacher_probs,
                           reduction='batchmean')

    # MSE between student outputs and the PCA-reduced teacher embeddings,
    # scaled by 1/3 over the three sentences of each triple.
    o = torch.cat([o1, o2, o3], dim=0)
    teacher_embs = torch.cat([teacher_sentence1_embs, teacher_sentence2_embs,
                              teacher_sentence3_embs], dim=0)
    kd_mse_loss = nn.MSELoss()(o, teacher_embs) / 3

    # Equal weight for the two losses.
    total_loss = 0.5 * kd_cos_loss + 0.5 * kd_mse_loss
    return total_loss, kd_cos_loss, kd_mse_loss
```
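
The KL term transfers the teacher's relative similarity structure over the (sentence1, sentence2) and (sentence1, sentence3) pairs at temperature 0.05, while the MSE term aligns the student's embedding space directly with the PCA-reduced teacher embeddings; the two objectives are weighted equally.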