Yinka
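The Yinka embedding model was obtained by continued training of the open-source model stella-v3.5-mrl, using the multi-task hybrid loss training described in piccolo2. Like stella-v3.5-mrl, it also supports variable embedding dimensions.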
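The card does not spell out the individual loss terms. The sketch below is a hedged illustration only, not Yinka's or piccolo2's training code. It shows two losses that hybrid schemes of this kind commonly mix: an InfoNCE loss with in-batch negatives for retrieval-style (query, positive) pairs, and a CoSENT ranking loss for pairs with graded similarity labels. The function names `info_nce`, `cosent` and `hybrid_loss`, the temperature of 0.05 and the scale of 20 are all assumptions. piccolo2's scheme covers further task types that are not modeled here.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row of x to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def info_nce(queries: np.ndarray, positives: np.ndarray, temperature: float = 0.05) -> float:
    """InfoNCE with in-batch negatives: row i of `positives` is the positive
    for query i, and every other row in the batch serves as a negative."""
    logits = l2_normalize(queries) @ l2_normalize(positives).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # cross-entropy on the diagonal


def cosent(a: np.ndarray, b: np.ndarray, labels: np.ndarray, scale: float = 20.0) -> float:
    """CoSENT: log(1 + sum exp(scale * (s_j - s_i))) over all pairs with
    labels[i] > labels[j], i.e. it penalizes pairs ranked in the wrong order."""
    sims = scale * np.sum(l2_normalize(a) * l2_normalize(b), axis=1)
    diff = sims[None, :] - sims[:, None]          # diff[i, j] = s_j - s_i
    mask = labels[:, None] > labels[None, :]      # pairs where i should outrank j
    terms = np.concatenate([[0.0], diff[mask]])   # the 0.0 term supplies the "1 +"
    m = terms.max()
    return float(m + np.log(np.exp(terms - m).sum()))  # stable log-sum-exp


def hybrid_loss(task: str, batch: dict) -> float:
    """Hypothetical dispatcher: choose the loss according to the batch's task type."""
    if task == "retrieval":
        return info_nce(batch["queries"], batch["positives"])
    if task == "sts":
        return cosent(batch["a"], batch["b"], batch["labels"])
    raise ValueError(f"unknown task type: {task}")


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))
print(hybrid_loss("retrieval", {"queries": q, "positives": q + 0.01 * rng.normal(size=(4, 16))}))
a = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
b = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # cosine similarities 1, 0, -1
print(hybrid_loss("sts", {"a": a, "b": b, "labels": np.array([2.0, 1.0, 0.0])}))  # ordering agrees: loss near 0
```

In a real training loop each batch would be drawn from a single task type so that the dispatcher can pick one loss per step; how the per-task losses are weighted or scheduled is not stated in this card.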
Usage
The model is used in the same way as stella-v3.5-mrl; no prefix needs to be added to the input text.
from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import normalize
model = SentenceTransformer("Classical/Yinka")
# Note: do not normalize yet! Take the first n dimensions first, then normalize.
vectors = model.encode(["text1", "text2"], normalize_embeddings=False)
print(vectors.shape)  # (2, 1792)
n_dims = 768
# Keep the first n_dims dimensions, then L2-normalize each row
cut_vecs = normalize(vectors[:, :n_dims])
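Why truncate first and normalize afterwards? A minimal sketch, using random numbers as a stand-in for real model output (an assumption, so it runs without downloading the model): if the full 1792-d vector is normalized before truncation, the kept 768 dimensions are no longer unit length. The helper `truncate_and_normalize` is hypothetical; it computes the same thing as sklearn's `normalize` with its default L2 norm over rows.

```python
import numpy as np


def truncate_and_normalize(vectors: np.ndarray, n_dims: int) -> np.ndarray:
    """Keep the first n_dims columns, then rescale each row to unit L2 norm
    (the same result as sklearn's normalize() with its default norm="l2")."""
    cut = vectors[:, :n_dims]
    return cut / np.linalg.norm(cut, axis=1, keepdims=True)


rng = np.random.default_rng(0)
# Random stand-in for model.encode(..., normalize_embeddings=False); not real embeddings
vectors = rng.normal(size=(2, 1792))

# Wrong order: normalize the full 1792-d vectors, then truncate -> rows are no longer unit length
full_normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
wrong = full_normed[:, :768]
print(np.linalg.norm(wrong, axis=1))  # both well below 1.0

# Right order: truncate, then normalize -> unit-length rows
right = truncate_and_normalize(vectors, 768)
print(np.linalg.norm(right, axis=1))  # both ≈ 1.0

# For unit-length rows, cosine similarity reduces to a dot product
print(float(right[0] @ right[1]))
```

The order matters whenever downstream code (for example an inner-product vector index) treats the dot product as cosine similarity; a full cosine computation on the truncated vectors is scale-invariant and would give the same value either way.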
Results
Model Name | Model Size (GB) | Dimension | Sequence Length | Classification (9) | Clustering (4) | Pair Classification (2) | Reranking (4) | Retrieval (8) | STS (8) | Average (35) |
---|---|---|---|---|---|---|---|---|---|---|
Yinka | 1.21 | 1792 | 512 | 74.30 | 61.99 | 89.87 | 69.77 | 74.40 | 63.30 | 70.79 |
stella-v3.5-mrl | 1.21 | 1792 | 512 | 71.56 | 54.39 | 88.09 | 68.45 | 73.51 | 62.48 | 68.56 |
piccolo-large-zh-v2 | 1.21 | 1792 | 512 | 74.59 | 62.17 | 90.24 | 70 | 74.36 | 63.5 | 70.95 |
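The numbers in parentheses in the header appear to be the number of datasets per task category (9 + 4 + 2 + 4 + 8 + 8 = 35). Assuming Average (35) is the plain mean over all 35 datasets, it equals the category scores weighted by their dataset counts. The sketch below (the names `DATASETS_PER_CATEGORY` and `overall_average` are hypothetical) reproduces the stella-v3.5-mrl (68.56) and piccolo-large-zh-v2 (70.95) rows.

```python
# Number of datasets per task category, read from the table header
DATASETS_PER_CATEGORY = {
    "Classification": 9,
    "Clustering": 4,
    "Pair Classification": 2,
    "Reranking": 4,
    "Retrieval": 8,
    "STS": 8,
}


def overall_average(category_scores: dict) -> float:
    """Mean over all datasets = category means weighted by dataset count."""
    total = sum(DATASETS_PER_CATEGORY.values())  # 35
    weighted = sum(DATASETS_PER_CATEGORY[c] * s for c, s in category_scores.items())
    return weighted / total


stella = {
    "Classification": 71.56, "Clustering": 54.39, "Pair Classification": 88.09,
    "Reranking": 68.45, "Retrieval": 73.51, "STS": 62.48,
}
print(round(overall_average(stella), 2))  # 68.56
```

Applying the same formula to the Yinka row gives about 70.77, slightly below the listed 70.79.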
Training details
TODO
Licence
This model is released under the MIT licence.
Evaluation results
- cos_sim_pearson on MTEB AFQMC validation set (self-reported): 56.306
- cos_sim_spearman on MTEB AFQMC validation set (self-reported): 61.020
- euclidean_pearson on MTEB AFQMC validation set (self-reported): 58.618
- euclidean_spearman on MTEB AFQMC validation set (self-reported): 60.131
- manhattan_pearson on MTEB AFQMC validation set (self-reported): 58.619
- manhattan_spearman on MTEB AFQMC validation set (self-reported): 60.126
- cos_sim_pearson on MTEB ATEC test set (self-reported): 55.861
- cos_sim_spearman on MTEB ATEC test set (self-reported): 59.020
- euclidean_pearson on MTEB ATEC test set (self-reported): 62.028
- euclidean_spearman on MTEB ATEC test set (self-reported): 58.605
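The metric names above follow MTEB's STS-style evaluation. As a hedged sketch, assuming the usual recipe (score each sentence pair by cosine similarity, negative Euclidean distance and negative Manhattan distance, correlate each score with the gold labels using Pearson and Spearman, and report on a 0 to 100 scale), they can be computed as below. `sts_metrics` is a hypothetical helper, not MTEB's API, and the toy embeddings and gold scores are made up for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def sts_metrics(emb1: np.ndarray, emb2: np.ndarray, gold: np.ndarray) -> dict:
    """Hypothetical helper: correlate three pairwise similarity scores with
    the gold scores and report them x100, mirroring the metric names above."""
    cos_sim = np.sum(emb1 * emb2, axis=1) / (
        np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
    )
    # Distances are negated so that a larger score always means "more similar"
    euclidean = -np.linalg.norm(emb1 - emb2, axis=1)
    manhattan = -np.abs(emb1 - emb2).sum(axis=1)
    results = {}
    for name, scores in (("cos_sim", cos_sim), ("euclidean", euclidean), ("manhattan", manhattan)):
        results[f"{name}_pearson"] = 100 * float(pearsonr(gold, scores)[0])
        results[f"{name}_spearman"] = 100 * float(spearmanr(gold, scores)[0])
    return results


rng = np.random.default_rng(0)
emb1 = rng.normal(size=(5, 16))
noise_level = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
emb2 = emb1 + noise_level[:, None] * rng.normal(size=(5, 16))  # more noise, less similar
gold = np.array([5.0, 4.0, 3.0, 2.0, 1.0])  # toy gold scores, highest for the least noisy pair
for metric, value in sts_metrics(emb1, emb2, gold).items():
    print(f"{metric}: {value:.3f}")
```

Pearson measures linear agreement with the gold scores, while Spearman only checks whether the pairs are ranked in the same order.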