infgrad's picture
Update README.md
c2d4fc7 verified
|
raw
history blame
37 kB
metadata
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - mteb
model-index:
  - name: stella-large-zh-v3-1792d
    results:
      - task:
          type: STS
        dataset:
          type: C-MTEB/AFQMC
          name: MTEB AFQMC
          config: default
          split: validation
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 54.48093298255762
          - type: cos_sim_spearman
            value: 59.105354109068685
          - type: euclidean_pearson
            value: 57.761189988643444
          - type: euclidean_spearman
            value: 59.10537421115596
          - type: manhattan_pearson
            value: 56.94359297051431
          - type: manhattan_spearman
            value: 58.37611109821567
      - task:
          type: STS
        dataset:
          type: C-MTEB/ATEC
          name: MTEB ATEC
          config: default
          split: test
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 54.39711127600595
          - type: cos_sim_spearman
            value: 58.190191920824454
          - type: euclidean_pearson
            value: 61.80082379352729
          - type: euclidean_spearman
            value: 58.19018966860797
          - type: manhattan_pearson
            value: 60.927601060396206
          - type: manhattan_spearman
            value: 57.78832902694192
      - task:
          type: Classification
        dataset:
          type: mteb/amazon_reviews_multi
          name: MTEB AmazonReviewsClassification (zh)
          config: zh
          split: test
          revision: 1399c76144fd37290681b995c656ef9b2e06e26d
        metrics:
          - type: accuracy
            value: 46.31600000000001
          - type: f1
            value: 44.45281663598873
      - task:
          type: STS
        dataset:
          type: C-MTEB/BQ
          name: MTEB BQ
          config: default
          split: test
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 69.12211326097868
          - type: cos_sim_spearman
            value: 71.0741302039443
          - type: euclidean_pearson
            value: 69.89070483887852
          - type: euclidean_spearman
            value: 71.07413020351787
          - type: manhattan_pearson
            value: 69.62345441260962
          - type: manhattan_spearman
            value: 70.8517591280618
      - task:
          type: Clustering
        dataset:
          type: C-MTEB/CLSClusteringP2P
          name: MTEB CLSClusteringP2P
          config: default
          split: test
          revision: None
        metrics:
          - type: v_measure
            value: 41.937723608805314
      - task:
          type: Clustering
        dataset:
          type: C-MTEB/CLSClusteringS2S
          name: MTEB CLSClusteringS2S
          config: default
          split: test
          revision: None
        metrics:
          - type: v_measure
            value: 40.34373057675427
      - task:
          type: Reranking
        dataset:
          type: C-MTEB/CMedQAv1-reranking
          name: MTEB CMedQAv1
          config: default
          split: test
          revision: None
        metrics:
          - type: map
            value: 88.98896401788376
          - type: mrr
            value: 90.97119047619047
      - task:
          type: Reranking
        dataset:
          type: C-MTEB/CMedQAv2-reranking
          name: MTEB CMedQAv2
          config: default
          split: test
          revision: None
        metrics:
          - type: map
            value: 89.59718540244556
          - type: mrr
            value: 91.41246031746032
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/CmedqaRetrieval
          name: MTEB CmedqaRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 26.954
          - type: map_at_10
            value: 40.144999999999996
          - type: map_at_100
            value: 42.083999999999996
          - type: map_at_1000
            value: 42.181000000000004
          - type: map_at_3
            value: 35.709
          - type: map_at_5
            value: 38.141000000000005
          - type: mrr_at_1
            value: 40.71
          - type: mrr_at_10
            value: 48.93
          - type: mrr_at_100
            value: 49.921
          - type: mrr_at_1000
            value: 49.958999999999996
          - type: mrr_at_3
            value: 46.32
          - type: mrr_at_5
            value: 47.769
          - type: ndcg_at_1
            value: 40.71
          - type: ndcg_at_10
            value: 46.869
          - type: ndcg_at_100
            value: 54.234
          - type: ndcg_at_1000
            value: 55.854000000000006
          - type: ndcg_at_3
            value: 41.339
          - type: ndcg_at_5
            value: 43.594
          - type: precision_at_1
            value: 40.71
          - type: precision_at_10
            value: 10.408000000000001
          - type: precision_at_100
            value: 1.635
          - type: precision_at_1000
            value: 0.184
          - type: precision_at_3
            value: 23.348
          - type: precision_at_5
            value: 16.929
          - type: recall_at_1
            value: 26.954
          - type: recall_at_10
            value: 57.821999999999996
          - type: recall_at_100
            value: 88.08200000000001
          - type: recall_at_1000
            value: 98.83800000000001
          - type: recall_at_3
            value: 41.221999999999994
          - type: recall_at_5
            value: 48.241
      - task:
          type: PairClassification
        dataset:
          type: C-MTEB/CMNLI
          name: MTEB Cmnli
          config: default
          split: validation
          revision: None
        metrics:
          - type: cos_sim_accuracy
            value: 83.6680697534576
          - type: cos_sim_ap
            value: 90.77401562455269
          - type: cos_sim_f1
            value: 84.68266427450101
          - type: cos_sim_precision
            value: 81.36177547942253
          - type: cos_sim_recall
            value: 88.28618190320317
          - type: dot_accuracy
            value: 83.6680697534576
          - type: dot_ap
            value: 90.76429465198817
          - type: dot_f1
            value: 84.68266427450101
          - type: dot_precision
            value: 81.36177547942253
          - type: dot_recall
            value: 88.28618190320317
          - type: euclidean_accuracy
            value: 83.6680697534576
          - type: euclidean_ap
            value: 90.77401909305344
          - type: euclidean_f1
            value: 84.68266427450101
          - type: euclidean_precision
            value: 81.36177547942253
          - type: euclidean_recall
            value: 88.28618190320317
          - type: manhattan_accuracy
            value: 83.40348767288035
          - type: manhattan_ap
            value: 90.57002020310819
          - type: manhattan_f1
            value: 84.51526032315978
          - type: manhattan_precision
            value: 81.25134843581445
          - type: manhattan_recall
            value: 88.05237315875614
          - type: max_accuracy
            value: 83.6680697534576
          - type: max_ap
            value: 90.77401909305344
          - type: max_f1
            value: 84.68266427450101
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/CovidRetrieval
          name: MTEB CovidRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 69.705
          - type: map_at_10
            value: 78.648
          - type: map_at_100
            value: 78.888
          - type: map_at_1000
            value: 78.89399999999999
          - type: map_at_3
            value: 77.151
          - type: map_at_5
            value: 77.98
          - type: mrr_at_1
            value: 69.863
          - type: mrr_at_10
            value: 78.62599999999999
          - type: mrr_at_100
            value: 78.861
          - type: mrr_at_1000
            value: 78.867
          - type: mrr_at_3
            value: 77.204
          - type: mrr_at_5
            value: 78.005
          - type: ndcg_at_1
            value: 69.968
          - type: ndcg_at_10
            value: 82.44399999999999
          - type: ndcg_at_100
            value: 83.499
          - type: ndcg_at_1000
            value: 83.647
          - type: ndcg_at_3
            value: 79.393
          - type: ndcg_at_5
            value: 80.855
          - type: precision_at_1
            value: 69.968
          - type: precision_at_10
            value: 9.515
          - type: precision_at_100
            value: 0.9990000000000001
          - type: precision_at_1000
            value: 0.101
          - type: precision_at_3
            value: 28.802
          - type: precision_at_5
            value: 18.019
          - type: recall_at_1
            value: 69.705
          - type: recall_at_10
            value: 94.152
          - type: recall_at_100
            value: 98.84100000000001
          - type: recall_at_1000
            value: 100
          - type: recall_at_3
            value: 85.774
          - type: recall_at_5
            value: 89.252
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/DuRetrieval
          name: MTEB DuRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 25.88
          - type: map_at_10
            value: 79.857
          - type: map_at_100
            value: 82.636
          - type: map_at_1000
            value: 82.672
          - type: map_at_3
            value: 55.184
          - type: map_at_5
            value: 70.009
          - type: mrr_at_1
            value: 89.64999999999999
          - type: mrr_at_10
            value: 92.967
          - type: mrr_at_100
            value: 93.039
          - type: mrr_at_1000
            value: 93.041
          - type: mrr_at_3
            value: 92.65
          - type: mrr_at_5
            value: 92.86
          - type: ndcg_at_1
            value: 89.64999999999999
          - type: ndcg_at_10
            value: 87.126
          - type: ndcg_at_100
            value: 89.898
          - type: ndcg_at_1000
            value: 90.253
          - type: ndcg_at_3
            value: 86.012
          - type: ndcg_at_5
            value: 85.124
          - type: precision_at_1
            value: 89.64999999999999
          - type: precision_at_10
            value: 41.735
          - type: precision_at_100
            value: 4.797
          - type: precision_at_1000
            value: 0.488
          - type: precision_at_3
            value: 77.267
          - type: precision_at_5
            value: 65.48
          - type: recall_at_1
            value: 25.88
          - type: recall_at_10
            value: 88.28399999999999
          - type: recall_at_100
            value: 97.407
          - type: recall_at_1000
            value: 99.29299999999999
          - type: recall_at_3
            value: 57.38799999999999
          - type: recall_at_5
            value: 74.736
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/EcomRetrieval
          name: MTEB EcomRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 53.2
          - type: map_at_10
            value: 63.556000000000004
          - type: map_at_100
            value: 64.033
          - type: map_at_1000
            value: 64.044
          - type: map_at_3
            value: 60.983
          - type: map_at_5
            value: 62.588
          - type: mrr_at_1
            value: 53.2
          - type: mrr_at_10
            value: 63.556000000000004
          - type: mrr_at_100
            value: 64.033
          - type: mrr_at_1000
            value: 64.044
          - type: mrr_at_3
            value: 60.983
          - type: mrr_at_5
            value: 62.588
          - type: ndcg_at_1
            value: 53.2
          - type: ndcg_at_10
            value: 68.61699999999999
          - type: ndcg_at_100
            value: 70.88499999999999
          - type: ndcg_at_1000
            value: 71.15899999999999
          - type: ndcg_at_3
            value: 63.434000000000005
          - type: ndcg_at_5
            value: 66.301
          - type: precision_at_1
            value: 53.2
          - type: precision_at_10
            value: 8.450000000000001
          - type: precision_at_100
            value: 0.95
          - type: precision_at_1000
            value: 0.097
          - type: precision_at_3
            value: 23.5
          - type: precision_at_5
            value: 15.479999999999999
          - type: recall_at_1
            value: 53.2
          - type: recall_at_10
            value: 84.5
          - type: recall_at_100
            value: 95
          - type: recall_at_1000
            value: 97.1
          - type: recall_at_3
            value: 70.5
          - type: recall_at_5
            value: 77.4
      - task:
          type: Classification
        dataset:
          type: C-MTEB/IFlyTek-classification
          name: MTEB IFlyTek
          config: default
          split: validation
          revision: None
        metrics:
          - type: accuracy
            value: 50.63485956136976
          - type: f1
            value: 38.286307407751266
      - task:
          type: Classification
        dataset:
          type: C-MTEB/JDReview-classification
          name: MTEB JDReview
          config: default
          split: test
          revision: None
        metrics:
          - type: accuracy
            value: 86.11632270168855
          - type: ap
            value: 54.43932599806482
          - type: f1
            value: 80.85485110996076
      - task:
          type: STS
        dataset:
          type: C-MTEB/LCQMC
          name: MTEB LCQMC
          config: default
          split: test
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 72.47315152994804
          - type: cos_sim_spearman
            value: 78.26531600908152
          - type: euclidean_pearson
            value: 77.8560788714531
          - type: euclidean_spearman
            value: 78.26531157334841
          - type: manhattan_pearson
            value: 77.70593783974188
          - type: manhattan_spearman
            value: 78.13880812439999
      - task:
          type: Reranking
        dataset:
          type: C-MTEB/Mmarco-reranking
          name: MTEB MMarcoReranking
          config: default
          split: dev
          revision: None
        metrics:
          - type: map
            value: 28.088177976572222
          - type: mrr
            value: 27.125
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/MMarcoRetrieval
          name: MTEB MMarcoRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 66.428
          - type: map_at_10
            value: 75.5
          - type: map_at_100
            value: 75.82600000000001
          - type: map_at_1000
            value: 75.837
          - type: map_at_3
            value: 73.74300000000001
          - type: map_at_5
            value: 74.87
          - type: mrr_at_1
            value: 68.754
          - type: mrr_at_10
            value: 76.145
          - type: mrr_at_100
            value: 76.432
          - type: mrr_at_1000
            value: 76.442
          - type: mrr_at_3
            value: 74.628
          - type: mrr_at_5
            value: 75.612
          - type: ndcg_at_1
            value: 68.754
          - type: ndcg_at_10
            value: 79.144
          - type: ndcg_at_100
            value: 80.60199999999999
          - type: ndcg_at_1000
            value: 80.886
          - type: ndcg_at_3
            value: 75.81599999999999
          - type: ndcg_at_5
            value: 77.729
          - type: precision_at_1
            value: 68.754
          - type: precision_at_10
            value: 9.544
          - type: precision_at_100
            value: 1.026
          - type: precision_at_1000
            value: 0.105
          - type: precision_at_3
            value: 28.534
          - type: precision_at_5
            value: 18.138
          - type: recall_at_1
            value: 66.428
          - type: recall_at_10
            value: 89.716
          - type: recall_at_100
            value: 96.313
          - type: recall_at_1000
            value: 98.541
          - type: recall_at_3
            value: 80.923
          - type: recall_at_5
            value: 85.48
      - task:
          type: Classification
        dataset:
          type: mteb/amazon_massive_intent
          name: MTEB MassiveIntentClassification (zh-CN)
          config: zh-CN
          split: test
          revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
        metrics:
          - type: accuracy
            value: 73.27841291190316
          - type: f1
            value: 70.65529957574735
      - task:
          type: Classification
        dataset:
          type: mteb/amazon_massive_scenario
          name: MTEB MassiveScenarioClassification (zh-CN)
          config: zh-CN
          split: test
          revision: 7d571f92784cd94a019292a1f45445077d0ef634
        metrics:
          - type: accuracy
            value: 76.30127774041695
          - type: f1
            value: 76.10358226518304
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/MedicalRetrieval
          name: MTEB MedicalRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 56.3
          - type: map_at_10
            value: 62.193
          - type: map_at_100
            value: 62.722
          - type: map_at_1000
            value: 62.765
          - type: map_at_3
            value: 60.633
          - type: map_at_5
            value: 61.617999999999995
          - type: mrr_at_1
            value: 56.3
          - type: mrr_at_10
            value: 62.193
          - type: mrr_at_100
            value: 62.722
          - type: mrr_at_1000
            value: 62.765
          - type: mrr_at_3
            value: 60.633
          - type: mrr_at_5
            value: 61.617999999999995
          - type: ndcg_at_1
            value: 56.3
          - type: ndcg_at_10
            value: 65.176
          - type: ndcg_at_100
            value: 67.989
          - type: ndcg_at_1000
            value: 69.219
          - type: ndcg_at_3
            value: 62.014
          - type: ndcg_at_5
            value: 63.766
          - type: precision_at_1
            value: 56.3
          - type: precision_at_10
            value: 7.46
          - type: precision_at_100
            value: 0.8829999999999999
          - type: precision_at_1000
            value: 0.098
          - type: precision_at_3
            value: 22
          - type: precision_at_5
            value: 14.04
          - type: recall_at_1
            value: 56.3
          - type: recall_at_10
            value: 74.6
          - type: recall_at_100
            value: 88.3
          - type: recall_at_1000
            value: 98.1
          - type: recall_at_3
            value: 66
          - type: recall_at_5
            value: 70.19999999999999
      - task:
          type: Classification
        dataset:
          type: C-MTEB/MultilingualSentiment-classification
          name: MTEB MultilingualSentiment
          config: default
          split: validation
          revision: None
        metrics:
          - type: accuracy
            value: 76.44666666666666
          - type: f1
            value: 76.34548655475949
      - task:
          type: PairClassification
        dataset:
          type: C-MTEB/OCNLI
          name: MTEB Ocnli
          config: default
          split: validation
          revision: None
        metrics:
          - type: cos_sim_accuracy
            value: 82.34975636166757
          - type: cos_sim_ap
            value: 85.44149338593267
          - type: cos_sim_f1
            value: 83.68654509610647
          - type: cos_sim_precision
            value: 78.46580406654344
          - type: cos_sim_recall
            value: 89.65153115100317
          - type: dot_accuracy
            value: 82.34975636166757
          - type: dot_ap
            value: 85.4415701376729
          - type: dot_f1
            value: 83.68654509610647
          - type: dot_precision
            value: 78.46580406654344
          - type: dot_recall
            value: 89.65153115100317
          - type: euclidean_accuracy
            value: 82.34975636166757
          - type: euclidean_ap
            value: 85.4415701376729
          - type: euclidean_f1
            value: 83.68654509610647
          - type: euclidean_precision
            value: 78.46580406654344
          - type: euclidean_recall
            value: 89.65153115100317
          - type: manhattan_accuracy
            value: 81.97076340010828
          - type: manhattan_ap
            value: 84.83614660756733
          - type: manhattan_f1
            value: 83.34167083541772
          - type: manhattan_precision
            value: 79.18250950570342
          - type: manhattan_recall
            value: 87.96198521647307
          - type: max_accuracy
            value: 82.34975636166757
          - type: max_ap
            value: 85.4415701376729
          - type: max_f1
            value: 83.68654509610647
      - task:
          type: Classification
        dataset:
          type: C-MTEB/OnlineShopping-classification
          name: MTEB OnlineShopping
          config: default
          split: test
          revision: None
        metrics:
          - type: accuracy
            value: 93.24
          - type: ap
            value: 91.3586656455605
          - type: f1
            value: 93.22999314249503
      - task:
          type: STS
        dataset:
          type: C-MTEB/PAWSX
          name: MTEB PAWSX
          config: default
          split: test
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 39.05676042449009
          - type: cos_sim_spearman
            value: 44.996534098358545
          - type: euclidean_pearson
            value: 44.42418609172825
          - type: euclidean_spearman
            value: 44.995941361058996
          - type: manhattan_pearson
            value: 43.98118203238076
          - type: manhattan_spearman
            value: 44.51414152788784
      - task:
          type: STS
        dataset:
          type: C-MTEB/QBQTC
          name: MTEB QBQTC
          config: default
          split: test
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 36.694269474438045
          - type: cos_sim_spearman
            value: 38.686738967031616
          - type: euclidean_pearson
            value: 36.822540068407235
          - type: euclidean_spearman
            value: 38.68690745429757
          - type: manhattan_pearson
            value: 36.77180703308932
          - type: manhattan_spearman
            value: 38.45414914148094
      - task:
          type: STS
        dataset:
          type: mteb/sts22-crosslingual-sts
          name: MTEB STS22 (zh)
          config: zh
          split: test
          revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
        metrics:
          - type: cos_sim_pearson
            value: 65.81209017614124
          - type: cos_sim_spearman
            value: 66.5255285833172
          - type: euclidean_pearson
            value: 66.01848701752732
          - type: euclidean_spearman
            value: 66.5255285833172
          - type: manhattan_pearson
            value: 66.66433676370542
          - type: manhattan_spearman
            value: 67.07086311480214
      - task:
          type: STS
        dataset:
          type: C-MTEB/STSB
          name: MTEB STSB
          config: default
          split: test
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 80.60785761283502
          - type: cos_sim_spearman
            value: 82.80278693241074
          - type: euclidean_pearson
            value: 82.47573315938638
          - type: euclidean_spearman
            value: 82.80290808593806
          - type: manhattan_pearson
            value: 82.49682028989669
          - type: manhattan_spearman
            value: 82.84565039346022
      - task:
          type: Reranking
        dataset:
          type: C-MTEB/T2Reranking
          name: MTEB T2Reranking
          config: default
          split: dev
          revision: None
        metrics:
          - type: map
            value: 66.37886004738723
          - type: mrr
            value: 76.08501655006394
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/T2Retrieval
          name: MTEB T2Retrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 28.102
          - type: map_at_10
            value: 78.071
          - type: map_at_100
            value: 81.71000000000001
          - type: map_at_1000
            value: 81.773
          - type: map_at_3
            value: 55.142
          - type: map_at_5
            value: 67.669
          - type: mrr_at_1
            value: 90.9
          - type: mrr_at_10
            value: 93.29499999999999
          - type: mrr_at_100
            value: 93.377
          - type: mrr_at_1000
            value: 93.379
          - type: mrr_at_3
            value: 92.901
          - type: mrr_at_5
            value: 93.152
          - type: ndcg_at_1
            value: 90.9
          - type: ndcg_at_10
            value: 85.564
          - type: ndcg_at_100
            value: 89.11200000000001
          - type: ndcg_at_1000
            value: 89.693
          - type: ndcg_at_3
            value: 87.024
          - type: ndcg_at_5
            value: 85.66
          - type: precision_at_1
            value: 90.9
          - type: precision_at_10
            value: 42.208
          - type: precision_at_100
            value: 5.027
          - type: precision_at_1000
            value: 0.517
          - type: precision_at_3
            value: 75.872
          - type: precision_at_5
            value: 63.566
          - type: recall_at_1
            value: 28.102
          - type: recall_at_10
            value: 84.44500000000001
          - type: recall_at_100
            value: 95.91300000000001
          - type: recall_at_1000
            value: 98.80799999999999
          - type: recall_at_3
            value: 56.772999999999996
          - type: recall_at_5
            value: 70.99499999999999
      - task:
          type: Classification
        dataset:
          type: C-MTEB/TNews-classification
          name: MTEB TNews
          config: default
          split: validation
          revision: None
        metrics:
          - type: accuracy
            value: 53.10599999999999
          - type: f1
            value: 51.40415523558322
      - task:
          type: Clustering
        dataset:
          type: C-MTEB/ThuNewsClusteringP2P
          name: MTEB ThuNewsClusteringP2P
          config: default
          split: test
          revision: None
        metrics:
          - type: v_measure
            value: 69.6145576098232
      - task:
          type: Clustering
        dataset:
          type: C-MTEB/ThuNewsClusteringS2S
          name: MTEB ThuNewsClusteringS2S
          config: default
          split: test
          revision: None
        metrics:
          - type: v_measure
            value: 63.7129548775017
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/VideoRetrieval
          name: MTEB VideoRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 60.199999999999996
          - type: map_at_10
            value: 69.724
          - type: map_at_100
            value: 70.185
          - type: map_at_1000
            value: 70.196
          - type: map_at_3
            value: 67.95
          - type: map_at_5
            value: 69.155
          - type: mrr_at_1
            value: 60.199999999999996
          - type: mrr_at_10
            value: 69.724
          - type: mrr_at_100
            value: 70.185
          - type: mrr_at_1000
            value: 70.196
          - type: mrr_at_3
            value: 67.95
          - type: mrr_at_5
            value: 69.155
          - type: ndcg_at_1
            value: 60.199999999999996
          - type: ndcg_at_10
            value: 73.888
          - type: ndcg_at_100
            value: 76.02799999999999
          - type: ndcg_at_1000
            value: 76.344
          - type: ndcg_at_3
            value: 70.384
          - type: ndcg_at_5
            value: 72.541
          - type: precision_at_1
            value: 60.199999999999996
          - type: precision_at_10
            value: 8.67
          - type: precision_at_100
            value: 0.9650000000000001
          - type: precision_at_1000
            value: 0.099
          - type: precision_at_3
            value: 25.8
          - type: precision_at_5
            value: 16.520000000000003
          - type: recall_at_1
            value: 60.199999999999996
          - type: recall_at_10
            value: 86.7
          - type: recall_at_100
            value: 96.5
          - type: recall_at_1000
            value: 99
          - type: recall_at_3
            value: 77.4
          - type: recall_at_5
            value: 82.6
      - task:
          type: Classification
        dataset:
          type: C-MTEB/waimai-classification
          name: MTEB Waimai
          config: default
          split: test
          revision: None
        metrics:
          - type: accuracy
            value: 88.08
          - type: ap
            value: 72.66435456846166
          - type: f1
            value: 86.55995793551286

1 开源清单

本次开源2个通用向量编码模型和一个针对dialogue进行编码的向量模型,同时开源全量160万对话重写数据集和20万的难负例的检索数据集。

开源模型:

ModelName ModelSize MaxTokens EmbeddingDimensions Language Scenario C-MTEB Score
infgrad/stella-base-zh-v3-1792d 0.4GB 512 1792 zh-CN 通用文本 67.96
infgrad/stella-large-zh-v3-1792d 1.3GB 512 1792 zh-CN 通用文本 68.48
infgrad/stella-dialogue-large-zh-v3-1792d 1.3GB 512 1792 zh-CN 对话文本 不适用

开源数据:

  1. 全量对话重写数据集 约160万
  2. 部分带有难负例的检索数据集 约20万

上述数据集均使用LLM构造,欢迎各位贡献数据集。

2 使用方法

2.1 通用编码模型使用方法

直接SentenceTransformer加载即可:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("infgrad/stella-base-zh-v3-1792d")
# model = SentenceTransformer("infgrad/stella-large-zh-v3-1792d")
vectors = model.encode(["text1", "text2"])

2.2 dialogue编码模型使用方法

使用场景: 在一段对话中,需要根据用户语句去检索相关文本,但是对话中的用户语句存在大量的指代和省略,导致直接使用通用编码模型效果不好, 可以使用本项目的专门的dialogue编码模型进行编码

使用要点:

  1. 对dialogue进行编码时,dialogue中的每个utterance需要是如下格式:"{ROLE}: {TEXT}",然后使用[SEP] join一下
  2. 整个对话都要送入模型进行编码,如果长度不够就删掉早期的对话,编码后的向量本质是对话中最后一句话的重写版本的向量!!
  3. 对话用stella-dialogue-large-zh-v3-1792d编码,被检索文本使用stella-large-zh-v3-1792d进行编码,所以本场景是需要2个编码模型的

如果对使用方法还有疑惑,请到下面章节阅读该模型是如何训练的。

使用示例:

from sentence_transformers import SentenceTransformer

dial_model = SentenceTransformer("infgrad/stella-dialogue-large-zh-v3-1792d")
general_model = SentenceTransformer("infgrad/stella-large-zh-v3-1792d")
# dialogue = ["张三: 吃饭吗", "李四: 等会去"]
dialogue = ["A: 最近去打篮球了吗", "B: 没有"]
corpus = ["B没打篮球是因为受伤了。", "B没有打乒乓球"]
last_utterance_vector = dial_model.encode(["[SEP]".join(dialogue)], normalize_embeddings=True)
corpus_vectors = general_model.encode(corpus, normalize_embeddings=True)
# 计算相似度
sims = (last_utterance_vector * corpus_vectors).sum(axis=1)
print(sims)

3 通用编码模型训练技巧分享

hard negative

难负例挖掘也是个经典的trick了,几乎总能提升效果

dropout-1d

dropout已经是深度学习的标配,我们可以稍微改造下使其更适合句向量的训练。 我们在训练时会尝试让每一个token-embedding都可以表征整个句子,而在推理时使用mean_pooling从而达到类似模型融合的效果。 具体操作是在mean_pooling时加入dropout_1d,torch代码如下:

vector_dropout = nn.Dropout1d(0.3)  # 算力有限,试了0.3和0.5 两个参数,其中0.3更优
last_hidden_state = bert_model(...)[0]
last_hidden = last_hidden_state.masked_fill(~attention_mask[..., None].bool(), 0.0)
last_hidden = vector_dropout(last_hidden)
vectors = last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

4 dialogue编码模型细节

4.1 为什么需要一个dialogue编码模型?

参见本人历史文章:https://www.zhihu.com/pin/1674913544847077376

4.2 训练数据

单条数据示例:

{
  "dialogue": [
    "A: 最近去打篮球了吗",
    "B: 没有"
  ],
  "last_utterance_rewrite": "B: 我最近没有去打篮球"
}

4.3 训练Loss

loss = cosine_loss( dial_model.encode(dialogue), existing_model.encode(last_utterance_rewrite) )

dial_model就是要被训练的模型,本人是以stella-large-zh-v3-1792d作为base-model进行继续训练的

existing_model就是现有训练好的通用编码模型,本人使用的是stella-large-zh-v3-1792d

已开源dialogue-embedding的全量训练数据,理论上可以复现本模型效果。

Loss下降情况:

icon

4.4 效果

目前还没有专门测试集,本人简单测试了下是有效果的,部分测试结果见文件dial_retrieval_test.xlsx

5 后续TODO

  1. 更多的dial-rewrite数据
  2. 不同EmbeddingDimensions的编码模型

6 FAQ

Q: 为什么向量维度是1792?
A: 最初考虑发布768、1024,768+768,1024+1024,1024+768维度,但是时间有限,先做了1792就只发布1792维度的模型。理论上维度越高效果越好。

Q: 如何复现CMTEB效果?
A: SentenceTransformer加载后直接用官方评测脚本就行,注意对于Classification任务向量需要先normalize一下

Q: 复现的CMTEB效果和本文不一致?
A: 聚类不一致正常,官方评测代码没有设定seed,其他不一致建议检查代码或联系本人。

Q: 如何选择向量模型?
A: 没有免费的午餐,在自己测试集上试试,本人推荐bge、e5和stella.

Q: 长度为什么只有512,能否更长?
A: 可以但没必要,长了效果普遍不好,这是当前训练方法和数据导致的,几乎无解,建议长文本还是走分块。

Q: 训练资源和算力?
A: 亿级别的数据,单卡A100要一个月起步