---
license: apache-2.0
language:
  - ko
  - en
metrics:
  - accuracy
base_model:
  - BAAI/bge-reranker-v2-m3
pipeline_tag: text-classification
library_name: sentence-transformers
---

# Reranker (Cross-Encoder)

Unlike an embedding model, a reranker takes a query and a document together as input and directly outputs a similarity score rather than an embedding. You obtain a relevance score by feeding a query-passage pair to the reranker, and the score can be mapped to a float in [0, 1] with the sigmoid function.
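
Concretely, the score is the logistic sigmoid of the model's raw logit $x$:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$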

## Model Details

- Base model: BAAI/bge-reranker-v2-m3
- The multilingual base model has been optimized for Korean.

## Usage with Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained('dragonkue/bge-reranker-v2-m3-ko')
tokenizer = AutoTokenizer.from_pretrained('dragonkue/bge-reranker-v2-m3-ko')

# Each input is a [query, passage] pair; the model scores how relevant the passage is to the query.
pairs = [
    ['λͺ‡ 년도에 μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•μ΄ μ‹œν–‰λμ„κΉŒ?', 'μ‹€λ¬΄κ΅μœ‘μ„ 톡해 β€˜μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•β€™μ— λŒ€ν•œ μžμΉ˜λ‹¨μ²΄μ˜ 관심을 μ œκ³ ν•˜κ³  μžμΉ˜λ‹¨μ²΄μ˜ 차질 μ—†λŠ” 업무 좔진을 μ§€μ›ν•˜μ˜€λ‹€. μ΄λŸ¬ν•œ 쀀비과정을 거쳐 2014λ…„ 8μ›” 7일뢀터 β€˜μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•β€™μ΄ μ‹œν–‰λ˜μ—ˆλ‹€.'],
    ['λͺ‡ 년도에 μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•μ΄ μ‹œν–‰λμ„κΉŒ?', 'μ‹ν’ˆμ˜μ•½ν’ˆμ•ˆμ „μ²˜λŠ” 21일 κ΅­λ‚΄ μ œμ•½κΈ°μ—… μœ λ°”μ΄μ˜€λ‘œμ§μŠ€κ°€ 개발 쀑인 μ‹ μ’… μ½”λ‘œλ‚˜λ°”μ΄λŸ¬μŠ€ 감염증(μ½”λ‘œλ‚˜19) λ°±μ‹  ν›„λ³΄λ¬Όμ§ˆ β€˜μœ μ½”λ°±-19β€™μ˜ μž„μƒμ‹œν—˜ κ³„νšμ„ μ§€λ‚œ 20일 μŠΉμΈν–ˆλ‹€κ³  λ°ν˜”λ‹€.'],
]
features = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    logits = model(**features).logits
    # Map raw logits to [0, 1] relevance scores.
    scores = torch.sigmoid(logits)
    print(scores)
```

## Usage with SentenceTransformers

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

```python
import torch
from sentence_transformers import CrossEncoder

# The sigmoid activation maps raw logits to [0, 1] relevance scores.
model = CrossEncoder('dragonkue/bge-reranker-v2-m3-ko', default_activation_function=torch.nn.Sigmoid())

# predict() takes a list of [query, passage] pairs.
scores = model.predict([
    ['λͺ‡ 년도에 μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•μ΄ μ‹œν–‰λμ„κΉŒ?', 'μ‹€λ¬΄κ΅μœ‘μ„ 톡해 β€˜μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•β€™μ— λŒ€ν•œ μžμΉ˜λ‹¨μ²΄μ˜ 관심을 μ œκ³ ν•˜κ³  μžμΉ˜λ‹¨μ²΄μ˜ 차질 μ—†λŠ” 업무 좔진을 μ§€μ›ν•˜μ˜€λ‹€. μ΄λŸ¬ν•œ 쀀비과정을 거쳐 2014λ…„ 8μ›” 7일뢀터 β€˜μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•β€™μ΄ μ‹œν–‰λ˜μ—ˆλ‹€.'],
    ['λͺ‡ 년도에 μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•μ΄ μ‹œν–‰λμ„κΉŒ?', 'μ‹ν’ˆμ˜μ•½ν’ˆμ•ˆμ „μ²˜λŠ” 21일 κ΅­λ‚΄ μ œμ•½κΈ°μ—… μœ λ°”μ΄μ˜€λ‘œμ§μŠ€κ°€ 개발 쀑인 μ‹ μ’… μ½”λ‘œλ‚˜λ°”μ΄λŸ¬μŠ€ 감염증(μ½”λ‘œλ‚˜19) λ°±μ‹  ν›„λ³΄λ¬Όμ§ˆ β€˜μœ μ½”λ°±-19β€™μ˜ μž„μƒμ‹œν—˜ κ³„νšμ„ μ§€λ‚œ 20일 μŠΉμΈν–ˆλ‹€κ³  λ°ν˜”λ‹€.'],
])
print(scores)
```

## Usage with FlagEmbedding

First install the FlagEmbedding library:

```bash
pip install -U FlagEmbedding
```

```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker('dragonkue/bge-reranker-v2-m3-ko')

# compute_score() takes [query, passage] pairs; normalize=True applies a sigmoid
# so the scores fall in [0, 1].
scores = reranker.compute_score([
    ['λͺ‡ 년도에 μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•μ΄ μ‹œν–‰λμ„κΉŒ?', 'μ‹€λ¬΄κ΅μœ‘μ„ 톡해 β€˜μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•β€™μ— λŒ€ν•œ μžμΉ˜λ‹¨μ²΄μ˜ 관심을 μ œκ³ ν•˜κ³  μžμΉ˜λ‹¨μ²΄μ˜ 차질 μ—†λŠ” 업무 좔진을 μ§€μ›ν•˜μ˜€λ‹€. μ΄λŸ¬ν•œ 쀀비과정을 거쳐 2014λ…„ 8μ›” 7일뢀터 β€˜μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•β€™μ΄ μ‹œν–‰λ˜μ—ˆλ‹€.'],
    ['λͺ‡ 년도에 μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•μ΄ μ‹œν–‰λμ„κΉŒ?', 'μ‹ν’ˆμ˜μ•½ν’ˆμ•ˆμ „μ²˜λŠ” 21일 κ΅­λ‚΄ μ œμ•½κΈ°μ—… μœ λ°”μ΄μ˜€λ‘œμ§μŠ€κ°€ 개발 쀑인 μ‹ μ’… μ½”λ‘œλ‚˜λ°”μ΄λŸ¬μŠ€ 감염증(μ½”λ‘œλ‚˜19) λ°±μ‹  ν›„λ³΄λ¬Όμ§ˆ β€˜μœ μ½”λ°±-19β€™μ˜ μž„μƒμ‹œν—˜ κ³„νšμ„ μ§€λ‚œ 20일 μŠΉμΈν–ˆλ‹€κ³  λ°ν˜”λ‹€.'],
], normalize=True)
print(scores)
```

## Fine-tuning

For fine-tuning, refer to https://github.com/FlagOpen/FlagEmbedding.
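
The FlagEmbedding fine-tuning examples expect training data as JSON Lines, each line holding a query together with positive and hard-negative passages. Below is a minimal sketch of producing such a file, reusing the passages from the usage examples above; the field names follow the FlagEmbedding repo, which also documents the actual training commands.

```python
import json

# One training example per line: a query, passages that answer it ("pos"),
# and hard negatives that do not ("neg").
example = {
    "query": "λͺ‡ 년도에 μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•μ΄ μ‹œν–‰λμ„κΉŒ?",
    "pos": ["μ΄λŸ¬ν•œ 쀀비과정을 거쳐 2014λ…„ 8μ›” 7일뢀터 β€˜μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•β€™μ΄ μ‹œν–‰λ˜μ—ˆλ‹€."],
    "neg": ["μ‹ν’ˆμ˜μ•½ν’ˆμ•ˆμ „μ²˜λŠ” 21일 κ΅­λ‚΄ μ œμ•½κΈ°μ—… μœ λ°”μ΄μ˜€λ‘œμ§μŠ€κ°€ 개발 쀑인 μ‹ μ’… μ½”λ‘œλ‚˜λ°”μ΄λŸ¬μŠ€ 감염증(μ½”λ‘œλ‚˜19) λ°±μ‹  ν›„λ³΄λ¬Όμ§ˆ β€˜μœ μ½”λ°±-19β€™μ˜ μž„μƒμ‹œν—˜ κ³„νšμ„ μ§€λ‚œ 20일 μŠΉμΈν–ˆλ‹€κ³  λ°ν˜”λ‹€."],
}

with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```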

## Evaluation

### Metrics

- NDCG, MRR, and mAP take ranking into account, while accuracy, precision, and recall do not. For example, with top-10 retrieval, the rank-aware metrics score a correct document differently depending on whether it is ranked 1st or 10th, whereas accuracy, precision, and recall give the same score as long as it appears anywhere in the top 10 (see the sketch below).
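
A toy illustration of the difference, with hypothetical ranks rather than benchmark numbers:

```python
# Two hypothetical retrieval runs for one query: the correct document is
# ranked 1st in run A and 10th in run B.
def recall_at_k(rank: int, k: int) -> float:
    # Rank-agnostic: full credit whenever the relevant document is in the top k.
    return 1.0 if rank <= k else 0.0

def mrr_at_k(rank: int, k: int) -> float:
    # Rank-aware: credit decays with the rank of the first relevant document.
    return 1.0 / rank if rank <= k else 0.0

for name, rank in [("run A", 1), ("run B", 10)]:
    print(name, "Recall@10 =", recall_at_k(rank, 10), "MRR@10 =", mrr_at_k(rank, 10))
# run A Recall@10 = 1.0 MRR@10 = 1.0   <- same recall ...
# run B Recall@10 = 1.0 MRR@10 = 0.1   <- ... but very different MRR
```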

### Bi-encoder and Cross-encoder

Bi-Encoders convert each text into a fixed-size vector and compute similarities between those vectors. They are fast and well suited to tasks like semantic search and classification over large collections.

Cross-Encoders feed both texts through the model jointly to compute a similarity score, which yields more accurate results. They are slower, since every pair must be processed by the full model, but they excel at re-ranking top retrieval results and are a key component of advanced RAG techniques for improving text generation.
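
In practice the two are combined in a retrieve-then-rerank pipeline: the bi-encoder narrows a large corpus down to a shortlist, and the cross-encoder re-scores only that shortlist. A minimal sketch with sentence-transformers, assuming BAAI/bge-m3 as the bi-encoder (any sentence-transformers embedding model would do):

```python
import torch
from sentence_transformers import CrossEncoder, SentenceTransformer, util

query = 'λͺ‡ 년도에 μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•μ΄ μ‹œν–‰λμ„κΉŒ?'
corpus = [
    'μ΄λŸ¬ν•œ 쀀비과정을 거쳐 2014λ…„ 8μ›” 7일뢀터 β€˜μ§€λ°©μ„Έμ™Έμˆ˜μž…λ²•β€™μ΄ μ‹œν–‰λ˜μ—ˆλ‹€.',
    'μ‹ν’ˆμ˜μ•½ν’ˆμ•ˆμ „μ²˜λŠ” 21일 κ΅­λ‚΄ μ œμ•½κΈ°μ—… μœ λ°”μ΄μ˜€λ‘œμ§μŠ€κ°€ 개발 쀑인 μ‹ μ’… μ½”λ‘œλ‚˜λ°”μ΄λŸ¬μŠ€ 감염증(μ½”λ‘œλ‚˜19) λ°±μ‹  ν›„λ³΄λ¬Όμ§ˆ β€˜μœ μ½”λ°±-19β€™μ˜ μž„μƒμ‹œν—˜ κ³„νšμ„ μ§€λ‚œ 20일 μŠΉμΈν–ˆλ‹€κ³  λ°ν˜”λ‹€.',
]

# Stage 1: bi-encoder retrieval -- embed query and corpus independently,
# then rank by cosine similarity (cheap, scales to large corpora).
bi_encoder = SentenceTransformer('BAAI/bge-m3')
q_emb = bi_encoder.encode(query, convert_to_tensor=True)
c_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
hits = torch.topk(util.cos_sim(q_emb, c_emb)[0], k=min(10, len(corpus)))
candidates = [corpus[int(i)] for i in hits.indices]

# Stage 2: cross-encoder re-ranking -- score each (query, passage) pair jointly
# (slower but more accurate, applied only to the shortlist).
reranker = CrossEncoder('dragonkue/bge-reranker-v2-m3-ko',
                        default_activation_function=torch.nn.Sigmoid())
scores = reranker.predict([[query, p] for p in candidates])
for passage, score in sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True):
    print(round(float(score), 4), passage)
```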

### Korean Embedding Benchmark with AutoRAG

(https://github.com/Marker-Inc-Korea/AutoRAG-example-korean-embedding-benchmark)

This is a Korean embedding benchmark for the financial sector.

#### Top-k 1

**Bi-Encoder (Sentence Transformer)**

| Model name | F1 | Recall | Precision | mAP | MRR |
|---|---|---|---|---|---|
| paraphrase-multilingual-mpnet-base-v2 | 0.3596 | 0.3596 | 0.3596 | 0.3596 | 0.3596 |
| KoSimCSE-roberta | 0.4298 | 0.4298 | 0.4298 | 0.4298 | 0.4298 |
| Cohere embed-multilingual-v3.0 | 0.3596 | 0.3596 | 0.3596 | 0.3596 | 0.3596 |
| openai ada 002 | 0.4737 | 0.4737 | 0.4737 | 0.4737 | 0.4737 |
| multilingual-e5-large-instruct | 0.4649 | 0.4649 | 0.4649 | 0.4649 | 0.4649 |
| Upstage Embedding | 0.6579 | 0.6579 | 0.6579 | 0.6579 | 0.6579 |
| paraphrase-multilingual-MiniLM-L12-v2 | 0.2982 | 0.2982 | 0.2982 | 0.2982 | 0.2982 |
| openai_embed_3_small | 0.5439 | 0.5439 | 0.5439 | 0.5439 | 0.5439 |
| ko-sroberta-multitask | 0.4211 | 0.4211 | 0.4211 | 0.4211 | 0.4211 |
| openai_embed_3_large | 0.6053 | 0.6053 | 0.6053 | 0.6053 | 0.6053 |
| KU-HIAI-ONTHEIT-large-v1 | 0.7105 | 0.7105 | 0.7105 | 0.7105 | 0.7105 |
| KU-HIAI-ONTHEIT-large-v1.1 | 0.7193 | 0.7193 | 0.7193 | 0.7193 | 0.7193 |
| kf-deberta-multitask | 0.4561 | 0.4561 | 0.4561 | 0.4561 | 0.4561 |
| gte-multilingual-base | 0.5877 | 0.5877 | 0.5877 | 0.5877 | 0.5877 |
| BGE-m3 | 0.6578 | 0.6578 | 0.6578 | 0.6578 | 0.6578 |
| bge-m3-korean | 0.5351 | 0.5351 | 0.5351 | 0.5351 | 0.5351 |
| BGE-m3-ko | 0.7456 | 0.7456 | 0.7456 | 0.7456 | 0.7456 |

**Cross-Encoder (Reranker)**

| Model name | F1 | Recall | Precision | mAP | MRR |
|---|---|---|---|---|---|
| jinaai/jina-reranker-v2-base-multilingual | 0.8070 | 0.8070 | 0.8070 | 0.8070 | 0.8070 |
| Alibaba-NLP/gte-multilingual-reranker-base | 0.7281 | 0.7281 | 0.7281 | 0.7281 | 0.7281 |
| BAAI/bge-reranker-v2-m3 | 0.8772 | 0.8772 | 0.8772 | 0.8772 | 0.8772 |
| bge-reranker-v2-m3-ko | 0.9123 | 0.9123 | 0.9123 | 0.9123 | 0.9123 |

#### Top-k 3

**Bi-Encoder (Sentence Transformer)**

| Model name | F1 | Recall | Precision | mAP | MRR |
|---|---|---|---|---|---|
| paraphrase-multilingual-mpnet-base-v2 | 0.2368 | 0.4737 | 0.1579 | 0.2032 | 0.2032 |
| KoSimCSE-roberta | 0.3026 | 0.6053 | 0.2018 | 0.2661 | 0.2661 |
| Cohere embed-multilingual-v3.0 | 0.2851 | 0.5702 | 0.1901 | 0.2515 | 0.2515 |
| openai ada 002 | 0.3553 | 0.7105 | 0.2368 | 0.3202 | 0.3202 |
| multilingual-e5-large-instruct | 0.3333 | 0.6667 | 0.2222 | 0.2909 | 0.2909 |
| Upstage Embedding | 0.4211 | 0.8421 | 0.2807 | 0.3509 | 0.3509 |
| paraphrase-multilingual-MiniLM-L12-v2 | 0.2061 | 0.4123 | 0.1374 | 0.1740 | 0.1740 |
| openai_embed_3_small | 0.3640 | 0.7281 | 0.2427 | 0.3026 | 0.3026 |
| ko-sroberta-multitask | 0.2939 | 0.5877 | 0.1959 | 0.2500 | 0.2500 |
| openai_embed_3_large | 0.3947 | 0.7895 | 0.2632 | 0.3348 | 0.3348 |
| KU-HIAI-ONTHEIT-large-v1 | 0.4386 | 0.8772 | 0.2924 | 0.3421 | 0.3421 |
| KU-HIAI-ONTHEIT-large-v1.1 | 0.4430 | 0.8860 | 0.2953 | 0.3406 | 0.3406 |
| kf-deberta-multitask | 0.3158 | 0.6316 | 0.2105 | 0.2792 | 0.2792 |
| gte-multilingual-base | 0.4035 | 0.8070 | 0.2690 | 0.3450 | 0.3450 |
| BGE-m3 | 0.4254 | 0.8508 | 0.2836 | 0.3421 | 0.3421 |
| bge-m3-korean | 0.3684 | 0.7368 | 0.2456 | 0.3143 | 0.3143 |
| BGE-m3-ko | 0.4517 | 0.9035 | 0.3011 | 0.3494 | 0.3494 |

**Cross-Encoder (Reranker)**

| Model name | F1 | Recall | Precision | mAP | MRR |
|---|---|---|---|---|---|
| jinaai/jina-reranker-v2-base-multilingual | 0.4649 | 0.9298 | 0.3099 | 0.8626 | 0.8626 |
| Alibaba-NLP/gte-multilingual-reranker-base | 0.4605 | 0.9211 | 0.3070 | 0.8173 | 0.8173 |
| BAAI/bge-reranker-v2-m3 | 0.4781 | 0.9561 | 0.3187 | 0.9167 | 0.9167 |
| bge-reranker-v2-m3-ko | 0.4825 | 0.9649 | 0.3216 | 0.9371 | 0.9371 |

#### Top-k 5

**Bi-Encoder (Sentence Transformer)**

| Model name | F1 | Recall | Precision | mAP | MRR |
|---|---|---|---|---|---|
| paraphrase-multilingual-mpnet-base-v2 | 0.1813 | 0.5439 | 0.1088 | 0.1575 | 0.1575 |
| KoSimCSE-roberta | 0.2164 | 0.6491 | 0.1298 | 0.1751 | 0.1751 |
| Cohere embed-multilingual-v3.0 | 0.2076 | 0.6228 | 0.1246 | 0.1640 | 0.1640 |
| openai ada 002 | 0.2602 | 0.7807 | 0.1561 | 0.2139 | 0.2139 |
| multilingual-e5-large-instruct | 0.2544 | 0.7632 | 0.1526 | 0.2194 | 0.2194 |
| Upstage Embedding | 0.2982 | 0.8947 | 0.1789 | 0.2237 | 0.2237 |
| paraphrase-multilingual-MiniLM-L12-v2 | 0.1637 | 0.4912 | 0.0982 | 0.1437 | 0.1437 |
| openai_embed_3_small | 0.2690 | 0.8070 | 0.1614 | 0.2148 | 0.2148 |
| ko-sroberta-multitask | 0.2164 | 0.6491 | 0.1298 | 0.1697 | 0.1697 |
| openai_embed_3_large | 0.2807 | 0.8421 | 0.1684 | 0.2088 | 0.2088 |
| KU-HIAI-ONTHEIT-large-v1 | 0.3041 | 0.9123 | 0.1825 | 0.2137 | 0.2137 |
| KU-HIAI-ONTHEIT-large-v1.1 | 0.3099 | 0.9298 | 0.1860 | 0.2148 | 0.2148 |
| kf-deberta-multitask | 0.2281 | 0.6842 | 0.1368 | 0.1724 | 0.1724 |
| gte-multilingual-base | 0.2865 | 0.8596 | 0.1719 | 0.2096 | 0.2096 |
| BGE-m3 | 0.3041 | 0.9123 | 0.1825 | 0.2193 | 0.2193 |
| bge-m3-korean | 0.2661 | 0.7982 | 0.1596 | 0.2116 | 0.2116 |
| BGE-m3-ko | 0.3099 | 0.9298 | 0.1860 | 0.2098 | 0.2098 |

**Cross-Encoder (Reranker)**

| Model name | F1 | Recall | Precision | mAP | MRR |
|---|---|---|---|---|---|
| jinaai/jina-reranker-v2-base-multilingual | 0.3129 | 0.9386 | 0.1877 | 0.8643 | 0.8643 |
| Alibaba-NLP/gte-multilingual-reranker-base | 0.3158 | 0.9474 | 0.1895 | 0.8234 | 0.8234 |
| BAAI/bge-reranker-v2-m3 | 0.3216 | 0.9649 | 0.1930 | 0.9189 | 0.9189 |
| bge-reranker-v2-m3-ko | 0.3216 | 0.9649 | 0.1930 | 0.9371 | 0.9371 |