---
license: apache-2.0
language:
- ko
- en
metrics:
- accuracy
base_model:
- BAAI/bge-reranker-v2-m3
pipeline_tag: text-classification
library_name: sentence-transformers
---
Reranker (Cross-Encoder)
Unlike an embedding model, a reranker takes a question and a document together as input and directly outputs a similarity score instead of an embedding. You can obtain a relevance score by feeding a query and a passage to the reranker, and the score can be mapped to a float in [0, 1] with the sigmoid function.
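As a minimal sketch of that mapping, the sigmoid squashes any raw logit into [0, 1] without changing the order of the scores. Ranking by raw logits and ranking by normalized scores therefore give the same result. The logit values below are made up for illustration and are not output of this model.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw reranker logit to a score in [0, 1]."""
    # Branch on the sign so math.exp never gets a large positive argument (avoids OverflowError).
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Hypothetical raw logits for three (query, passage) pairs, not real model output.
raw_logits = {"passage_a": 5.2, "passage_b": -1.3, "passage_c": 0.0}
scores = {name: sigmoid(value) for name, value in raw_logits.items()}

# Sigmoid is strictly increasing, so sorting by score matches sorting by logit.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['passage_a', 'passage_c', 'passage_b']
```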
Model Details
- Base model: BAAI/bge-reranker-v2-m3
- The multilingual base model has been optimized for Korean.
Usage with Transformers
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained('dragonkue/bge-reranker-v2-m3-ko')
tokenizer = AutoTokenizer.from_pretrained('dragonkue/bge-reranker-v2-m3-ko')

# A batch of [query, passage] pairs.
# Query: "In what year did the Local Non-Tax Revenue Act (지방세외수입법) take effect?"
# Passage 1 (relevant): after preparatory training for local governments, the Act took effect on August 7, 2014.
# Passage 2 (irrelevant): the Ministry of Food and Drug Safety approved a clinical trial plan for EuBiologics' COVID-19 vaccine candidate EuCorVac-19.
features = tokenizer([['몇 년도에 지방세외수입법이 시행됐을까?', '실무교육을 통해 ‘지방세외수입법’에 대한 자치단체의 관심을 제고하고 자치단체의 차질 없는 업무 추진을 지원하였다. 이러한 준비과정을 거쳐 2014년 8월 7일부터 ‘지방세외수입법’이 시행되었다.'],
                      ['몇 년도에 지방세외수입법이 시행됐을까?', '식품의약품안전처는 21일 국내 제약기업 유바이오로직스가 개발 중인 신종 코로나바이러스 감염증(코로나19) 백신 후보물질 ‘유코백-19’의 임상시험 계획을 지난 20일 승인했다고 밝혔다.']],
                     padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    logits = model(**features).logits.view(-1)  # one raw relevance logit per pair
    scores = torch.sigmoid(logits)  # map the logits to [0, 1]
print(scores)
```
Usage with SentenceTransformers
First, install the Sentence Transformers library:
```
pip install -U sentence-transformers
```
```python
import torch
from sentence_transformers import CrossEncoder

# The sigmoid activation maps raw logits to [0, 1].
# Newer sentence-transformers releases may name this argument `activation_fn`.
model = CrossEncoder('dragonkue/bge-reranker-v2-m3-ko', default_activation_function=torch.nn.Sigmoid())

# predict() takes a list of [query, passage] pairs and returns one score per pair.
scores = model.predict([['몇 년도에 지방세외수입법이 시행됐을까?', '실무교육을 통해 ‘지방세외수입법’에 대한 자치단체의 관심을 제고하고 자치단체의 차질 없는 업무 추진을 지원하였다. 이러한 준비과정을 거쳐 2014년 8월 7일부터 ‘지방세외수입법’이 시행되었다.'],
                        ['몇 년도에 지방세외수입법이 시행됐을까?', '식품의약품안전처는 21일 국내 제약기업 유바이오로직스가 개발 중인 신종 코로나바이러스 감염증(코로나19) 백신 후보물질 ‘유코백-19’의 임상시험 계획을 지난 20일 승인했다고 밝혔다.']])
print(scores)
```
Usage with FlagEmbedding
First, install the FlagEmbedding library:
```
pip install -U FlagEmbedding
```
```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker('dragonkue/bge-reranker-v2-m3-ko')

# normalize=True applies a sigmoid so that the scores fall in [0, 1].
scores = reranker.compute_score([['몇 년도에 지방세외수입법이 시행됐을까?', '실무교육을 통해 ‘지방세외수입법’에 대한 자치단체의 관심을 제고하고 자치단체의 차질 없는 업무 추진을 지원하였다. 이러한 준비과정을 거쳐 2014년 8월 7일부터 ‘지방세외수입법’이 시행되었다.'],
                                 ['몇 년도에 지방세외수입법이 시행됐을까?', '식품의약품안전처는 21일 국내 제약기업 유바이오로직스가 개발 중인 신종 코로나바이러스 감염증(코로나19) 백신 후보물질 ‘유코백-19’의 임상시험 계획을 지난 20일 승인했다고 밝혔다.']], normalize=True)
print(scores)
```
Fine-tune
Refer to https://github.com/FlagOpen/FlagEmbedding
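For orientation only, the sketch below writes a training example as JSON lines with a `query`, its relevant passages (`pos`), and non-relevant passages (`neg`). This layout is an assumption based on the fine-tuning data format described in the FlagEmbedding repository. The file name `train_data.jsonl` is hypothetical. Check the repository for the current schema and training command before use.

```python
import json
from pathlib import Path

# Assumed FlagEmbedding-style record: a query, relevant passages ("pos"),
# and non-relevant passages ("neg"). Verify the schema against the repository.
examples = [
    {
        "query": "몇 년도에 지방세외수입법이 시행됐을까?",
        "pos": [
            "실무교육을 통해 ‘지방세외수입법’에 대한 자치단체의 관심을 제고하고 자치단체의 차질 없는 업무 추진을 지원하였다. "
            "이러한 준비과정을 거쳐 2014년 8월 7일부터 ‘지방세외수입법’이 시행되었다."
        ],
        "neg": [
            "식품의약품안전처는 21일 국내 제약기업 유바이오로직스가 개발 중인 신종 코로나바이러스 감염증(코로나19) "
            "백신 후보물질 ‘유코백-19’의 임상시험 계획을 지난 20일 승인했다고 밝혔다."
        ],
    },
]

# One JSON object per line; ensure_ascii=False keeps the Korean text human-readable.
lines = [json.dumps(example, ensure_ascii=False) for example in examples]
output_path = Path("train_data.jsonl")  # hypothetical file name
output_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```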
Evaluation
Metrics
- NDCG, MRR, and MAP are rank-aware metrics, while accuracy, precision, and recall are not. For example, when evaluating the top 10 retrieved results, rank-aware metrics give a different score depending on whether the correct document is ranked 1st or 10th. Accuracy, precision, and recall give the same score as long as the correct document appears anywhere in the top 10.
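To make the distinction concrete, here is a small sketch that assumes a single relevant document per query. This is a hypothetical setup, and the helper `metrics_at_k` is illustrative, not part of any library. The sketch scores the same top-10 list twice: once with the relevant document at rank 1 and once at rank 10.

```python
import math
from typing import Dict, List


def metrics_at_k(ranked_ids: List[str], relevant_id: str, k: int) -> Dict[str, float]:
    """Score one ranked list at cutoff k, assuming exactly one relevant document."""
    top_k = ranked_ids[:k]
    if relevant_id not in top_k:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0, "rr": 0.0, "ndcg": 0.0}
    rank = top_k.index(relevant_id) + 1  # 1-based position of the relevant document
    precision = 1.0 / k  # rank-unaware: one hit among k results
    recall = 1.0  # rank-unaware: the single relevant document was found
    f1 = 2 * precision * recall / (precision + recall)
    # Reciprocal rank (its mean over queries is MRR). With one relevant document,
    # average precision equals this value too, so MAP and MRR coincide.
    rr = 1.0 / rank
    # Binary-relevance NDCG: the ideal DCG is 1 when there is one relevant document.
    ndcg = 1.0 / math.log2(rank + 1)
    return {"precision": precision, "recall": recall, "f1": f1, "rr": rr, "ndcg": ndcg}


others = [f"doc{i}" for i in range(1, 10)]  # nine hypothetical non-relevant documents
at_first = metrics_at_k(["gold"] + others, "gold", k=10)
at_tenth = metrics_at_k(others + ["gold"], "gold", k=10)

for name in at_first:
    print(f"{name:9s} rank 1: {at_first[name]:.3f}   rank 10: {at_tenth[name]:.3f}")
```

Precision, recall, and F1 come out identical for both lists. Reciprocal rank drops from 1.0 to 0.1, and NDCG also drops.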
Bi-encoder and Cross-encoder
Bi-encoders convert each text into a fixed-size vector independently and then compute similarities between those vectors efficiently. They are fast and well suited to tasks such as semantic search and classification, which makes them a good fit for processing large datasets quickly.
Cross-encoders process a pair of texts jointly to compute a similarity score, which generally yields more accurate results. Every pair has to be run through the model, so they are slower. They are therefore typically used to re-rank the top results returned by a faster first-stage retriever. This re-ranking step is an important part of advanced RAG pipelines, where it improves the context used for text generation.
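The sketch below illustrates this two-stage pattern in plain Python. Both scoring functions are hypothetical toy stand-ins. `toy_embed` is a bag-of-words vector in place of a real bi-encoder, and `toy_cross_score` is a word-pair overlap in place of a real cross-encoder such as this model. Only the structure is meant to carry over: passages are embedded independently of the query, while the cross-encoder scores each (query, passage) pair jointly and is applied only to the short candidate list.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical bi-encoder stand-in: an L2-normalized bag-of-words vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are already normalized, so the dot product is the cosine similarity.
    return sum(value * b.get(word, 0.0) for word, value in a.items())


def toy_cross_score(query: str, passage: str) -> float:
    """Hypothetical cross-encoder stand-in: reads query and passage together and returns
    the share of adjacent query word pairs that also occur adjacently in the passage."""
    q, p = query.lower().split(), passage.lower().split()
    q_pairs, p_pairs = set(zip(q, q[1:])), set(zip(p, p[1:]))
    return len(q_pairs & p_pairs) / len(q_pairs) if q_pairs else 0.0


def retrieve_then_rerank(query: str, corpus: List[str],
                         cross_score: Callable[[str, str], float],
                         top_k: int, top_n: int) -> List[str]:
    # Stage 1 (bi-encoder): passage vectors do not depend on the query, so a real
    # system would compute and index them once, ahead of time.
    doc_vecs = [toy_embed(doc) for doc in corpus]
    q_vec = toy_embed(query)
    candidates = sorted(range(len(corpus)),
                        key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True)[:top_k]
    # Stage 2 (cross-encoder): one joint scoring call per (query, candidate) pair.
    reranked = sorted(candidates, key=lambda i: cross_score(query, corpus[i]), reverse=True)
    return [corpus[i] for i in reranked[:top_n]]


corpus = [
    "approved trial vaccine new new",
    "the regulator said the new vaccine trial was approved",
    "the local tax revenue act took effect in 2014",
]
query = "new vaccine trial approved"
print(retrieve_then_rerank(query, corpus, toy_cross_score, top_k=2, top_n=1))
```

In this toy run, the bag-of-words stage ranks the scrambled first passage highest. The pair-aware stage then moves the fluent second passage to the top.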
Korean Embedding Benchmark with AutoRAG
(https://github.com/Marker-Inc-Korea/AutoRAG-example-korean-embedding-benchmark)
This is a Korean embedding benchmark for the financial sector.
Top-k 1
Bi-Encoder (Sentence Transformer)
Model name | F1 | Recall | Precision | mAP | mRR |
---|---|---|---|---|---|
paraphrase-multilingual-mpnet-base-v2 | 0.3596 | 0.3596 | 0.3596 | 0.3596 | 0.3596 |
KoSimCSE-roberta | 0.4298 | 0.4298 | 0.4298 | 0.4298 | 0.4298 |
Cohere embed-multilingual-v3.0 | 0.3596 | 0.3596 | 0.3596 | 0.3596 | 0.3596 |
openai ada 002 | 0.4737 | 0.4737 | 0.4737 | 0.4737 | 0.4737 |
multilingual-e5-large-instruct | 0.4649 | 0.4649 | 0.4649 | 0.4649 | 0.4649 |
Upstage Embedding | 0.6579 | 0.6579 | 0.6579 | 0.6579 | 0.6579 |
paraphrase-multilingual-MiniLM-L12-v2 | 0.2982 | 0.2982 | 0.2982 | 0.2982 | 0.2982 |
openai_embed_3_small | 0.5439 | 0.5439 | 0.5439 | 0.5439 | 0.5439 |
ko-sroberta-multitask | 0.4211 | 0.4211 | 0.4211 | 0.4211 | 0.4211 |
openai_embed_3_large | 0.6053 | 0.6053 | 0.6053 | 0.6053 | 0.6053 |
KU-HIAI-ONTHEIT-large-v1 | 0.7105 | 0.7105 | 0.7105 | 0.7105 | 0.7105 |
KU-HIAI-ONTHEIT-large-v1.1 | 0.7193 | 0.7193 | 0.7193 | 0.7193 | 0.7193 |
kf-deberta-multitask | 0.4561 | 0.4561 | 0.4561 | 0.4561 | 0.4561 |
gte-multilingual-base | 0.5877 | 0.5877 | 0.5877 | 0.5877 | 0.5877 |
BGE-m3 | 0.6578 | 0.6578 | 0.6578 | 0.6578 | 0.6578 |
bge-m3-korean | 0.5351 | 0.5351 | 0.5351 | 0.5351 | 0.5351 |
BGE-m3-ko | 0.7456 | 0.7456 | 0.7456 | 0.7456 | 0.7456 |
Cross-Encoder (Reranker)
Model name | F1 | Recall | Precision | mAP | mRR |
---|---|---|---|---|---|
jinaai/jina-reranker-v2-base-multilingual | 0.8070 | 0.8070 | 0.8070 | 0.8070 | 0.8070 |
Alibaba-NLP/gte-multilingual-reranker-base | 0.7281 | 0.7281 | 0.7281 | 0.7281 | 0.7281 |
BAAI/bge-reranker-v2-m3 | 0.8772 | 0.8772 | 0.8772 | 0.8772 | 0.8772 |
bge-reranker-v2-m3-ko | 0.9123 | 0.9123 | 0.9123 | 0.9123 | 0.9123 |
Top-k 3
Bi-Encoder (Sentence Transformer)
Model name | F1 | Recall | Precision | mAP | mRR |
---|---|---|---|---|---|
paraphrase-multilingual-mpnet-base-v2 | 0.2368 | 0.4737 | 0.1579 | 0.2032 | 0.2032 |
KoSimCSE-roberta | 0.3026 | 0.6053 | 0.2018 | 0.2661 | 0.2661 |
Cohere embed-multilingual-v3.0 | 0.2851 | 0.5702 | 0.1901 | 0.2515 | 0.2515 |
openai ada 002 | 0.3553 | 0.7105 | 0.2368 | 0.3202 | 0.3202 |
multilingual-e5-large-instruct | 0.3333 | 0.6667 | 0.2222 | 0.2909 | 0.2909 |
Upstage Embedding | 0.4211 | 0.8421 | 0.2807 | 0.3509 | 0.3509 |
paraphrase-multilingual-MiniLM-L12-v2 | 0.2061 | 0.4123 | 0.1374 | 0.1740 | 0.1740 |
openai_embed_3_small | 0.3640 | 0.7281 | 0.2427 | 0.3026 | 0.3026 |
ko-sroberta-multitask | 0.2939 | 0.5877 | 0.1959 | 0.2500 | 0.2500 |
openai_embed_3_large | 0.3947 | 0.7895 | 0.2632 | 0.3348 | 0.3348 |
KU-HIAI-ONTHEIT-large-v1 | 0.4386 | 0.8772 | 0.2924 | 0.3421 | 0.3421 |
KU-HIAI-ONTHEIT-large-v1.1 | 0.4430 | 0.8860 | 0.2953 | 0.3406 | 0.3406 |
kf-deberta-multitask | 0.3158 | 0.6316 | 0.2105 | 0.2792 | 0.2792 |
gte-multilingual-base | 0.4035 | 0.8070 | 0.2690 | 0.3450 | 0.3450 |
BGE-m3 | 0.4254 | 0.8508 | 0.2836 | 0.3421 | 0.3421 |
bge-m3-korean | 0.3684 | 0.7368 | 0.2456 | 0.3143 | 0.3143 |
BGE-m3-ko | 0.4517 | 0.9035 | 0.3011 | 0.3494 | 0.3494 |
Cross-Encoder (Reranker)
Model name | F1 | Recall | Precision | mAP | mRR |
---|---|---|---|---|---|
jinaai/jina-reranker-v2-base-multilingual | 0.4649 | 0.9298 | 0.3099 | 0.8626 | 0.8626 |
Alibaba-NLP/gte-multilingual-reranker-base | 0.4605 | 0.9211 | 0.3070 | 0.8173 | 0.8173 |
BAAI/bge-reranker-v2-m3 | 0.4781 | 0.9561 | 0.3187 | 0.9167 | 0.9167 |
bge-reranker-v2-m3-ko | 0.4825 | 0.9649 | 0.3216 | 0.9371 | 0.9371 |
Top-k 5
Bi-Encoder (Sentence Transformer)
Model name | F1 | Recall | Precision | mAP | mRR |
---|---|---|---|---|---|
paraphrase-multilingual-mpnet-base-v2 | 0.1813 | 0.5439 | 0.1088 | 0.1575 | 0.1575 |
KoSimCSE-roberta | 0.2164 | 0.6491 | 0.1298 | 0.1751 | 0.1751 |
Cohere embed-multilingual-v3.0 | 0.2076 | 0.6228 | 0.1246 | 0.1640 | 0.1640 |
openai ada 002 | 0.2602 | 0.7807 | 0.1561 | 0.2139 | 0.2139 |
multilingual-e5-large-instruct | 0.2544 | 0.7632 | 0.1526 | 0.2194 | 0.2194 |
Upstage Embedding | 0.2982 | 0.8947 | 0.1789 | 0.2237 | 0.2237 |
paraphrase-multilingual-MiniLM-L12-v2 | 0.1637 | 0.4912 | 0.0982 | 0.1437 | 0.1437 |
openai_embed_3_small | 0.2690 | 0.8070 | 0.1614 | 0.2148 | 0.2148 |
ko-sroberta-multitask | 0.2164 | 0.6491 | 0.1298 | 0.1697 | 0.1697 |
openai_embed_3_large | 0.2807 | 0.8421 | 0.1684 | 0.2088 | 0.2088 |
KU-HIAI-ONTHEIT-large-v1 | 0.3041 | 0.9123 | 0.1825 | 0.2137 | 0.2137 |
KU-HIAI-ONTHEIT-large-v1.1 | 0.3099 | 0.9298 | 0.1860 | 0.2148 | 0.2148 |
kf-deberta-multitask | 0.2281 | 0.6842 | 0.1368 | 0.1724 | 0.1724 |
gte-multilingual-base | 0.2865 | 0.8596 | 0.1719 | 0.2096 | 0.2096 |
BGE-m3 | 0.3041 | 0.9123 | 0.1825 | 0.2193 | 0.2193 |
bge-m3-korean | 0.2661 | 0.7982 | 0.1596 | 0.2116 | 0.2116 |
BGE-m3-ko | 0.3099 | 0.9298 | 0.1860 | 0.2098 | 0.2098 |
Cross-Encoder (Reranker)
Model name | F1 | Recall | Precision | mAP | mRR |
---|---|---|---|---|---|
jinaai/jina-reranker-v2-base-multilingual | 0.3129 | 0.9386 | 0.1877 | 0.8643 | 0.8643 |
Alibaba-NLP/gte-multilingual-reranker-base | 0.3158 | 0.9474 | 0.1895 | 0.8234 | 0.8234 |
BAAI/bge-reranker-v2-m3 | 0.3216 | 0.9649 | 0.1930 | 0.9189 | 0.9189 |
bge-reranker-v2-m3-ko | 0.3216 | 0.9649 | 0.1930 | 0.9371 | 0.9371 |
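The tables can be checked for internal consistency under one assumption: each query has exactly one relevant passage, which the benchmark description does not state explicitly. Under that assumption, a hit at cutoff k gives precision 1/k and recall 1, so the averaged metrics satisfy:

$$
\text{Precision@}k = \frac{\text{Recall@}k}{k},
\qquad
\text{F1@}k = \frac{2 \cdot \text{Precision@}k \cdot \text{Recall@}k}{\text{Precision@}k + \text{Recall@}k} = \frac{2\,\text{Recall@}k}{k + 1}
$$

For example, bge-reranker-v2-m3-ko at top-k 3 has recall 0.9649. This gives precision 0.9649 / 3 ≈ 0.3216 and F1 2 × 0.9649 / 4 = 0.48245 (reported: 0.4825). At top-k 5 it gives 0.9649 / 5 ≈ 0.1930 and 2 × 0.9649 / 6 ≈ 0.3216, both matching the table. At k = 1 both expressions reduce to the recall itself, and the reciprocal rank is either 1 or 0. This is consistent with every column being identical in the top-k 1 tables. With a single relevant passage, average precision also reduces to the reciprocal rank, which is consistent with the identical mAP and mRR columns.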