Update README.md
README.md CHANGED
@@ -281,7 +281,7 @@ You can finetune this model on your own dataset.
 ## Evaluation
 
 ### Metrics
-- ndcg, mrr, and map are metrics that take ranking into account, while accuracy, precision, and recall do not. (Example: with top-10 retrieval, the ranking-aware metrics give different scores when the correct document is ranked 1st versus 10th, whereas accuracy, precision, and recall give the same score as long as it appears anywhere in the top 10.
+- ndcg, mrr, and map are metrics that take ranking into account, while accuracy, precision, and recall do not. (Example: with top-10 retrieval, the ranking-aware metrics give different scores when the correct document is ranked 1st versus 10th, whereas accuracy, precision, and recall give the same score as long as it appears anywhere in the top 10.)
 
 #### Information Retrieval
 * Korean Embedding Benchmark is a benchmark with a relatively long 3/4 quantile of string length of 1024

@@ -417,8 +417,8 @@ This is a benchmark of Korean embedding models.
 
 ## Bias, Risks and Limitations
 
-Since the evaluation results differ for each domain, you should compare and evaluate the model on your own domain. In the Miracl benchmark, which uses the Korean Wikipedia as its corpus, the cosine_ndcg@10 score dropped by 0.2 points after training. However, in the Auto-RAG benchmark, which covers the financial domain, the ndcg score at top 1 increased by 0.9. This model may therefore be advantageous for use in a specific domain.
-
+- Since the evaluation results differ for each domain, you should compare and evaluate the model on your own domain. In the Miracl benchmark, which uses the Korean Wikipedia as its corpus, the cosine_ndcg@10 score dropped by 0.2 points after training. However, in the Auto-RAG benchmark, which covers the financial domain, the ndcg score at top 1 increased by 0.9. This model may therefore be advantageous for use in a specific domain.
+- Also, since the miracl benchmark's corpus consists of relatively short strings while the Korean Embedding Benchmark's corpus consists of longer strings, this model may be more advantageous if the documents in your corpus are long.
 
 
 ### Training Hyperparameters
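To make the ranking-aware vs. ranking-agnostic distinction in the Metrics bullet above concrete, here is a minimal sketch (plain Python with hypothetical document ids, not the evaluator used to produce the reported scores) that scores the same top-10 result list with ndcg@10 and accuracy@10 when the single correct document sits at rank 1 versus rank 10:

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """DCG of the top-k results divided by the ideal DCG (binary relevance)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, hence +2
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_dcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant_ids), k)))
    return dcg / ideal_dcg if ideal_dcg else 0.0

def accuracy_at_k(ranked_ids, relevant_ids, k=10):
    """1.0 if any relevant document appears in the top k, else 0.0."""
    return float(any(doc_id in relevant_ids for doc_id in ranked_ids[:k]))

relevant = {"doc_42"}                                       # the single correct document
hit_at_1 = ["doc_42"] + [f"doc_{i}" for i in range(9)]      # correct document ranked 1st
hit_at_10 = [f"doc_{i}" for i in range(9)] + ["doc_42"]     # correct document ranked 10th

print(ndcg_at_k(hit_at_1, relevant), ndcg_at_k(hit_at_10, relevant))          # 1.0 vs. ~0.289
print(accuracy_at_k(hit_at_1, relevant), accuracy_at_k(hit_at_10, relevant))  # 1.0 vs. 1.0
```

The ranking-aware score drops when the correct document falls from rank 1 to rank 10, while the top-10 accuracy stays at 1.0 in both cases.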
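The Bias, Risks and Limitations section above recommends re-evaluating the model on your own domain rather than relying only on the benchmark scores. Below is a minimal sketch of one way to do that with sentence-transformers' InformationRetrievalEvaluator; the model id, queries, corpus entries, and relevance labels are placeholders for your own in-domain data, not data from this card:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder model id: point this at the model being evaluated
# (or at a baseline you want to compare against).
model = SentenceTransformer("your-model-id")

# A tiny in-domain evaluation set: query id -> text, doc id -> text,
# and the set of relevant doc ids per query (placeholder examples).
queries = {"q1": "2023년 4분기 영업이익은 얼마인가?"}  # "What was operating profit in Q4 2023?"
corpus = {
    "d1": "2023년 4분기 영업이익은 120억 원으로 전년 대비 15% 증가했다.",
    "d2": "회사는 2024년에 신제품을 출시할 계획을 발표했다.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="my-domain-ir-eval",
)

# Reports ranking-aware metrics (ndcg@k, mrr@k, map@k) alongside accuracy,
# precision, and recall at k; the exact return format depends on the
# installed sentence-transformers version.
results = evaluator(model)
print(results)
```

Comparing these numbers for the base model and the fine-tuned model on your own corpus is a more direct guide than the cross-domain benchmark differences noted above.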