dragonkue committed
Commit aa3f81c
1 Parent(s): ce6cb14

Update README.md

Files changed (1):
  1. README.md +3 -3
README.md CHANGED
@@ -281,7 +281,7 @@ You can finetune this model on your own dataset.
 ## Evaluation
 
 ### Metrics
- - ndcg, mrr, map metrics are metrics that consider ranking, while accuracy, precision, and recall are metrics that do not consider ranking. (Example: When considering ranking for retrieval top 10, different scores are given when the correct document is in 1st place and when it is in 10th place. However, accuracy, precision, and recall scores are the same if they are in the top 10.
+ - NDCG, MRR, and MAP are metrics that take ranking into account, while accuracy, precision, and recall do not. (Example: when evaluating retrieval at top 10, a ranking-aware metric scores the correct document differently in 1st place than in 10th place, whereas accuracy, precision, and recall give the same score as long as it appears anywhere in the top 10.)
 
 #### Information Retrieval
 * The Korean Embedding Benchmark has relatively long texts, with a third-quartile (3/4 quantile) string length of 1024.
@@ -417,8 +417,8 @@ This is a benchmark of Korean embedding models.
 
 ## Bias, Risks and Limitations
 
-Since the evaluation results are different for each domain, it is necessary to compare and evaluate the model in your own domain. In the Miracl benchmark, the evaluation was conducted using the Korean Wikipedia as a corpus, and in this case, the cosine_ndcg@10 score dropped by 0.2 points after learning. However, in the Auto-RAG benchmark, which is a financial domain, the ndcg score increased by 0.9 when it was top 1. This model may be advantageous for use in a specific domain.
-
+ - Since evaluation results differ by domain, you should compare and evaluate the model on your own domain. On the MIRACL benchmark, which uses Korean Wikipedia as its corpus, the cosine_ndcg@10 score dropped by 0.2 points after training; on the Auto-RAG benchmark, a financial domain, the ndcg score increased by 0.9 at top 1. This model may therefore be most advantageous in a specific domain.
+ - Also, since the MIRACL corpus consists of relatively short strings while the Korean Embedding Benchmark corpus consists of longer strings, this model may be more advantageous when the documents in your corpus are long.
 
 
 ### Training Hyperparameters
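
To make the metrics distinction in the diff above concrete, here is a minimal, illustrative sketch (not part of the commit or the model card; the function names are hypothetical) computing all three kinds of score for a single query with exactly one relevant document:

```python
import math

# Illustrative only: scores for one query whose single relevant
# document appears at position `hit_rank` in the retrieved list.

def ndcg_at_k(hit_rank: int, k: int = 10) -> float:
    """NDCG@k with one relevant document: the ideal DCG is 1
    (hit at rank 1), so NDCG reduces to the discounted gain
    1 / log2(rank + 1)."""
    return 1.0 / math.log2(hit_rank + 1) if hit_rank <= k else 0.0

def mrr_at_k(hit_rank: int, k: int = 10) -> float:
    """Reciprocal rank of the first relevant document in the top k."""
    return 1.0 / hit_rank if hit_rank <= k else 0.0

def recall_at_k(hit_rank: int, k: int = 10) -> float:
    """Rank-insensitive: full credit anywhere in the top k."""
    return 1.0 if hit_rank <= k else 0.0

for rank in (1, 10):
    print(f"rank {rank:>2}: ndcg@10={ndcg_at_k(rank):.3f}  "
          f"mrr@10={mrr_at_k(rank):.3f}  recall@10={recall_at_k(rank):.1f}")
# rank  1: ndcg@10=1.000  mrr@10=1.000  recall@10=1.0
# rank 10: ndcg@10=0.289  mrr@10=0.100  recall@10=1.0
```

With multiple relevant documents per query, the ideal DCG is computed over the full relevance list rather than fixed at 1, but the rank sensitivity shown here, where NDCG and MRR fall as the hit moves down the list while recall stays at 1.0, is unchanged.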