Zongxia committed on
Commit f207550 • 1 Parent(s): 87c1d8e

Update README.md

Files changed (1): README.md +5 -5
README.md CHANGED
@@ -42,7 +42,7 @@ The python package currently provides six QA evaluation methods.
  - Question/Answer Type Evaluation and Transformer Neural evaluations are cost-free and suitable for both short-form and longer-form QA datasets. They correlate better with human judgments than exact match and F1 score as the gold and candidate answers grow longer.
  - Black-box LLM evaluations are the closest to human evaluations, but they are not cost-free.
 
- ### Normalized Exact Match
+ ## Normalized Exact Match
  #### `em_match`
 
  Returns a boolean indicating whether there are any exact normalized matches between gold and candidate answers.
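
For reference, `em_match` (documented under the renamed heading above) can be called as in the following minimal sketch; the `qa_metrics.em` import path and the example strings are assumptions not shown in this diff, while the `Exact Match: False` output echoes the README's own sample:

```python
# Minimal sketch; the import path is an assumption, not confirmed by this diff.
from qa_metrics.em import em_match

# Gold/candidate pair echoed from the README's "Frog Prince" example.
reference_answer = ['The Frog Prince', 'The Princess and the Frog']
candidate_answer = 'The movie "The Princess and the Frog" is loosely based on the Brothers Grimm story "The Frog Prince"'

match_result = em_match(reference_answer, candidate_answer)
print('Exact Match:', match_result)
'''
Exact Match: False
'''
```
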
@@ -68,7 +68,7 @@ Exact Match: False
  '''
  ```
 
- ### F1 Score
+ ## F1 Score
  #### `f1_score_with_precision_recall`
 
  Calculates F1 score, precision, and recall between a reference and a candidate answer.
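
A minimal usage sketch for `f1_score_with_precision_recall`; the `qa_metrics.f1` import path and the shape of the returned value (a dict of f1/precision/recall) are assumptions, not confirmed by this diff:

```python
# Minimal sketch; import path and return shape are assumptions.
from qa_metrics.f1 import f1_score_with_precision_recall

reference_answer = 'The Frog Prince'
candidate_answer = 'The movie "The Princess and the Frog" is loosely based on the Brothers Grimm story "The Frog Prince"'

# Assumed to return a dict holding f1, precision, and recall values.
f1_stats = f1_score_with_precision_recall(reference_answer, candidate_answer)
print('F1 stats:', f1_stats)
```
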
@@ -98,7 +98,7 @@ F1 Match: False
  '''
  ```
 
- ### Transformer Neural Evaluation
+ ## Transformer Neural Evaluation
  Our fine-tuned BERT model is on 🤗 [Huggingface](https://huggingface.co/Zongxia/answer_equivalence_bert?text=The+goal+of+life+is+%5BMASK%5D.). Our package also supports downloading and matching directly. [distilroberta](https://huggingface.co/Zongxia/answer_equivalence_distilroberta), [distilbert](https://huggingface.co/Zongxia/answer_equivalence_distilbert), [roberta](https://huggingface.co/Zongxia/answer_equivalence_roberta), and [roberta-large](https://huggingface.co/Zongxia/answer_equivalence_roberta-large) are also supported now! 🔥🔥🔥
 
  #### `transformer_match`
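
A hedged sketch of how `transformer_match` might be invoked; the `TransformerMatcher` class name, its import path, and the model keyword are assumptions inferred from the package's naming, not confirmed by this diff:

```python
# Hypothetical sketch; class name, import path, and model keyword are assumptions.
from qa_metrics.transformerMatcher import TransformerMatcher

question = 'Which movie is loosely based on the Brothers Grimm story "The Frog Prince"?'
reference_answer = ['The Frog Prince', 'The Princess and the Frog']
candidate_answer = 'The movie "The Princess and the Frog" is loosely based on it.'

# Load one of the supported fine-tuned matchers, e.g. roberta-large.
tm = TransformerMatcher('roberta-large')
match_result = tm.transformer_match(reference_answer, candidate_answer, question)
print('Transformer Match:', match_result)
```
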
@@ -128,7 +128,7 @@ Score: {'The Frog Prince': {'The movie "The Princess and the Frog" is loosely ba
  '''
  ```
 
- ### Efficient and Robust Question/Answer Type Evaluation
+ ## Efficient and Robust Question/Answer Type Evaluation
  #### 1. `get_highest_score`
 
  Returns the gold answer and candidate answer pair with the highest matching score. This function is useful for finding the closest match to a given candidate response from a list of reference answers.
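
A sketch of `get_highest_score`; the `PEDANT` class and its import path are assumptions inferred from the `pedant.get_score(...)` call visible in the next hunk, and the two-value return shape is likewise an assumption:

```python
# Sketch; PEDANT class, import path, and return shape are assumptions
# inferred from the pedant.get_score(...) call shown in this README.
from qa_metrics.pedant import PEDANT

pedant = PEDANT()
question = 'Which movie is loosely based on the Brothers Grimm story "The Frog Prince"?'
reference_answer = ['The Frog Prince', 'The Princess and the Frog']
candidate_answer = 'The movie "The Princess and the Frog" is loosely based on it.'

# Assumed to return the (gold, candidate) pair with the highest score.
max_pair, scores = pedant.get_highest_score(reference_answer, candidate_answer, question)
print('Highest-scoring pair:', max_pair)
```
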
@@ -196,7 +196,7 @@ print(pedant.get_score(reference_answer[1], candidate_answer, question))
  ```
 
 
- #### Prompting LLM For Evaluation
+ ## Prompting LLM For Evaluation
 
  Note: The prompting function can be used for any prompting purpose.
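
To illustrate the prompting helper promoted to its own section above, a hypothetical sketch; the `CloseLLM` class, the `set_openai_api_key` and `prompt_gpt` names, and the parameters are assumptions not confirmed by this diff, and a valid OpenAI API key is required:

```python
# Hypothetical sketch; class and method names are assumptions, not confirmed
# by this diff. Requires a valid OpenAI API key.
from qa_metrics.prompt_llm import CloseLLM

model = CloseLLM()
model.set_openai_api_key('YOUR_OPENAI_KEY')

prompt = ('Question: Which movie is loosely based on "The Frog Prince"? '
          'Gold answer: The Princess and the Frog. '
          'Candidate answer: The Frog Prince. '
          'Are these equivalent answers? Respond yes or no.')
result = model.prompt_gpt(prompt=prompt, model_engine='gpt-3.5-turbo',
                          temperature=0.1, max_tokens=10)
print(result)
```
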
 
 