The Python package currently provides six QA evaluation methods.

- Question/Answer Type Evaluation and Transformer Neural evaluations are cost-free and suitable for both short-form and long-form QA datasets. They correlate more strongly with human judgments than exact match and F1 score when the gold and candidate answers are long.
- Black-box LLM evaluations are the closest to human evaluations, but they are not cost-free.
## Normalized Exact Match

#### `em_match`

Returns a boolean indicating whether there are any exact normalized matches between gold and candidate answers.
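For illustration, here is a minimal sketch of what normalized exact match can look like. `normalize` and `em_match_sketch` are hypothetical names using SQuAD-style normalization; the package's own normalization rules may differ.

```python
import re
import string

def normalize(text: str) -> str:
    # SQuAD-style normalization: lowercase, strip punctuation,
    # drop articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def em_match_sketch(gold_answers: list[str], candidate: str) -> bool:
    # True if any normalized gold answer equals the normalized candidate.
    normalized_candidate = normalize(candidate)
    return any(normalize(gold) == normalized_candidate for gold in gold_answers)

print(em_match_sketch(["The Frog Prince"], "the frog prince!"))       # True
print(em_match_sketch(["The Frog Prince"], "The Princess and the Frog"))  # False
```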
## F1 Score

#### `f1_score_with_precision_recall`

Calculates F1 score, precision, and recall between a reference and a candidate answer.
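As a sketch of the underlying computation (not the package's exact code), token-level F1 can be derived from the multiset overlap of reference and candidate tokens; `f1_sketch` is a hypothetical name:

```python
from collections import Counter

def f1_sketch(reference: str, candidate: str) -> dict:
    # Token-overlap F1, as commonly computed for QA:
    # precision/recall come from the multiset intersection of tokens.
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return {"f1": 0.0, "precision": 0.0, "recall": 0.0}
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return {
        "f1": 2 * precision * recall / (precision + recall),
        "precision": precision,
        "recall": recall,
    }

scores = f1_sketch("the frog prince", "prince of the frogs")
print(scores)  # precision 0.5, recall ~0.667, f1 ~0.571
```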
## Transformer Neural Evaluation

Our fine-tuned BERT model is on 🤗 [Huggingface](https://huggingface.co/Zongxia/answer_equivalence_bert?text=The+goal+of+life+is+%5BMASK%5D.). Our package also supports downloading and matching directly. [distilroberta](https://huggingface.co/Zongxia/answer_equivalence_distilroberta), [distilbert](https://huggingface.co/Zongxia/answer_equivalence_distilbert), [roberta](https://huggingface.co/Zongxia/answer_equivalence_roberta), and [roberta-large](https://huggingface.co/Zongxia/answer_equivalence_roberta-large) are also supported! 🔥🔥🔥

#### `transformer_match`
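Conceptually, a transformer matcher scores the (reference, candidate, question) triple with a fine-tuned model and thresholds the resulting probability. The sketch below captures that shape with a stand-in scorer; the function and parameter names are illustrative assumptions, not the package's API:

```python
from typing import Callable

def transformer_match_sketch(
    score_fn: Callable[[str, str, str], float],
    reference: str,
    candidate: str,
    question: str,
    threshold: float = 0.5,
) -> bool:
    # score_fn stands in for the fine-tuned BERT/RoBERTa equivalence model:
    # it returns a probability that the two answers are equivalent.
    return score_fn(reference, candidate, question) >= threshold

# Stand-in scorer for demonstration (the real package loads a model instead).
def dummy_scorer(reference: str, candidate: str, question: str) -> float:
    return 0.9 if reference.lower() in candidate.lower() else 0.1

print(transformer_match_sketch(
    dummy_scorer,
    "The Frog Prince",
    'The movie "The Princess and the Frog" is loosely based on "The Frog Prince".',
    "What is the movie based on?",
))  # True
```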
## Efficient and Robust Question/Answer Type Evaluation

#### 1. `get_highest_score`

Returns the gold answer and candidate answer pair that has the highest matching score. This function is useful for finding the closest match to a given candidate response from a list of reference answers.
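The pairing logic can be sketched as follows. `overlap_score` is a toy stand-in for the package's trained scorer, and both function names are illustrative:

```python
def get_highest_score_sketch(score_fn, gold_answers, candidate, question):
    # Return the (gold, candidate) pair whose matching score is highest.
    best_gold = max(gold_answers, key=lambda gold: score_fn(gold, candidate, question))
    return best_gold, candidate

def overlap_score(gold: str, candidate: str, question: str) -> float:
    # Toy scorer: fraction of gold tokens that appear in the candidate.
    gold_tokens = set(gold.lower().split())
    cand_tokens = set(candidate.lower().split())
    return len(gold_tokens & cand_tokens) / max(len(gold_tokens), 1)

best_pair = get_highest_score_sketch(
    overlap_score,
    ["The Frog Prince", "The Princess and the Frog"],
    "The movie is loosely based on The Princess and the Frog",
    "What is the movie based on?",
)
print(best_pair[0])  # The Princess and the Frog
```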
## Prompting LLM For Evaluation

Note: The prompting function can be used for any prompting purpose.
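As a rough sketch of what an LLM-as-judge prompt for QA evaluation might contain (the package's actual prompt template is not reproduced here, and `build_eval_prompt` is a hypothetical helper):

```python
def build_eval_prompt(question: str, reference: str, candidate: str) -> str:
    # Assemble a judge prompt asking the LLM to grade the candidate answer.
    return (
        "You are grading a QA system's answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Reply with exactly one word: 'correct' or 'incorrect'."
    )

prompt = build_eval_prompt(
    "What is the movie based on?",
    "The Frog Prince",
    "It is loosely based on The Frog Prince.",
)
print(prompt)
```

The returned string would then be sent to whichever black-box LLM the evaluation uses.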