Zongxia committed on
Commit
8017442
•
1 Parent(s): e83261c

Update README.md

Files changed (1)
  1. README.md +95 -0
README.md CHANGED
@@ -1,3 +1,98 @@
1
  ---
 
2
  license: mit
3
  ---
1
  ---
2
+ inference: false
3
  license: mit
4
+ language:
5
+ - en
6
+ metrics:
7
+ - exact_match
8
+ - f1
9
+ - bertscore
10
+ pipeline_tag: text-classification
11
  ---
12
+ # QA-Evaluation-Metrics
13
+
14
+ [![PyPI version qa-metrics](https://img.shields.io/pypi/v/qa-metrics.svg)](https://pypi.org/project/qa-metrics/)
15
+
16
+
17
+ QA-Evaluation-Metrics is a fast and lightweight Python package for evaluating question-answering models. It provides several basic metrics for assessing the performance of QA models. Check out our **CFMatcher**, a matching method that goes beyond token-level matching and is more efficient than LLM-based matching while retaining evaluation performance competitive with transformer LLM models.
18
+
19
+ If you find this repo useful, please cite our paper:
20
+ ```bibtex
21
+ @misc{li2024cfmatch,
22
+ title={CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering},
23
+ author={Zongxia Li and Ishani Mondal and Yijun Liang and Huy Nghiem and Jordan Boyd-Graber},
24
+ year={2024},
25
+ eprint={2401.13170},
26
+ archivePrefix={arXiv},
27
+ primaryClass={cs.CL}
28
+ }
29
+ ```
30
+
31
+ ## Installation
32
+
33
+ To install the package, run the following command:
34
+
35
+ ```bash
36
+ pip install qa-metrics
37
+ ```
38
+
39
+ ## Usage
40
+
41
+ The Python package currently provides four QA evaluation metrics.
42
+
43
+ #### Exact Match
44
+ ```python
45
+ from qa_metrics.em import em_match
46
+
47
+ reference_answer = ["Charles , Prince of Wales"]
48
+ candidate_answer = "Prince Charles"
49
+ match_result = em_match(reference_answer, candidate_answer)
50
+ print("Exact Match: ", match_result)
51
+ ```
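For intuition, exact match is commonly computed by normalizing both strings (lowercasing, dropping punctuation and articles) and then testing for equality against each reference. The sketch below follows that common convention. The names `normalize_answer` and `simple_em` are hypothetical, and this is not necessarily how `em_match` is implemented in `qa-metrics`.

```python
import re
import string
from typing import List


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def simple_em(references: List[str], candidate: str) -> bool:
    """Return True if the candidate equals any reference after normalization."""
    normalized_candidate = normalize_answer(candidate)
    return any(normalize_answer(ref) == normalized_candidate for ref in references)


# Illustrative only: an assumed normalization, not the package's own.
print(simple_em(["Charles , Prince of Wales"], "Prince Charles"))
print(simple_em(["Charles , Prince of Wales"], "charles, prince of wales"))
```

Under this normalization the first pair does not match, because the token sequences differ, while the second does. This strictness is why exact match tends to undercount answers that are correct but phrased differently.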
52
+
53
+ #### Transformer Match
54
+ Our fine-tuned BERT model is hosted in this repository. The package also supports downloading the model and running matching directly. More transformer matching models will be available soon 🔥🔥🔥
55
+
56
+ ```python
57
+ from qa_metrics.transformerMatcher import TransformerMatcher
58
+
59
+ question = "who will take the throne after the queen dies"
60
+ tm = TransformerMatcher("bert")
61
+ scores = tm.get_scores(reference_answer, candidate_answer, question)
62
+ match_result = tm.transformer_match(reference_answer, candidate_answer, question)
63
+ print("Score: %s; Transformer Match: %s" % (scores, match_result))
64
+ ```
65
+
66
+ #### F1 Score
67
+ ```python
68
+ from qa_metrics.f1 import f1_match, f1_score_with_precision_recall
69
+
70
+ f1_stats = f1_score_with_precision_recall(reference_answer[0], candidate_answer)
71
+ print("F1 stats: ", f1_stats)
72
+
73
+ match_result = f1_match(reference_answer, candidate_answer, threshold=0.5)
74
+ print("F1 Match: ", match_result)
75
+ ```
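As a rough illustration of what a token-level F1 measures, the sketch below computes precision, recall, and F1 from the multiset overlap of whitespace-separated tokens. The function `token_f1`, its plain lowercase-and-split tokenization, and the `>=` threshold comparison are assumptions made for this example. The package's `f1_score_with_precision_recall` and `f1_match` may tokenize, normalize, and threshold differently.

```python
from collections import Counter


def token_f1(reference: str, candidate: str) -> dict:
    """Token-overlap precision, recall, and F1 between two answer strings."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    # Multiset intersection: each shared token counts min(ref count, cand count) times.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


stats = token_f1("Charles , Prince of Wales", "Prince Charles")
print(stats)

# Assumed decision rule: count a match when F1 reaches the threshold (0.5 here).
is_match = stats["f1"] >= 0.5
print(is_match)
```

Because the candidate's tokens all appear in the reference, precision is high even though recall is low. A thresholded F1 can therefore accept a partial answer that exact match would reject.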
76
+
77
+ #### CFMatch
78
+ ```python
79
+ from qa_metrics.cfm import CFMatcher
80
+
81
+ question = "who will take the throne after the queen dies"
82
+ cfm = CFMatcher()
83
+ scores = cfm.get_scores(reference_answer, candidate_answer, question)
84
+ match_result = cfm.cf_match(reference_answer, candidate_answer, question)
85
+ print("Score: %s; CF Match: %s" % (scores, match_result))
86
+ ```
87
+
88
+ ## Updates
89
+ - [01/24/24] 🔥 The full paper has been uploaded and can be accessed [here](https://arxiv.org/abs/2401.13170). The dataset has been expanded and the leaderboard updated.
90
+ - Our training dataset is adapted and augmented from [Bulian et al.](https://github.com/google-research-datasets/answer-equivalence-dataset). Our [dataset repo](https://github.com/zli12321/Answer_Equivalence_Dataset.git) includes the augmented training set and the QA evaluation test sets discussed in our paper.
91
+
92
+ ## License
93
+
94
+ This project is licensed under the [MIT License](LICENSE.md) - see the LICENSE file for details.
95
+
96
+ ## Contact
97
+
98
+ For any additional questions or comments, please contact [[email protected]].