---
title: Exact Match
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  Returns the rate at which the input predicted strings exactly match their references, ignoring any strings input as part of the regexes_to_ignore list.
---
# Metric Card for Exact Match
## Metric Description
A predicted string's exact match score is 1 if it is identical to its reference string, and 0 otherwise.
- **Example 1**: The exact match score of prediction "Happy Birthday!" is 0, given its reference is "Happy New Year!".
- **Example 2**: The exact match score of prediction "The Colour of Magic (1983)" is 1, given its reference is also "The Colour of Magic (1983)".
The exact match score of a set of predictions is the sum of the individual exact match scores in the set, divided by the total number of predictions in the set.
- **Example**: The exact match score of the set {Example 1, Example 2} (above) is 0.5.
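The definition above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the metric's actual implementation; the function name `exact_match_score` is hypothetical.

```python
def exact_match_score(predictions, references):
    """Fraction of predictions that are identical to their paired reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    # Each pair contributes 1 when the strings are identical and 0 otherwise.
    matches = sum(pred == ref for pred, ref in zip(predictions, references))
    return matches / len(predictions)


# Example 1 and Example 2 from the description above.
predictions = ["Happy Birthday!", "The Colour of Magic (1983)"]
references = ["Happy New Year!", "The Colour of Magic (1983)"]
print(exact_match_score(predictions, references))  # 0.5
```

One of the two pairs matches, so the set scores 1 / 2 = 0.5.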
## How to Use
At minimum, this metric takes as input predictions and references:
```python
>>> from evaluate import load
>>> exact_match_metric = load("exact_match")
>>> predictions = ["Happy Birthday!", "The Colour of Magic (1983)"]
>>> references = ["Happy New Year!", "The Colour of Magic (1983)"]
>>> results = exact_match_metric.compute(predictions=predictions, references=references)
```
### Inputs
- **`predictions`** (`list` of `str`): List of predicted texts.
- **`references`** (`list` of `str`): List of reference texts.
- **`regexes_to_ignore`** (`list` of `str`): Regular expressions matching the text to remove before exact matches are calculated. Defaults to `None`. Note: the regex removals are applied before capitalization is normalized.
- **`ignore_case`** (`bool`): If `True`, converts everything to lowercase so that capitalization differences are ignored. Defaults to `False`.
- **`ignore_punctuation`** (`bool`): If `True`, removes punctuation before comparing strings. Defaults to `False`.
- **`ignore_numbers`** (`bool`): If `True`, removes all digits before comparing strings. Defaults to `False`.
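To make the interaction between these options concrete, here is a rough pure-Python sketch of the preprocessing they describe. It is not the library's code: the names `normalize` and `exact_match_sketch` are hypothetical, and the use of `string.punctuation` (ASCII only) and the relative order of the case, punctuation, and digit steps are assumptions. The only ordering taken from this card is that regex removal happens before case normalization.

```python
import re
import string


def normalize(text, regexes_to_ignore=None, ignore_case=False,
              ignore_punctuation=False, ignore_numbers=False):
    # Regex removal runs first, so patterns see the original capitalization.
    for pattern in regexes_to_ignore or []:
        text = re.sub(pattern, "", text)
    if ignore_case:
        text = text.lower()
    if ignore_punctuation:
        # Assumption: "punctuation" means the ASCII set in string.punctuation.
        text = text.translate(str.maketrans("", "", string.punctuation))
    if ignore_numbers:
        text = text.translate(str.maketrans("", "", string.digits))
    return text


def exact_match_sketch(predictions, references, **options):
    # Normalize both sides identically, then compare pair by pair.
    hits = [normalize(pred, **options) == normalize(ref, **options)
            for pred, ref in zip(predictions, references)]
    return sum(hits) / len(hits)


refs = ["the cat", "theater", "YELLING", "agent007"]
preds = ["cat?", "theater", "yelling", "agent"]
score = exact_match_sketch(preds, refs, regexes_to_ignore=["the ", "yell"],
                           ignore_case=True, ignore_punctuation=True)
print(score)  # 0.5
```

Because "yell" is removed before lowercasing, the reference "YELLING" is left intact while the prediction "yelling" becomes "ing", so that pair still fails to match.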
### Output Values
This metric outputs a dictionary with one value: the average exact match score.
```python
{'exact_match': 1.0}
```
This metric's range is 0 to 1, inclusive. Here, 0.0 means no prediction/reference pairs were matches, while 1.0 means they all were.
#### Values from Popular Papers
The exact match metric is often included in other metrics, such as SQuAD. For example, the [original SQuAD paper](https://nlp.stanford.edu/pubs/rajpurkar2016squad.pdf) reported an Exact Match score of 40.0%. The authors also reported that the human performance Exact Match score on the dataset was 80.3%.
### Examples
Without including any regexes to ignore:
```python
>>> import evaluate
>>> exact_match = evaluate.load("exact_match")
>>> refs = ["the cat", "theater", "YELLING", "agent007"]
>>> preds = ["cat?", "theater", "yelling", "agent"]
>>> results = exact_match.compute(references=refs, predictions=preds)
>>> print(round(results["exact_match"], 2))
0.25
```
Ignoring the regexes "the " (with a trailing space) and "yell", as well as ignoring case and punctuation:
```python
>>> import evaluate
>>> exact_match = evaluate.load("exact_match")
>>> refs = ["the cat", "theater", "YELLING", "agent007"]
>>> preds = ["cat?", "theater", "yelling", "agent"]
>>> results = exact_match.compute(references=refs, predictions=preds, regexes_to_ignore=["the ", "yell"], ignore_case=True, ignore_punctuation=True)
>>> print(round(results["exact_match"], 2))
0.5
```
Note that in the example above, because the regexes are applied before the case is normalized, "yell" is not removed from "YELLING".
Ignoring "the ", "yell", and "YELL", as well as ignoring case and punctuation:
```python
>>> import evaluate
>>> exact_match = evaluate.load("exact_match")
>>> refs = ["the cat", "theater", "YELLING", "agent007"]
>>> preds = ["cat?", "theater", "yelling", "agent"]
>>> results = exact_match.compute(references=refs, predictions=preds, regexes_to_ignore=["the ", "yell", "YELL"], ignore_case=True, ignore_punctuation=True)
>>> print(round(results["exact_match"], 2))
0.75
```
Ignoring "the ", "yell", and "YELL", as well as ignoring case, punctuation, and numbers:
```python
>>> import evaluate
>>> exact_match = evaluate.load("exact_match")
>>> refs = ["the cat", "theater", "YELLING", "agent007"]
>>> preds = ["cat?", "theater", "yelling", "agent"]
>>> results = exact_match.compute(references=refs, predictions=preds, regexes_to_ignore=["the ", "yell", "YELL"], ignore_case=True, ignore_punctuation=True, ignore_numbers=True)
>>> print(round(results["exact_match"], 2))
1.0
```
An example that includes sentences:
```python
>>> import evaluate
>>> exact_match = evaluate.load("exact_match")
>>> refs = ["The cat sat on the mat.", "Theaters are great.", "It's like comparing oranges and apples."]
>>> preds = ["The cat sat on the mat?", "Theaters are great.", "It's like comparing apples and oranges."]
>>> results = exact_match.compute(references=refs, predictions=preds)
>>> print(round(results["exact_match"], 2))
0.33
```
## Limitations and Bias
This metric is limited in that it gives a completely wrong prediction the same score as one that is correct except for a single character. In other words, there is no credit for being *almost* right.
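The limitation is easy to see with a per-pair check written in plain Python. This is a small illustration only; `single_exact_match` is a hypothetical helper, not part of the library.

```python
def single_exact_match(prediction, reference):
    # 1 only when the two strings are identical, 0 in every other case.
    return 1 if prediction == reference else 0


reference = "The Colour of Magic (1983)"
near_miss = "The Colour of Magic (1984)"  # differs by a single character
unrelated = "Happy Birthday!"             # shares nothing with the reference

print(single_exact_match(near_miss, reference))  # 0
print(single_exact_match(unrelated, reference))  # 0
```

Both predictions receive the same score of 0, even though one of them is off by a single digit.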
## Citation
## Further References
- Also used in the [SQuAD metric](https://github.com/huggingface/datasets/tree/master/metrics/squad)