---
title: matching_series
tags:
- evaluate
- metric
description: "Matching-based time-series generation metric"
sdk: gradio
sdk_version: 3.50
app_file: app.py
pinned: false
---

# Metric Card for matching_series

## Metric Description
Matching Series is a metric for evaluating time-series generation models. It is based on matching generated time-series to reference time-series: for each instance, the metric finds the closest counterpart and computes the Mean Squared Error (MSE) between the matched pair. The metric outputs a score greater than or equal to 0, where 0 indicates a perfect generation.

## How to Use
At minimum, the metric requires the original time-series and the generated time-series as input. It can be used to evaluate the performance of time-series generation models.

```python
>>> import numpy as np
>>> import evaluate
>>> num_generation = 100
>>> num_reference = 10
>>> seq_len = 100
>>> num_features = 10
>>> references = np.random.rand(num_reference, seq_len, num_features)
>>> predictions = np.random.rand(num_generation, seq_len, num_features)
>>> metric = evaluate.load("bowdbeg/matching_series")
>>> results = metric.compute(references=references, predictions=predictions, batch_size=1000)
>>> print(results)
{'matching_mse': 0.15873331613053895, 'harmonic_mean': 0.15623569099681772, 'covered_mse': 0.15381544718035087, 'index_mse': 0.16636189201532087, 'matching_mse_features': [0.13739837269222452, 0.1395309409295018, 0.13677679887355126, 0.14408421162706211, 0.1430115910456261, 0.13726657544044085, 0.14274372684301717, 0.13504614539190338, 0.13853582796877975, 0.14482307626368343], 'harmonic_mean_features': [0.1309991815519093, 0.13157175020534279, 0.12735134531950718, 0.1327483317911355, 0.1336402851605765, 0.12878380179856022, 0.1344831997941457, 0.12782689483798823, 0.12909420446395195, 0.13417435670997752], 'covered_mse_features': [0.12516953618356524, 0.12447158260731798, 0.11914118322950448, 0.12306606276504639, 0.1254216201001874, 0.12128844181049621, 0.12712643943219143, 0.12134032531607968, 0.12085741660832867, 0.12498436126166071], 'index_mse_features': [0.16968036010688156, 0.1624888691672768, 0.15926142198600082, 0.17250634507748022, 0.16713668302081525, 0.16663213728264645, 0.1596766027744231, 0.16251306560725656, 0.17160303243460656, 0.17212040269582168], 'macro_matching_mse': 0.13992172670757905, 'macro_covered_mse': 0.12328669693143782, 'macro_harmonic_mean': 0.13106733516330948, 'macro_index_mse': 0.1663618920153209}
```

### Inputs
- **predictions**: (list of list of list of float or numpy.ndarray): The generated time-series. The shape of the array should be `(num_generation, seq_len, num_features)`.
- **references**: (list of list of list of float or numpy.ndarray): The original time-series. The shape of the array should be `(num_reference, seq_len, num_features)`.
- **batch_size**: (int, optional): The batch size used when computing the pairwise MSE matrix. Since every prediction is compared with every reference, cost grows quadratically with the number of instances; batching bounds peak memory. Default is None.

### Output Values

Let prediction instances be $P = \{p_1, p_2, \ldots, p_n\}$ and reference instances be $R = \{r_1, r_2, \ldots, r_m\}$.

- **matching_mse**: (float): Average over generated instances of the MSE to their closest reference instance. Intuitively, this is similar to precision in classification. Formally, $\frac{1}{n} \sum_{i=1}^{n} \min_{j} \mathrm{MSE}(p_i, r_j)$.
- **covered_mse**: (float): Average over reference instances of the MSE to their closest generated instance. Intuitively, this is similar to recall in classification. Formally, $\frac{1}{m} \sum_{j=1}^{m} \min_{i} \mathrm{MSE}(p_i, r_j)$.
- **harmonic_mean**: (float): Harmonic mean of matching_mse and covered_mse, analogous to the F1-score in classification (see the sketch after this list).
- **index_mse**: (float): Average of the MSE between the generated instance and the reference instance at the same index, which assumes the two sets are paired. Formally, $\frac{1}{n} \sum_{i=1}^{n} \mathrm{MSE}(p_i, r_i)$.
- **matching_mse_features**: (list of float): matching_mse computed individually for each feature.
- **covered_mse_features**: (list of float): covered_mse computed individually for each feature.
- **harmonic_mean_features**: (list of float): harmonic_mean computed individually for each feature.
- **index_mse_features**: (list of float): index_mse computed individually for each feature.
- **macro_matching_mse**: (float): Average of the matching_mse_features.
- **macro_covered_mse**: (float): Average of the covered_mse_features.
- **macro_harmonic_mean**: (float): Average of the harmonic_mean_features.
- **macro_index_mse**: (float): Average of the index_mse_features.
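
To make these definitions concrete, the sketch below computes the four aggregate scores from a pairwise MSE matrix with plain NumPy. This is a minimal illustration of the formulas above, not the metric's actual implementation; the helper name `matching_metrics` is hypothetical, the per-feature and macro variants are omitted, and `index_mse` truncates to the shorter set if the two sets differ in size.

```python
import numpy as np

def matching_metrics(predictions: np.ndarray, references: np.ndarray) -> dict:
    # predictions: (n, seq_len, num_features); references: (m, seq_len, num_features)
    n, m = len(predictions), len(references)
    # Pairwise MSE matrix: mse[i, j] = MSE(p_i, r_j), averaged over time and features.
    diffs = predictions[:, None] - references[None, :]  # (n, m, seq_len, num_features)
    mse = (diffs ** 2).mean(axis=(2, 3))                # (n, m)
    matching_mse = mse.min(axis=1).mean()  # precision-like: each p_i vs. its closest r_j
    covered_mse = mse.min(axis=0).mean()   # recall-like: each r_j vs. its closest p_i
    harmonic_mean = 2 * matching_mse * covered_mse / (matching_mse + covered_mse)
    k = min(n, m)
    index_mse = ((predictions[:k] - references[:k]) ** 2).mean()  # same-index pairing
    return {
        "matching_mse": matching_mse,
        "covered_mse": covered_mse,
        "harmonic_mean": harmonic_mean,
        "index_mse": index_mse,
    }
```

Note that the broadcasted difference tensor has shape `(n, m, seq_len, num_features)`: the pairwise comparison is what grows quadratically, which is why the `batch_size` argument matters for large inputs.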

#### Values from Popular Papers
<!-- *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.* -->

### Examples
<!-- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.* -->
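
As a simple sanity check (an illustrative example, not output copied from the implementation), passing the references themselves as predictions should drive the matching-based scores to zero, since every instance has an exact match:

```python
>>> import numpy as np
>>> import evaluate
>>> references = np.random.rand(10, 100, 10)
>>> metric = evaluate.load("bowdbeg/matching_series")
>>> results = metric.compute(references=references, predictions=references)
>>> results["matching_mse"]  # expected to be 0.0: every generated instance matches exactly
```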

## Limitations and Bias
This metric assumes that generated time-series should closely match the reference time-series, which may not hold in every scenario. It may therefore be unsuitable for evaluating generation models that are not required to reproduce the original time-series, such as those intended to produce novel or diverse outputs.

## Citation
<!-- *Cite the source where this metric was introduced.* -->

## Further References
<!-- *Add any useful further references.* -->