---
title: matching_series
tags:
  - evaluate
  - metric
description: Matching-based time-series generation metric
sdk: gradio
sdk_version: 3.5
app_file: app.py
pinned: false
---

# Metric Card for matching_series

## Metric Description

Matching Series is a metric for evaluating time-series generation models. It is based on the idea of matching the generated time-series with the original time-series: it calculates the Mean Squared Error (MSE) between matched pairs of generated and original instances. The metric outputs a score greater than or equal to 0, where 0 indicates a perfect generation.

## How to Use

At minimum, the metric requires the original time-series (`references`) and the generated time-series (`predictions`) as input. It can be used to evaluate the performance of time-series generation models.

```python
>>> import evaluate
>>> import numpy as np
>>> num_generation = 100
>>> num_reference = 10
>>> seq_len = 100
>>> num_features = 10
>>> references = np.random.rand(num_reference, seq_len, num_features)
>>> predictions = np.random.rand(num_generation, seq_len, num_features)
>>> metric = evaluate.load("bowdbeg/matching_series")
>>> results = metric.compute(references=references, predictions=predictions, batch_size=1000)
>>> print(results)
{'matching_mse': 0.15873331613053895, 'harmonic_mean': 0.15623569099681772, 'covered_mse': 0.15381544718035087, 'index_mse': 0.16636189201532087, 'matching_mse_features': [0.13739837269222452, 0.1395309409295018, 0.13677679887355126, 0.14408421162706211, 0.1430115910456261, 0.13726657544044085, 0.14274372684301717, 0.13504614539190338, 0.13853582796877975, 0.14482307626368343], 'harmonic_mean_features': [0.1309991815519093, 0.13157175020534279, 0.12735134531950718, 0.1327483317911355, 0.1336402851605765, 0.12878380179856022, 0.1344831997941457, 0.12782689483798823, 0.12909420446395195, 0.13417435670997752], 'covered_mse_features': [0.12516953618356524, 0.12447158260731798, 0.11914118322950448, 0.12306606276504639, 0.1254216201001874, 0.12128844181049621, 0.12712643943219143, 0.12134032531607968, 0.12085741660832867, 0.12498436126166071], 'index_mse_features': [0.16968036010688156, 0.1624888691672768, 0.15926142198600082, 0.17250634507748022, 0.16713668302081525, 0.16663213728264645, 0.1596766027744231, 0.16251306560725656, 0.17160303243460656, 0.17212040269582168], 'macro_matching_mse': 0.13992172670757905, 'macro_covered_mse': 0.12328669693143782, 'macro_harmonic_mean': 0.13106733516330948, 'macro_index_mse': 0.1663618920153209}
```

### Inputs

- `predictions` (`list` of `list` of `list` of `float` or `numpy.ndarray`): The generated time-series, with shape `(num_generation, seq_len, num_features)`.
- `references` (`list` of `list` of `list` of `float` or `numpy.ndarray`): The original time-series, with shape `(num_reference, seq_len, num_features)`.
- `batch_size` (`int`, *optional*): Batch size for computing the pairwise MSEs. Cost grows quadratically with the number of instances, so batching bounds peak memory. Default is `None`.
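To make the expected shapes concrete, here is a small illustrative input (the values are arbitrary): `predictions` and `references` need not share the same first dimension, but `seq_len` and `num_features` must agree.

```python
import numpy as np

# Two generated series and one reference series, each with
# seq_len = 4 timesteps and num_features = 2 features.
predictions = [
    [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]],
    [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],
]
references = [
    [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]],
]

P = np.asarray(predictions)
R = np.asarray(references)
print(P.shape, R.shape)  # (2, 4, 2) (1, 4, 2)
```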

### Output Values

Let the prediction instances be $P = \{p_1, p_2, \ldots, p_n\}$ and the reference instances be $R = \{r_1, r_2, \ldots, r_m\}$.

- `matching_mse` (`float`): Average, over generated instances, of the MSE to the reference instance with the lowest MSE. Intuitively, this is similar to precision in classification. In equation form: $\frac{1}{n} \sum_{i=1}^{n} \min_{j} \mathrm{MSE}(p_i, r_j)$.
- `covered_mse` (`float`): Average, over reference instances, of the MSE to the generated instance with the lowest MSE. Intuitively, this is similar to recall in classification. In equation form: $\frac{1}{m} \sum_{j=1}^{m} \min_{i} \mathrm{MSE}(p_i, r_j)$.
- `harmonic_mean` (`float`): Harmonic mean of `matching_mse` and `covered_mse`, similar to the F1-score in classification.
- `index_mse` (`float`): Average of the MSE between the generated instance and the reference instance with the same index. In equation form: $\frac{1}{n} \sum_{i=1}^{n} \mathrm{MSE}(p_i, r_i)$.
- `matching_mse_features` (`list` of `float`): `matching_mse` computed individually for each feature.
- `covered_mse_features` (`list` of `float`): `covered_mse` computed individually for each feature.
- `harmonic_mean_features` (`list` of `float`): `harmonic_mean` computed individually for each feature.
- `index_mse_features` (`list` of `float`): `index_mse` computed individually for each feature.
- `macro_matching_mse` (`float`): Average of `matching_mse_features`.
- `macro_covered_mse` (`float`): Average of `covered_mse_features`.
- `macro_harmonic_mean` (`float`): Average of `harmonic_mean_features`.
- `macro_index_mse` (`float`): Average of `index_mse_features`.
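As a rough illustration of how the four overall scores relate, here is a minimal NumPy sketch following the equations above. This is a simplified reconstruction, not the hosted implementation, which may differ in details such as batching and the handling of unequal instance counts:

```python
import numpy as np

def matching_scores(predictions, references):
    """Sketch of the four aggregate scores, assuming inputs of shape
    (num_instances, seq_len, num_features)."""
    P = np.asarray(predictions)  # (n, seq_len, num_features)
    R = np.asarray(references)   # (m, seq_len, num_features)
    # Pairwise MSE matrix: mse[i, j] = MSE(p_i, r_j) over timesteps and features.
    diff = P[:, None] - R[None, :]        # (n, m, seq_len, num_features)
    mse = (diff ** 2).mean(axis=(2, 3))   # (n, m)
    matching = mse.min(axis=1).mean()     # best reference per generation (precision-like)
    covered = mse.min(axis=0).mean()      # best generation per reference (recall-like)
    harmonic = 2 * matching * covered / (matching + covered)
    # index_mse pairs p_i with r_i; here computed over the first min(n, m) pairs.
    k = min(len(P), len(R))
    index = ((P[:k] - R[:k]) ** 2).mean()
    return {"matching_mse": matching, "covered_mse": covered,
            "harmonic_mean": harmonic, "index_mse": index}
```

Because `matching_mse` takes a minimum over references, it can never exceed `index_mse` when the instance counts are equal.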

#### Values from Popular Papers

## Examples

## Limitations and Bias

This metric assumes that the generated time-series should match the original time-series, which may not hold in some scenarios. It may therefore be unsuitable for evaluating generation models that are not required to reproduce the original series.

## Citation

## Further References