---
title: matching_series
tags:
- evaluate
- metric
description: "Matching-based time-series generation metric"
sdk: gradio
sdk_version: 3.50
app_file: app.py
pinned: false
---

# Metric Card for matching_series

## Metric Description

Matching Series is a metric for evaluating time-series generation models. It is based on the idea of matching the generated time-series with the original time-series, and it reports the Mean Squared Error (MSE) between the matched instances. The metric outputs a score greater than or equal to 0, where 0 indicates a perfect generation.

## How to Use

At minimum, the metric requires the original time-series (`references`) and the generated time-series (`predictions`) as input. It can be used to evaluate the performance of time-series generation models.

```python
>>> import numpy as np
>>> import evaluate
>>> num_generation = 100
>>> num_reference = 10
>>> seq_len = 100
>>> num_features = 10
>>> references = np.random.rand(num_reference, seq_len, num_features)
>>> predictions = np.random.rand(num_generation, seq_len, num_features)
>>> metric = evaluate.load("bowdbeg/matching_series")
>>> results = metric.compute(references=references, predictions=predictions, batch_size=1000)
>>> print(results)
{'precision_mse': 0.15642462680824154, 'f1_mse': 0.15423970232736145, 'recall_mse': 0.15211497466247828, 'index_mse': 0.1650527529752939, 'precision_mse_features': [0.14161461272391063, 0.13959801451122986, 0.13494790079336152, 0.13812467072775822, 0.13502155933085397, 0.13773603530687478, 0.13782869677371534, 0.13880373566781345, 0.1347356979110729, 0.1380613227954152], 'f1_mse_features': [0.13200523240237663, 0.1321561699583367, 0.12686344486378406, 0.12979789457435542, 0.12768556637792927, 0.1316950291866994, 0.12937893459231917, 0.13052145628415104, 0.12571029554640592, 0.12686388502130683], 'recall_mse_features': [0.12361708937664843, 0.1254676048318782, 0.11969288602958734, 0.12241798787954035, 0.12110565263179066, 0.12616166677071738,
0.12190537193383513, 0.1231719120998892, 0.1178181328089802, 0.11734651764610313], 'index_mse_features': [0.16728853331521837, 0.1673468681819004, 0.16940025907048203, 0.16828093040638223, 0.17486439883284577, 0.15779474562305962, 0.16255301663470148, 0.16224400164732194, 0.1531092505944622, 0.167645525446565], 'macro_precision_mse': 0.1376472246542006, 'macro_recall_mse': 0.121870482200897, 'macro_f1_mse': 0.12926779088076645, 'macro_index_mse': 0.1650527529752939, 'matching_precision': 0.09, 'matching_recall': 1.0, 'matching_f1': 0.1651376146788991, 'matching_precision_features': [0.1, 0.1, 0.1, 0.1, 0.09, 0.09, 0.1, 0.1, 0.1, 0.1], 'matching_recall_features': [1.0, 1.0, 1.0, 0.7, 0.9, 1.0, 0.9, 1.0, 0.9, 0.8], 'matching_f1_features': [0.18181818181818182, 0.18181818181818182, 0.18181818181818182, 0.175, 0.16363636363636364, 0.1651376146788991, 0.18, 0.18181818181818182, 0.18, 0.17777777777777778], 'macro_matching_precision': 0.098, 'macro_matching_recall': 0.92, 'macro_matching_f1': 0.1768824483365768, 'cuc': 0.1364, 'coverages': [0.10000000000000002, 0.16666666666666666, 0.3, 0.5333333333333333, 0.9], 'macro_cuc': 0.13874, 'macro_coverages': [0.10000000000000002, 0.18000000000000002, 0.31, 0.48, 0.98], 'cuc_features': [0.1428, 0.13580000000000003, 0.15250000000000002, 0.14579999999999999, 0.12990000000000002, 0.1364, 0.1459, 0.12330000000000002, 0.13580000000000003, 0.13920000000000002], 'coverages_features': [[0.10000000000000002, 0.16666666666666666, 0.3666666666666667, 0.5, 1.0], [0.10000000000000002, 0.16666666666666666, 0.26666666666666666, 0.43333333333333335, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3666666666666667, 0.6, 1.0], [0.10000000000000002, 0.16666666666666666, 0.3333333333333333, 0.5333333333333333, 1.0], [0.10000000000000002, 0.20000000000000004, 0.26666666666666666, 0.4666666666666666, 0.9], [0.10000000000000002, 0.16666666666666666, 0.30000000000000004, 0.5333333333333333, 0.9], [0.10000000000000002, 0.20000000000000004, 
0.3333333333333333, 0.5333333333333333, 1.0], [0.10000000000000002, 0.20000000000000004, 0.3, 0.3, 1.0], [0.10000000000000002, 0.16666666666666666, 0.26666666666666666, 0.4333333333333333, 1.0], [0.10000000000000002, 0.16666666666666666, 0.30000000000000004, 0.4666666666666666, 1.0]]}
```

### Inputs

- **predictions**: (list of list of list of float or numpy.ndarray): The generated time-series. The shape of the array should be `(num_generation, seq_len, num_features)`.
- **references**: (list of list of list of float or numpy.ndarray): The original time-series. The shape of the array should be `(num_reference, seq_len, num_features)`.
- **batch_size**: (int, optional): The batch size used when computing the pairwise MSE between predictions and references; memory usage grows quadratically with this value. Default is None.
- **cuc_n_calculation**: (int, optional): The number of times the coverage is computed per sample size; since the coverage relies on random sampling, it is averaged over repeated runs. Default is 3.
- **cuc_n_samples**: (list of int, optional): The sample sizes used to compute the coverage. Default is $[2^i \;\text{for}\; i \leq \log_2 n] + [n]$, where $n$ is the number of generated instances.

### Output Values

Let the prediction instances be $P = \{p_1, p_2, \ldots, p_n\}$ and the reference instances be $R = \{r_1, r_2, \ldots, r_m\}$.

- **precision_mse**: (float): Average of the MSE between each generated instance and the reference instance with the lowest MSE. Intuitively, this is similar to precision in classification. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \min_{j} \mathrm{MSE}(p_i, r_j)$.
- **recall_mse**: (float): Average of the MSE between each reference instance and the generated instance with the lowest MSE. Intuitively, this is similar to recall in classification. In the equation, $\frac{1}{m} \sum_{j=1}^{m} \min_{i} \mathrm{MSE}(p_i, r_j)$.
- **f1_mse**: (float): Harmonic mean of precision_mse and recall_mse. This is similar to the F1-score in classification.
- **index_mse**: (float): Average of the MSE between the generated instance and the reference instance with the same index. In the equation, $\frac{1}{n} \sum_{i=1}^{n} \mathrm{MSE}(p_i, r_i)$.
- **precision_mse_features**: (list of float): precision_mse computed individually for each feature.
- **recall_mse_features**: (list of float): recall_mse computed individually for each feature.
- **f1_mse_features**: (list of float): f1_mse computed individually for each feature.
- **index_mse_features**: (list of float): index_mse computed individually for each feature.
- **macro_precision_mse**: (float): Average of the precision_mse_features.
- **macro_recall_mse**: (float): Average of the recall_mse_features.
- **macro_f1_mse**: (float): Average of the f1_mse_features.
- **macro_index_mse**: (float): Average of the index_mse_features.
- **matching_precision**: (float): Fraction of generated instances that are the closest match of at least one reference instance. In the equation, $\frac{ | \{ \arg\min_{i} \mathrm{MSE}(p_i, r_j) \mid j \} | }{n}$.
- **matching_recall**: (float): Fraction of reference instances that are the closest match of at least one generated instance. In the equation, $\frac{ | \{ \arg\min_{j} \mathrm{MSE}(p_i, r_j) \mid i \} | }{m}$.
- **matching_f1**: (float): F1-score of the matching instances, i.e., the harmonic mean of matching_precision and matching_recall.
- **matching_precision_features**: (list of float): matching_precision computed individually for each feature.
- **matching_recall_features**: (list of float): matching_recall computed individually for each feature.
- **matching_f1_features**: (list of float): matching_f1 computed individually for each feature.
- **macro_matching_precision**: (float): Average of the matching_precision_features.
- **macro_matching_recall**: (float): Average of the matching_recall_features.
- **macro_matching_f1**: (float): Average of the matching_f1_features.
- **coverages**: (list of float): Coverage of the reference instances by random subsets of the generated instances, computed for each sample size in cuc_n_samples. In the equation, $\left[ \frac{ | \{ \arg\min_{j} \mathrm{MSE}(p_i, r_j) \mid p_i \in \mathrm{sample}(P, k) \} | }{m} \;\text{for}\; k \in \mathrm{cuc\_n\_samples} \right]$.
- **cuc**: (float): Area under the coverage curve, i.e., the coverages aggregated over the sample sizes in cuc_n_samples, summarizing how quickly the generated instances cover the references.
- **coverages_features**: (list of list of float): coverages computed individually for each feature.
- **cuc_features**: (list of float): cuc computed individually for each feature.
- **macro_coverages**: (list of float): Average of the coverages_features.
- **macro_cuc**: (float): Average of the cuc_features.

#### Values from Popular Papers

### Examples

## Limitations and Bias

This metric assumes that the generated time-series should match the original time-series, which may not hold in every scenario. It may therefore be unsuitable for evaluating generation models that are not required to reproduce the original time-series.

## Citation

## Further References
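The matching-based quantities defined under Output Values can be sketched in plain NumPy. This is an illustration of the definitions only, not the metric's actual implementation; the shapes and variable names follow the Inputs section.

```python
import numpy as np

# Toy data with the documented shapes: (num_generation, seq_len, num_features)
# for predictions and (num_reference, seq_len, num_features) for references.
rng = np.random.default_rng(0)
n, m, seq_len, num_features = 8, 4, 20, 3
P = rng.random((n, seq_len, num_features))  # generated instances
R = rng.random((m, seq_len, num_features))  # reference instances

# Pairwise MSE matrix: mse[i, j] = MSE(p_i, r_j), averaged over time and features.
mse = ((P[:, None] - R[None, :]) ** 2).mean(axis=(2, 3))  # shape (n, m)

# precision_mse: each generated instance vs. its closest reference.
precision_mse = mse.min(axis=1).mean()
# recall_mse: each reference vs. its closest generated instance.
recall_mse = mse.min(axis=0).mean()
# f1_mse: harmonic mean of the two.
f1_mse = 2 * precision_mse * recall_mse / (precision_mse + recall_mse)

# matching_precision: fraction of generated instances that are the closest
# match of at least one reference; matching_recall: fraction of references
# that are the closest match of at least one generated instance.
matching_precision = len(set(mse.argmin(axis=0))) / n
matching_recall = len(set(mse.argmin(axis=1))) / m
```

Because the pairwise matrix has `num_generation * num_reference` entries per batch, a real computation over large inputs would process it in chunks, which is what the `batch_size` argument controls.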