---
title: ECE
datasets:
  - null
tags:
  - evaluate
  - metric
description: binned estimator of expected calibration error
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
---

# Metric Card for ECE

*Module Card Instructions: Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*

## Metric Description

Expected Calibration Error (ECE) is a standard metric to evaluate top-1 prediction miscalibration. It measures the $L^p$ norm difference between a model's posterior and the true likelihood of being correct:

$$ECE_p(f)^p = \mathbb{E}_{(X,Y)} \left[ \left\| \mathbb{E}[Y = \hat{y} \mid f(X) = \hat{p}] - f(X) \right\|_p^p \right],$$

where $\hat{y} = \arg\max_{y'} [f(X)]_{y'}$ is the class prediction with associated posterior probability $\hat{p} = \max_{y'} [f(X)]_{y'}$.

It is generally implemented as a binned estimator that discretizes predicted probabilities into a range of possible values (bins), for which the conditional expectation can be estimated. A sketch of such an estimator follows below.
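
As a rough illustration, here is a minimal NumPy sketch of a binned estimator with equal-width confidence bins. It is not necessarily this module's exact implementation; the function name `binned_ece` and its defaults are chosen for exposition only.

```python
import numpy as np

def binned_ece(probs, labels, n_bins=10, p=1):
    """Equal-width binned estimator of ECE_p for top-1 predictions (illustrative sketch)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)

    confidences = probs.max(axis=1)      # posterior of the predicted class, p_hat
    predictions = probs.argmax(axis=1)   # top-1 class prediction, y_hat
    correct = (predictions == labels).astype(float)

    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correct[in_bin].mean()       # empirical accuracy within the bin
            conf = confidences[in_bin].mean()  # mean confidence within the bin
            # weight each bin's gap by the fraction of samples it contains
            ece += in_bin.mean() * abs(acc - conf) ** p
    return ece ** (1.0 / p)
```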

As a measure of calibration error, lower values indicate a better-calibrated model. For valid model comparisons, make sure to use the same keyword arguments.

## How to Use

### Inputs

### Output Values

### Examples
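
Pending a filled-in example, a minimal usage sketch following the standard `evaluate` API might look like the following. The module path `jordyvl/ece` is assumed from this Space's id, and the input format (per-sample posterior probability vectors as `predictions`, integer class labels as `references`) is an assumption based on the description above.

```python
import evaluate

# Load the metric module from the Hub (path assumed from this Space's id).
ece = evaluate.load("jordyvl/ece")

# predictions: per-sample posterior probability vectors; references: integer labels.
results = ece.compute(
    predictions=[[0.6, 0.2, 0.2], [0.7, 0.1, 0.2], [0.1, 0.8, 0.1]],
    references=[0, 2, 1],
)
print(results)
```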

## Limitations and Bias

See [3], [4], and [5].

## Citation

[1] Naeini, M.P., Cooper, G. and Hauskrecht, M., 2015. Obtaining well calibrated probabilities using Bayesian binning. In Twenty-Ninth AAAI Conference on Artificial Intelligence.

[2] Guo, C., Pleiss, G., Sun, Y. and Weinberger, K.Q., 2017. On calibration of modern neural networks. In International Conference on Machine Learning (pp. 1321-1330). PMLR.

[3] Nixon, J., Dusenberry, M.W., Zhang, L., Jerfel, G. and Tran, D., 2019. Measuring calibration in deep learning. In CVPR Workshops (Vol. 2, No. 7).

[4] Kumar, A., Liang, P.S. and Ma, T., 2019. Verified uncertainty calibration. Advances in Neural Information Processing Systems, 32.

[5] Vaicenavicius, J., Widmann, D., Andersson, C., Lindsten, F., Roll, J. and Schön, T., 2019. Evaluating model calibration in classification. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 3459-3467). PMLR.

[6] Allen-Zhu, Z., Li, Y. and Liang, Y., 2019. Learning and generalization in overparameterized neural networks, going beyond two layers. Advances in Neural Information Processing Systems, 32.

## Further References

*Add any useful further references.*