|
--- |
|
title: LogMetric |
|
datasets: |
|
- None |
|
tags: |
|
- evaluate |
|
- metric |
|
description: 'Evaluates a generated log file against a reference log, checking that it contains the correct number of timestamps and that those timestamps are monotonically increasing and consistently formatted.'
|
sdk: gradio |
|
sdk_version: 3.19.1 |
|
app_file: app.py |
|
pinned: false |
|
--- |
|
|
|
# Metric Card for LogMetric |
|
|
|
|
|
|
## Metric Description |
|
This metric evaluates the quality of a generated log file against a reference log.
|
|
|
The metric checks whether the predicted log contains the correct number of timestamps, whether those timestamps are monotonically increasing, and whether they are formatted consistently.
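For illustration, the three checks could be sketched roughly as below. This is a minimal sketch, not the actual implementation: the regex `TIMESTAMP_PATTERN` and the helper `timestamp_checks` are hypothetical, and the real metric may recognize other timestamp formats.

```
import re

# Hypothetical pattern covering the formats used in the examples below,
# e.g. "2024-01-12 11:23" and "2024-02-14"; the real metric may accept more.
TIMESTAMP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}(?: \d{2}:\d{2})?")

def timestamp_checks(prediction: str, reference: str) -> dict:
    """Illustrative sketch of the three timestamp checks."""
    pred_ts = TIMESTAMP_PATTERN.findall(prediction)
    ref_ts = TIMESTAMP_PATTERN.findall(reference)
    return {
        # 1. The prediction contains as many timestamps as the reference.
        "correct_amount": len(pred_ts) == len(ref_ts),
        # 2. Timestamps never decrease (ISO-style strings compare correctly).
        "monotonically_increasing": all(a <= b for a, b in zip(pred_ts, pred_ts[1:])),
        # 3. All timestamps share one format (same string length as a crude proxy).
        "consistent_format": len({len(ts) for ts in pred_ts}) <= 1,
    }
```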
|
|
|
## How to Use |
|
The metric can be used by simply passing the predicted log and the reference log, each as a single string.
|
|
|
Example in which the prediction contains the correct number of timestamps, consistently formatted and monotonically increasing (yielding a timestamp score of 1.0):
|
```
>>> import evaluate
>>> predictions = ["2024-01-12 11:23 It's over Anakin, I have the high ground \n 2024-01-12 11:24 You underestimate my power!"]
>>> references = ["2024-02-14 Hello there! \n 2024-02-14 General Kenobi! You're a bold one, aren't you?"]
>>> logmetric = evaluate.load("svenwey/logmetric")
>>> results = logmetric.compute(predictions=predictions,
...                             references=references)
>>> print(results["score"])
1.0
```
|
|
|
Example in which a timestamp is missing from the prediction:
|
```
>>> import evaluate
>>> predictions = ["You were my brother Anakin"]
>>> references = ["2024-01-12 You were my brother Anakin"]
>>> logmetric = evaluate.load("svenwey/logmetric")
>>> results = logmetric.compute(predictions=predictions,
...                             references=references)
>>> print(results["score"])
0.0
```
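As with other `evaluate` modules, `compute` accepts parallel lists, so several log pairs can be scored in one call. A sketch (the printed value is omitted, since how per-pair scores are aggregated into the overall `score` is not specified here):

```
>>> import evaluate
>>> logmetric = evaluate.load("svenwey/logmetric")
>>> predictions = ["2024-01-12 Hello there!", "You were my brother Anakin"]
>>> references = ["2024-01-12 Hello there!", "2024-01-12 You were my brother Anakin"]
>>> results = logmetric.compute(predictions=predictions, references=references)
>>> print(results["score"])  # a value between 0.0 and 1.0
```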
|
|
|
|
|
### Inputs |
|
- **predictions** *(list of str)*: The logs as predicted/generated by the ML model. **Important: every logfile is a single string, even if it contains multiple lines** (see the sketch below).

- **references** *(list of str)*: The reference logs (ground truth), following the same one-string-per-logfile convention.
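To make the one-string-per-logfile convention concrete, a multi-line log should be joined into a single string before being passed; for example:

```
# Each logfile is ONE string; multiple log lines are joined with "\n".
log_lines = [
    "2024-01-12 11:23 It's over Anakin, I have the high ground",
    "2024-01-12 11:24 You underestimate my power!",
]
predictions = ["\n".join(log_lines)]  # a list containing one single-string logfile
```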
|
### Output Values |
|
|
|
As the examples above illustrate, the metric returns a dictionary containing the key `score`, e.g. `{"score": 1.0}`.

The score takes values between 0.0 and 1.0, inclusive. Higher scores are better: 1.0 indicates that the predicted log's timestamps fully match the reference in number, ordering, and format consistency, while lower scores indicate timestamp mismatches, as in the second example above.
|
|
|
#### Values from Popular Papers |
|
*Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
|
|
|
### Examples |
|
*Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.* |
|
|
|
## Limitations and Bias |
|
*Note any known limitations or biases that the metric has, with links and references if possible.* |
|
|
|
## Citation |
|
*Cite the source where this metric was introduced.* |
|
|
|
## Further References |
|
*Add any useful further references.* |