import os

import numpy as np
import pandas as pd
import pinecone
import streamlit as st

from src.utils import (
    tsfeatures_vector, get_closest_ids,
    plot_best_models_count, plot_closest_series,
    get_catalogue
)

CATALOGUE = get_catalogue()

pinecone.init(
    api_key=os.environ['API_KEY'],
    environment=os.environ['ENVIRONMENT'],
)
INDEX = pinecone.Index(os.environ['INDEX_NAME'])

DATASETS = {
    "Demand (AirPassengers)": "https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv",
    "Electricity (Ercot COAST)": "https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/ercot_COAST.csv",
    # "Electricity (ERCOT, multiple markets)": "https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/ercot_multiple_ts.csv",
    "Web Traffic (Peyton Manning)": "https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/peyton_manning.csv",
    "Finance (Exchange USD-EUR)": "https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/usdeur.csv",
}
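

# Illustrative sketch (hypothetical helper, not called by the app).
# The app finds the closest TimeNet series with get_closest_ids(), which
# queries a Pinecone index. closest_ids_bruteforce() is a brute-force,
# in-memory stand-in for that lookup, ASSUMING the index ranks tsfeatures
# vectors by cosine similarity. The metric actually configured for the
# Pinecone index may differ, and the real function returns Pinecone matches
# (with an 'id' field), not list positions.
def closest_ids_bruteforce(query, catalogue_vectors, top_k):
    """Return the positions of the `top_k` catalogue vectors most
    cosine-similar to `query`, most similar first.

    Vectors must be non-zero; a zero vector would divide by zero.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x * x for x in a) ** 0.5
        norm_b = sum(y * y for y in b) ** 0.5
        return dot / (norm_a * norm_b)

    scores = [cosine(query, vector) for vector in catalogue_vectors]
    # Sort positions by score, highest first, and keep the top_k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Example: closest_ids_bruteforce([1, 0.1], [[1, 0], [0, 1], [1, 1]], 2)
# ranks [1, 0] first and [1, 1] second, returning [0, 2].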


def st_timenet_features():
    st.set_page_config(
        page_title="TimeNet Insights",
        page_icon="🔮",
        layout="wide",
        initial_sidebar_state="expanded",
    )
    st.title(
        "TimeNet Insights: Revolutionizing Time Series by Nixtla"
    )
    st.write(
        "<style>div.block-container{padding-top:2rem;}</style>", unsafe_allow_html=True
    )
    intro = """
This tool analyzes your time series by comparing it with the TimeNet dataset, curated by Nixtla.
Beyond identifying the series in the dataset that are most similar to your data, the app shows how different forecasting models performed on those similar series.
This helps you gauge which forecasting models are likely to perform well on your data. Comparing your series with similar ones in TimeNet also puts your data in context and supports better-informed forecasting decisions.
"""
    st.write(intro)
    required_cols = ["ds", "y"]
    with st.sidebar.expander("Dataset", expanded=True):
        data_selection = st.selectbox("Select example dataset", DATASETS.keys())
        data_url = DATASETS[data_selection]
        url_json = st.text_input("Data (you can pass your own url here)", data_url)
        st.write(
            "You can also upload a CSV file like [this one](https://github.com/Nixtla/transfer-learning-time-series/blob/main/datasets/air_passengers.csv)."
        )
        uploaded_file = st.file_uploader("Upload CSV")
        with st.form("Data"):
            if uploaded_file is not None:
                df = pd.read_csv(uploaded_file)
                cols = df.columns
                timestamp_col = st.selectbox("Timestamp column", options=cols)
                value_col = st.selectbox("Value column", options=cols)
            else:
                timestamp_col = st.text_input("Timestamp column", value="timestamp")
                value_col = st.text_input("Value column", value="value")
            st.write("You must press Submit each time you want to forecast.")
            submitted = st.form_submit_button("Submit")
        if submitted:
            if uploaded_file is None:
                # No upload: load the data from the URL in the text box.
                st.write("No file uploaded; loading the data from the URL above.")
                if url_json.endswith("json"):
                    df = pd.read_json(url_json)
                else:
                    df = pd.read_csv(url_json)
                df = df.rename(
                    columns=dict(zip([timestamp_col, value_col], required_cols))
                )
            else:
                # df was already read from uploaded_file inside the form above.
                df = df.rename(
                    columns=dict(zip([timestamp_col, value_col], required_cols))
                )
        else:
            # Before the first submit, load the selected example dataset and
            # treat its last two columns (or its only two) as ds and y.
            if url_json.endswith("json"):
                df = pd.read_json(url_json)
            else:
                df = pd.read_csv(url_json)
            cols = df.columns
            if "unique_id" in cols:
                cols = cols[-2:]
            df = df.rename(columns=dict(zip(cols, required_cols)))
    if "unique_id" not in df:
        df.insert(0, "unique_id", "ts_0")
    df["ds"] = pd.to_datetime(df["ds"])
    df = df.sort_values(["unique_id", "ds"])
    with st.sidebar:
        seasonality = st.number_input("Seasonality of your data:", value=1)
        top_k = st.number_input('Number of closest series:', value=12)
    y_vector_feats = tsfeatures_vector(df.tail(100), seasonality)
    closest_ids = get_closest_ids(y_vector_feats, top_k, INDEX)
    st.header('Closest match from TimeNet')
    st.write(
        """
This side-by-side plot visualizes your uploaded time series (left) and its closest match from the TimeNet dataset (right).
By comparing these two plots, you can see how your data's behavior aligns with the most similar series in the TimeNet dataset. Similarities in trends, cycles, or patterns can indicate shared underlying structures or influences between your series and the TimeNet series.
"""
    )
    st.pyplot(
        plot_closest_series(df, closest_ids[0]['id'], CATALOGUE)
    )
    st.header('Potential winner models')
    fig, summary_df = plot_best_models_count(closest_ids, CATALOGUE)
    st.subheader("Model performance analysis for similar time series")
    st.write(
        """
This table shows the average scaled performance of each model on the TimeNet series closest to your uploaded series. Performance is scaled against a Naive forecast, a simple method that assumes future values will equal the most recent observed value. This comparison shows how well more sophisticated models perform relative to this basic strategy.
Because these series are highly similar to yours, the table suggests what improvement over the Naive forecast you might expect if you applied these models to your own series.
Use this information to decide which models are worth trying on your data set and to weigh the potential benefits of different forecasting models.
"""
    )
    st.dataframe(summary_df)
    st.subheader("Winner models")
    st.write(
        """
This plot shows the "win rate" of various forecasting models.
Each model's win rate reflects how often it outperformed the other models when forecasting the series closest to your data.
Use it to compare models and identify which ones are more likely to produce accurate forecasts for your data.
"""
    )
    st.pyplot(
        fig
    )
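

# Illustrative sketch (hypothetical helpers, not called by the app).
# The UI text above describes two summaries over the closest TimeNet series.
# ASSUMPTIONS: "scaled performance" is taken here as a model's MAE divided by
# the MAE of a Naive forecast (repeat the last observed value) on the same
# horizon, so values below 1 beat Naive; "win rate" is the share of series on
# which a model has the lowest error. The metrics stored in the catalogue and
# used by plot_best_models_count() may be defined differently.
def naive_forecast(history, horizon):
    """Naive forecast: repeat the last observed value `horizon` times."""
    return [history[-1]] * horizon


def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def scaled_error(history, actual, forecast):
    """Model MAE relative to the Naive forecast's MAE on the same horizon.

    Raises ZeroDivisionError if the Naive forecast is perfect (MAE of 0).
    """
    naive_mae = mae(actual, naive_forecast(history, len(actual)))
    return mae(actual, forecast) / naive_mae


def win_rates(errors_per_series):
    """Given one {model: error} dict per series (all with the same models),
    return the fraction of series on which each model has the lowest error.
    Ties go to the model listed first."""
    wins = {model: 0 for model in errors_per_series[0]}
    for errors in errors_per_series:
        best = min(errors, key=errors.get)
        wins[best] += 1
    return {model: count / len(errors_per_series) for model, count in wins.items()}


# Example: with history [1, 2, 3] and actual values [4, 5], Naive predicts
# [3, 3] (MAE 1.5); a forecast of [4, 6] has MAE 0.5, so
# scaled_error([1, 2, 3], [4, 5], [4, 6]) is 0.5 / 1.5, about 0.333.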


if __name__ == "__main__":
    st_timenet_features()