---
license: cdla-permissive-2.0
---
# TTM Model Card
TinyTimeMixers (TTMs) are compact pre-trained models for time-series forecasting, open-sourced by IBM Research.
**With less than 1 million parameters, TTM introduces the notion of the first-ever “tiny” pre-trained models for time-series forecasting.**
TTM outperforms several popular benchmarks that demand billions of parameters in zero-shot and few-shot forecasting. TTMs are pre-trained on diverse public time-series datasets and
can be easily fine-tuned on your target data. Refer to our [paper](https://arxiv.org/pdf/2401.03955.pdf) for more details. The current open-source
version supports point forecasting use cases at resolutions ranging from minutely to hourly (e.g., 10 min, 15 min, 1 hour).
**Note that zero-shot, fine-tuning and inference tasks using TTM can easily be executed on a single-GPU machine or even on a laptop.**
## Benchmark Highlights:
- TTM (with less than 1 million parameters) outperforms the following popular pre-trained SOTA models, which demand several hundred million to billions of parameters:
  - *GPT4TS (NeurIPS 23) by 7-12% in few-shot forecasting.*
  - *LLMTime (NeurIPS 23) by 24% in zero-shot forecasting.*
  - *SimMTM (NeurIPS 23) by 17% in few-shot forecasting.*
  - *Time-LLM (ICLR 24) by 8% in few-shot (5%) forecasting.*
  - *UniTime (WWW 24) by 27% in zero-shot forecasting.*
- Zero-shot results of TTM surpass the *few-shot results of many popular SOTA approaches* including
PatchTST (ICLR 23), PatchTSMixer (KDD 23), TimesNet (ICLR 23), DLinear (AAAI 23) and FEDFormer (ICML 22).
- TTM (1024-96, released in this model card with 1M parameters) outperforms pre-trained MOIRAI-Small (14M parameters) by 10%, MOIRAI-Base (91M parameters) by 2% and
MOIRAI-Large (311M parameters) by 3% on zero-shot forecasting (fl = 96). [[notebook]](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/tinytimemixer/ttm_benchmarking_1024_96.ipynb)
- Quick fine-tuning of TTM also outperforms the hard statistical baselines (Statistical ensemble and S-Naive) on the
M4-hourly dataset, which existing pre-trained TS models find hard to outperform. [[notebook]](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/tinytimemixer/ttm_m4_hourly.ipynb)
- TTM takes only a *few seconds for zero-shot/inference* and a *few minutes for fine-tuning* on a single-GPU machine, as
opposed to the long run times and heavy compute infrastructure required by other existing pre-trained models.
## Model Description
TTM falls under the category of “focused pre-trained models”, wherein each pre-trained TTM is tailored for a particular forecasting
setting (governed by the context length and forecast length). Instead of building one massive model supporting all forecasting settings,
we opt for the approach of constructing smaller pre-trained models, each focusing on a specific forecasting setting, thereby
yielding more accurate results. Furthermore, this approach ensures that our models remain extremely small and exceptionally fast,
facilitating easy deployment without demanding a ton of resources.
Hence, in this model card, we plan to release several pre-trained
TTMs that cater to many common forecasting settings in practice. Additionally, we have released our source code along with
our pretraining scripts, which users can utilize to pretrain models on their own. Pretraining TTMs is easy and fast, taking
only 3-6 hours using 6 A100 GPUs, as opposed to several days or weeks in traditional approaches.
Each pre-trained model is released in a separate branch of this model card. Access the required model through our
getting started [notebook](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/ttm_getting_started.ipynb) by specifying the branch name.
## Model Releases (along with the branch name where the models are stored):
- **512-96:** Given the last 512 time-points (i.e. context length), this model can forecast up to the next 96 time-points (i.e. forecast length)
in the future. Recommended for hourly and minutely forecasts (e.g., resolutions of 5 min, 10 min, 15 min, 1 hour) (branch name: main)
- **1024-96:** Given the last 1024 time-points (i.e. context length), this model can forecast up to the next 96 time-points (i.e. forecast length)
in the future. Recommended for hourly and minutely forecasts (e.g., resolutions of 5 min, 10 min, 15 min, 1 hour) (branch name: 1024-96-v1)
- Stay tuned for more models!
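As a minimal sketch of how the releases listed above map to branch names, the helper below picks a branch from the amount of history available and the horizon you need. The names `RELEASES` and `select_revision` are hypothetical and not part of the TTM code base, and preferring the longest context that fits is an assumption of this sketch, not a recommendation from the model card.

```python
# Released checkpoints listed in this model card:
# (context_length, forecast_length) -> branch name.
RELEASES = {
    (512, 96): "main",
    (1024, 96): "1024-96-v1",
}


def select_revision(history_length: int, horizon: int) -> str:
    """Return the branch whose context fits the history and whose forecast covers the horizon.

    Hypothetical helper for illustration only.
    """
    candidates = [
        (context, branch)
        for (context, forecast), branch in RELEASES.items()
        if context <= history_length and horizon <= forecast
    ]
    if not candidates:
        # Padding or upsampling a short history is not recommended, so fail loudly instead.
        raise ValueError(
            f"No released model fits history={history_length}, horizon={horizon}"
        )
    # Assumption: prefer the longest context that the available history can fill.
    return max(candidates)[1]


if __name__ == "__main__":
    print(select_revision(history_length=600, horizon=24))
    print(select_revision(history_length=2000, horizon=96))
```

The returned string is what would be passed as the `revision` argument when loading the model.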
## Model Details
For more details on TTM architecture and benchmarks, refer to our [paper](https://arxiv.org/pdf/2401.03955.pdf).
TTM-1 currently supports 2 modes:
- **Zeroshot forecasting**: Directly apply the pre-trained model on your target data to get an initial forecast (with no training).
- **Finetuned forecasting**: Finetune the pre-trained model with a subset of your target data to further improve the forecast.
**Since TTM models are extremely small and fast, it is easy in practice to fine-tune the model on your available target data in a few minutes
to get more accurate forecasts.**
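Fine-tuning on "a subset of your target data" can be sketched as follows for the few-shot case. This is an illustration only: the helper `fewshot_subset`, its parameters, and the convention of taking a contiguous slice from the start (or end) of the training split are assumptions of this sketch, not the documented TTM procedure.

```python
def fewshot_subset(train_rows, fraction=0.05, location="first"):
    """Return a contiguous fraction of the training rows (hypothetical helper).

    train_rows: time-ordered sequence of training samples.
    fraction:   share of the training split to keep, e.g. 0.05 for a 5% few-shot setting.
    location:   "first" keeps the earliest rows, "last" keeps the most recent rows.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one row so that the subset is never empty.
    n_keep = max(1, int(len(train_rows) * fraction))
    if location == "first":
        return train_rows[:n_keep]
    if location == "last":
        return train_rows[-n_keep:]
    raise ValueError("location must be 'first' or 'last'")


if __name__ == "__main__":
    rows = list(range(1000))
    subset = fewshot_subset(rows, fraction=0.05)
    print(len(subset), subset[0], subset[-1])
```

Keeping the subset contiguous preserves the temporal order that windowed forecasting datasets rely on.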
The current release supports multivariate forecasting via both channel independence and channel-mixing approaches.
Decoder Channel-Mixing can be enabled during fine-tuning for capturing strong channel-correlation patterns across
time-series variates, a critical capability lacking in existing counterparts.
In addition, TTM also supports exogenous infusion and categorical data, but these features are not released as part of this version.
Stay tuned for these extended features.
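To make the distinction between channel independence and channel mixing concrete, here is a toy sketch in plain Python. It is not the TTM architecture: the linear "heads", the weights, and the function names are all made up for illustration. It only shows that, under channel independence, each variate's forecast depends on its own history alone, whereas a mixing step lets one variate influence another.

```python
def channel_independent(history, time_weights):
    """Apply the same weights over time to every channel separately.

    history: list of channels, each a list of past values ([channel][time]).
    """
    return [sum(w * x for w, x in zip(time_weights, series)) for series in history]


def channel_mixing(history, time_weights, mix):
    """Per-channel step followed by a linear mix across channels (mix[out][in])."""
    per_channel = channel_independent(history, time_weights)
    return [sum(m * v for m, v in zip(row, per_channel)) for row in mix]


if __name__ == "__main__":
    time_weights = [0.2, 0.3, 0.5]      # toy weights over three past time steps
    mix = [[1.0, 0.0], [0.5, 1.0]]      # channel 1 also looks at channel 0
    base = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
    bumped = [[1.0, 2.0, 4.0], [10.0, 20.0, 30.0]]  # only channel 0 changes

    # Independent: the output for channel 1 is unaffected by the change in channel 0.
    print(channel_independent(base, time_weights), channel_independent(bumped, time_weights))
    # Mixing: the output for channel 1 shifts because it sees channel 0.
    print(channel_mixing(base, time_weights, mix), channel_mixing(bumped, time_weights, mix))
```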
## Recommended Use
1. Users have to standard-scale their data externally, and independently for every channel, before feeding it to the model (refer to [TSP](https://github.com/IBM/tsfm/blob/main/tsfm_public/toolkit/time_series_preprocessor.py), our data processing utility, for data scaling).
2. Upsampling or prepending zeros to artificially increase the context length of shorter datasets is not recommended and will
degrade model performance.
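The per-channel standard scaling described in item 1 can be sketched with the standard library alone. This is not the TSP utility; the function names are hypothetical. Fitting the statistics on the training split only and replacing a zero standard deviation with 1.0 are assumptions of this sketch.

```python
from statistics import fmean, pstdev


def fit_channel_stats(train_rows):
    """Compute one (mean, std) pair per channel from rows shaped [time][channel]."""
    stats = []
    for values in zip(*train_rows):  # transpose to iterate channel by channel
        std = pstdev(values)
        # Assumption: a constant channel keeps std = 1.0 to avoid division by zero.
        stats.append((fmean(values), std if std > 0 else 1.0))
    return stats


def scale(rows, stats):
    """Standardize every channel with its own mean and standard deviation."""
    return [[(x - mean) / std for x, (mean, std) in zip(row, stats)] for row in rows]


def inverse_scale(rows, stats):
    """Map scaled values (e.g. forecasts) back to the original units."""
    return [[x * std + mean for x, (mean, std) in zip(row, stats)] for row in rows]


if __name__ == "__main__":
    train = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
    stats = fit_channel_stats(train)
    print(scale(train, stats))
```

Because each channel uses its own statistics, two channels with very different magnitudes end up on a comparable scale, and `inverse_scale` can be applied to the model output to recover the original units.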
### Model Sources
- **Repository:** https://github.com/IBM/tsfm/tree/main/tsfm_public/models/tinytimemixer
- **Paper:** https://arxiv.org/pdf/2401.03955.pdf
## Uses
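```
# Imports assume the `transformers` library and IBM's `tsfm_public` package are installed.
from transformers import Trainer
from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

# Load the model from the HF Model Hub, giving the branch name in the `revision` field.
# "ibm/TTM" is the repo id taken from https://huggingface.co/ibm/TTM.
model = TinyTimeMixerForPrediction.from_pretrained("ibm/TTM", revision="main")

# Zero-shot evaluation. The training arguments, datasets, callbacks, optimizer and
# scheduler used below are defined elsewhere (see the getting started notebook).
zeroshot_trainer = Trainer(
    model=model,
    args=zeroshot_forecast_args,
)
zeroshot_output = zeroshot_trainer.evaluate(dset_test)

# Few-shot or full fine-tuning: freeze the backbone so that only the
# parameters outside the backbone are updated.
for param in model.backbone.parameters():
    param.requires_grad = False

finetune_forecast_trainer = Trainer(
    model=model,
    args=finetune_forecast_args,
    train_dataset=dset_train,
    eval_dataset=dset_val,
    callbacks=[early_stopping_callback, tracking_callback],
    optimizers=(optimizer, scheduler),
)
finetune_forecast_trainer.train()
fewshot_output = finetune_forecast_trainer.evaluate(dset_test)
```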
## How to Get Started with the Model
[Getting Started Notebook](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/ttm_getting_started.ipynb)
## Training Data
The TTM models were trained on a collection of datasets from the Monash Time Series Forecasting repository. The datasets used include:
- Australian Electricity Demand: https://zenodo.org/records/4659727
- Australian Weather: https://zenodo.org/records/4654822
- Bitcoin dataset: https://zenodo.org/records/5122101
- KDD Cup 2018 dataset: https://zenodo.org/records/4656756
- London Smart Meters: https://zenodo.org/records/4656091
- Saugeen River Flow: https://zenodo.org/records/4656058
- Solar Power: https://zenodo.org/records/4656027
- Sunspots: https://zenodo.org/records/4654722
- Solar: https://zenodo.org/records/4656144
- US Births: https://zenodo.org/records/4656049
- Wind Farms Production data: https://zenodo.org/records/4654858
- Wind Power: https://zenodo.org/records/4656032
## Citation
Kindly cite the following paper if you intend to use our model or its associated architectures/approaches in your
work.
**BibTeX:**
```
@article{ekambaram2024ttms,
title={TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series},
author={Ekambaram, Vijay and Jati, Arindam and Nguyen, Nam H and Dayama, Pankaj and Reddy, Chandra and Gifford, Wesley M and Kalagnanam, Jayant},
journal={arXiv preprint arXiv:2401.03955},
year={2024}
}
```
**APA:**
Ekambaram, V., Jati, A., Nguyen, N. H., Dayama, P., Reddy, C., Gifford, W. M., & Kalagnanam, J. (2024). TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series. arXiv preprint arXiv:2401.03955.
## Model Card Authors
Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Nam H. Nguyen, Wesley Gifford and Jayant Kalagnanam
## Model Card Contact
[More Information Needed]
## IBM Public Repository Disclosure:
All content in this repository including code has been provided by IBM under the associated
open source software license and IBM is under no obligation to provide enhancements,
updates, or support. IBM developers produced this code as an
open source project (not as an IBM product), and IBM makes no assertions as to
the level of quality nor security, and will not be maintaining this code going forward.