⌚ Visiting the past with Time Machine GPT!
We are all familiar with the idea of a model suite as a series of variants of the same model that differ mainly in size, for example Llama-2 7B, Llama-2 13B, and Llama-2 70B.
But it doesn't have to be that way. Researchers from The University of Oxford, The Alan Turing Institute, and The University of Manchester introduced Time Machine GPT (TiMaGPT), a suite of language models each pretrained only on data from a specific period in time. Instead of the same model at various sizes, you get the same model trained on data from different times.
Using the GPT-2 architecture with 117 million parameters, they trained 12 models on Wikipedia and WMT News from 2011 to 2022, one model per year: TiMaGPT-2011, TiMaGPT-2012, ..., TiMaGPT-2022.
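Using one of the year models should work like any other causal LM on the Hub. Here is a minimal sketch with 🤗 Transformers; the repository name "Ti-Ma/TiMaGPT-2019" is an assumption on my part, so check the link at the end of this post for the actual model IDs.

```python
# Minimal sketch: load one year-specific model and let it complete a prompt.
# "Ti-Ma/TiMaGPT-2019" is an assumed repository name; see https://huggingface.co/Ti-Ma
# for the real model IDs.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Ti-Ma/TiMaGPT-2019"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The biggest technology story this year is"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```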
🤔 But how could these models be useful?
They can be very useful. For example:
1️⃣ Most language models are static in the sense that they are trapped in the time bubble of their pretraining data: their knowledge is limited by the cut-off date of their training dataset. To update that knowledge, Temporal Adaptation can be performed, which means further training on newer data. The TiMaGPT series can be used to study the limitations of Temporal Adaptation of language models.
2️⃣ Word meaning can change not only with context but also with the time of use, and there is a large body of research on how embeddings shift over time. TiMaGPT will be very helpful for studying this phenomenon (a small probe is sketched after this list).
3️⃣ One more use case, in the context of time-series forecasting and event prediction, is "backtesting": using historical data to evaluate new models for forecasting the future. Models like TiMaGPT (each living in its own time, with no knowledge of the present or future) are a great fit for this.
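To give a flavour of use case 2, here is a rough sketch that compares a word's nearest neighbours in the input embedding matrix of the 2011 and 2022 models, a quick way to eyeball semantic drift without having to align two independently trained embedding spaces. The Ti-Ma/TiMaGPT-{year} repository names and the probe word "corona" are assumptions for illustration only.

```python
# Rough sketch for use case 2: compare a word's nearest neighbours in the input
# embedding matrix of two year-specific models. Repository names are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def nearest_neighbours(model_id: str, word: str, k: int = 5) -> list[str]:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    embeddings = model.get_input_embeddings().weight            # (vocab_size, dim)
    # GPT-2-style BPE: a leading space marks a word-initial token; take the first piece.
    word_id = tokenizer(" " + word, add_special_tokens=False)["input_ids"][0]
    sims = torch.nn.functional.cosine_similarity(embeddings[word_id], embeddings, dim=-1)
    top = sims.topk(k + 1).indices.tolist()[1:]                  # drop the word itself
    return [tokenizer.decode([i]) for i in top]

for year in (2011, 2022):                                        # hypothetical model IDs
    print(year, nearest_neighbours(f"Ti-Ma/TiMaGPT-{year}", "corona"))
```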
🤗 All models and datasets are on the hub: https://huggingface.co/Ti-Ma