---
Copyright (c) 2023 Taoshi Inc
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---
Background
The models provided here were created using open source modeling techniques
provided in https://github.com/taoshidev/time-series-prediction-subnet (TSPS).
They were trained using runnable/miner_training.py and tested against
existing models and dummy models using runnable/miner_testing.py.
Build Strategy
This section outlines the strategy used to build the models.
Understanding Dataset Used
The dataset used to build the models can be generated using
runnable/generate_historical_data.py. A lookback period between June 2022 and
July 2023 on the 5m interval was used to train the models. This period was chosen
because historical data from before June 2022 either shows strongly trending price
movement or comes from a period when Bitcoin's market cap was too small to be
relevant to where Bitcoin is now. More recent data was therefore used, since it
corresponds to the current market cap and macroeconomic conditions, in which it's
uncertain we'll continue to get highly trending Bitcoin data.
Test data from between June 2023 and Nov 2023 was used to determine the performance
of the models. Testing was done using runnable/miner_testing.py with a
separately generated test dataset from runnable/generate_historical_data.py.
Understanding Model Creation
As of now, the TSPS infrastructure provides only close, high, low, and volume data.
It also provides financial indicators such as RSI, MACD, and Bollinger Bands, but
they were not used to train these models.
The models were derived using a variety of windows and iterations through the June
2022 to June 2023 dataset. The strategy to derive the model was the following:
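from mining_objects.base_mining_model import BaseMiningModel

# prep_dataset is the prepared training data; len(prep_dataset.T) passes
# the number of columns (features) in it to the model.
base_mining_model = BaseMiningModel(len(prep_dataset.T)) \
    .set_neurons([[1024, 0]]) \
    .set_window_size(100) \
    .set_learning_rate(0.0000001) \
    .set_batch_size(500) \
    .set_model_dir('mining_models/model1.h5')
base_mining_model.train(prep_dataset, epochs=25)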
Here, an LSTM model is created using few or no stacked layers. Most of the
v4 models are not stacked, as they generally performed better without stacking.
This could very likely change as more feature inputs are added (this is being
worked on as part of the open source infra in TSPS). A window size of 100 best
predicted the outcome. The model is derived in mining_objects/base_mining_model.py.
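To make the role of the window size concrete, the sketch below shows one way a sliding window of 100 rows could be paired with the close 100 rows later. It is illustrative only: `make_windows` and the toy data are hypothetical and not part of TSPS, and the actual preprocessing in mining_objects/base_mining_model.py may differ.

```python
def make_windows(rows, window_size=100, horizon=100, close_index=0):
    """Pair each window of `window_size` rows with the close `horizon` rows later.

    Hypothetical helper for illustration; not the TSPS implementation.
    """
    inputs, targets = [], []
    last_start = len(rows) - window_size - horizon
    for start in range(last_start + 1):
        window = rows[start:start + window_size]
        # Last row of the window, then `horizon` rows further ahead.
        target_row = rows[start + window_size - 1 + horizon]
        inputs.append(window)
        targets.append(target_row[close_index])
    return inputs, targets


# Toy data: 300 rows of [close, high, low, volume].
rows = [[float(i), i + 1.0, i - 1.0, 10.0] for i in range(300)]
inputs, targets = make_windows(rows)
print(len(inputs), len(inputs[0]), targets[0])
```

Each input is a block of 100 consecutive rows, and each target is a single close, which matches the single-step prediction approach described in this document.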
Understanding Training Decisions
Training the model used the previous 601 rows of data as input. This is because
500 rows were used per batch, and we are looking to predict 100 rows into the future
(the challenge presented in the Time Series Prediction Subnet). Measures were taken
to ensure that every row in the training data was trained on.
Each set of 601 rows was trained on 25 times, inside an outer loop that iterated over
the entire dataset from 6/22 to 6/23 50 times. This allowed the model to pick up
granular detail without overfitting to any single set of rows. In other words, a
nested looping structure was used to derive the models:
for x in range(50):      # passes over the entire dataset
    for i in range(25):  # training iterations per set of rows
        train_model()
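The looping structure above can be sketched in runnable form as follows. This is a minimal illustration, assuming consecutive 601-row chunks with a final chunk aligned to the end of the dataset so that every row is covered. `iter_chunks` and `train_on_chunk` are hypothetical stand-ins; the real logic lives in runnable/miner_training.py and may differ.

```python
CHUNK_ROWS = 601        # rows per training call, as described above
DATASET_PASSES = 50     # full passes over the dataset
EPOCHS_PER_CHUNK = 25   # times each chunk is trained on per pass


def iter_chunks(n_rows, chunk_rows=CHUNK_ROWS):
    """Yield (start, end) index pairs that together cover every row.

    Assumption for illustration: chunks do not overlap, except the final
    chunk, which is aligned to the end of the data so no rows are skipped.
    """
    if n_rows < chunk_rows:
        return
    start = 0
    while start + chunk_rows <= n_rows:
        yield start, start + chunk_rows
        start += chunk_rows
    if start < n_rows:
        yield n_rows - chunk_rows, n_rows


def train_on_chunk(chunk, epochs):
    """Placeholder for a real training call; returns the epochs it would run."""
    return epochs


dataset = list(range(2000))  # toy stand-in for the prepared dataset
total_epochs = 0
for _ in range(DATASET_PASSES):
    for start, end in iter_chunks(len(dataset)):
        total_epochs += train_on_chunk(dataset[start:end], EPOCHS_PER_CHUNK)

print(total_epochs)
```

In a real run, `train_on_chunk` would be replaced by a call such as `base_mining_model.train(chunk, epochs=25)`.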
Strategy to Predict
The strategy to predict 100 closes into the future was to use a single-step
methodology: predict 1 step at an interval 100 closes into the future, then connect
the information by generating a line from the last close to that prediction. By doing
so, the model could learn to predict a single step rather than all 100, where loss
could continue to increase with each misstep.
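As a minimal sketch of the line-generation step, the function below fills in 100 closes on a straight line from the last known close to the single predicted close. The function name and the example prices are hypothetical.

```python
def connect_with_line(last_close, predicted_close, steps=100):
    """Return `steps` values on a straight line from last_close to predicted_close.

    The last known close itself is excluded; the final value equals the prediction.
    """
    step_size = (predicted_close - last_close) / steps
    return [last_close + step_size * i for i in range(1, steps + 1)]


path = connect_with_line(last_close=30000.0, predicted_close=30500.0)
print(len(path), path[0], path[-1])
```

Here the model only has to produce `predicted_close`; the 99 intermediate values come from the interpolation.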
Model V5
Recommendations on how to perform better than V4 and what Model V5 will look like
are outlined below:
1. Concentrate on more difficult moves
2. Get more granular data (1m)
3. Get more data sources
4. Use more predicted steps
-- Concentrate on more difficult moves
The Time Series Prediction Subnet will reward models that can predict more
"difficult" movements in the market more than those that predict less difficult ones.
Therefore, training your model on larger-magnitude movements is worth considering.
Additional details on how difficulty is calculated will be released soon, but it is a
combination of the magnitude of the movement and the standard deviation of the
movement in the predicted interval.
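Since the exact difficulty calculation is not given here, the sketch below does not attempt to reproduce it. It only computes the two named ingredients for an interval of closes and keeps intervals with larger moves for training. The magnitude definition (absolute percentage change), the threshold, and all function names are assumptions made for illustration.

```python
import statistics


def movement_stats(closes):
    """Return (magnitude, std_dev) for one interval of closes.

    Magnitude is taken here as the absolute percentage change from the first
    to the last close; this definition is an assumption for illustration.
    """
    magnitude = abs(closes[-1] - closes[0]) / closes[0]
    std_dev = statistics.pstdev(closes)
    return magnitude, std_dev


def select_large_moves(intervals, min_magnitude=0.01):
    """Keep only intervals whose magnitude meets an (arbitrary) threshold."""
    return [c for c in intervals if movement_stats(c)[0] >= min_magnitude]


quiet = [100.0, 100.1, 99.9, 100.0]
volatile = [100.0, 101.0, 103.0, 104.0]
selected = select_large_moves([quiet, volatile])
print(len(selected), movement_stats(volatile))
```

A filter like this could be used to oversample larger moves in the training set, though how much weight to give them is a tuning decision.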
-- Get more granular data (1m)
To capture these larger-magnitude movements, getting more granular with the data is
recommended. Training on 1m data rather than 5m data would help the models make
better predictions.
-- Get more data sources
Beyond financial market indicators like RSI, MACD, and Bollinger Bands, the
TSPS open source infra will gather information for miners to help them train.
The TSPS infrastructure will be adding data scrapers and using them to
automatically gather information for you. The following pieces of information will
be gathered and made accessible through the open source infra:
- Bitcoin open interest
- Bitcoin OHLCV data
- Bitcoin funding rate
- DXY OHLCV data
- Gold OHLCV data
- S&P 500 OHLCV data
- Bitcoin dominance
- Historical news data (sentiment analysis)
These sources give models additional inputs for predicting prices, since markets
correlate in movement and Bitcoin responds to other markets.
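Sources like these usually arrive at different frequencies, so they need to be aligned before they can be used as model features. The sketch below shows one possible approach: for each base timestamp, take the latest known value from each source (a forward fill), so that no future information leaks into a row. `align_features`, the source names, and the sample values are hypothetical; the TSPS scrapers may expose the data differently.

```python
def align_features(base_timestamps, sources):
    """Build one feature row per base timestamp using the latest known value
    from each source (forward fill). Rows lacking any value are skipped.

    `base_timestamps` must be sorted ascending. `sources` maps a feature name
    to a list of (timestamp, value) pairs sorted by timestamp.
    """
    names = sorted(sources)
    positions = {name: 0 for name in names}
    latest = {name: None for name in names}
    rows = []
    for ts in base_timestamps:
        for name in names:
            series = sources[name]
            # Advance to the most recent observation at or before `ts`.
            while positions[name] < len(series) and series[positions[name]][0] <= ts:
                latest[name] = series[positions[name]][1]
                positions[name] += 1
        if all(latest[name] is not None for name in names):
            rows.append([ts] + [latest[name] for name in names])
    return names, rows


sources = {
    "btc_close": [(0, 30000.0), (300, 30010.0), (600, 30025.0)],
    "funding_rate": [(0, 0.0001), (600, 0.0002)],
}
names, rows = align_features([0, 300, 600], sources)
print(names)
print(rows)
```

The funding rate at timestamp 300 is carried forward from timestamp 0, because no newer value is known at that point.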
-- Use more predicted steps
Rather than only predicting a single step at the 100th predicted close in the future,
predict more steps. This can be achieved by training multiple models, for example,
10 models each at 10 closes into the future (10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
or by using a multi-step model with 10 steps. Both will achieve more granularity when
it comes to predictions and can therefore achieve a much better (lower) RMSE score.
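To illustrate why more predicted steps can help, the sketch below connects predictions at several future steps with straight segments and compares the RMSE against a single line to the 100th close. It is a toy example: the "actual" closes are synthetic, and the anchors are set to the actual values purely to isolate the effect of granularity. Real predictions would carry their own error. The function names are hypothetical.

```python
import math


def connect_anchors(last_close, anchors):
    """Piecewise-linear path through predictions at given future steps.

    `anchors` is a list of (step, predicted_close) pairs sorted by step,
    e.g. steps 10, 20, ..., 100. Returns one value per step from 1 to the
    final anchor step.
    """
    path = []
    prev_step, prev_value = 0, last_close
    for step, value in anchors:
        span = step - prev_step
        for i in range(1, span + 1):
            path.append(prev_value + (value - prev_value) * i / span)
        prev_step, prev_value = step, value
    return path


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Toy "actual" closes that rise for 50 steps and then fall back.
actual = [100.0 + (s if s <= 50 else 100 - s) for s in range(1, 101)]
single = connect_anchors(100.0, [(100, actual[99])])
multi = connect_anchors(100.0, [(s, actual[s - 1]) for s in range(10, 101, 10)])
print(rmse(single, actual), rmse(multi, actual))
```

The single line misses the rise and fall entirely, while the 10-anchor path can follow it, which is why its RMSE is lower in this example.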