---
Copyright (c) 2023 Taoshi Inc

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---

Background

The models provided here were created using the open source modeling techniques 
provided in https://github.com/taoshidev/time-series-prediction-subnet (TSPS). 
They were trained using runnable/miner_training.py and tested against 
existing models and dummy models using runnable/miner_testing.py.

Build Strategy

This section outlines the strategy used to build the models.

Understanding Dataset Used

The dataset used to build the models can be generated using 
runnable/generate_historical_data.py. A lookback period between June 2022 and 
July 2023 on the 5m interval was used to train the model. Analysis showed that 
historical data from before June 2022 either trends strongly or comes from a 
period when Bitcoin's market cap was too small to be relevant to where Bitcoin 
is now.

More recent data was therefore used, as it better reflects the current market 
cap and macroeconomic conditions, in which it's uncertain whether we'll continue 
to get highly trending Bitcoin data.

Data between June 2023 and Nov 2023 was used as test data to determine the 
performance of the models. Testing was done using runnable/miner_testing.py with a 
separately generated test dataset from runnable/generate_historical_data.py.

Understanding Model Creation

As of now, the TSPS infrastructure provides close, high, low, and volume as raw 
inputs. It also provides financial indicators such as RSI, MACD, and Bollinger 
Bands, but they were not used to train these models.

The models were derived using a variety of windows and iterations through the June 
2022 to June 2023 dataset. The strategy to derive the model was the following:

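# BaseMiningModel is defined in mining_objects/base_mining_model.py
from mining_objects.base_mining_model import BaseMiningModel

# len(prep_dataset.T) is the number of columns in the prepared dataset
base_mining_model = BaseMiningModel(len(prep_dataset.T)) \
    .set_neurons([[1024, 0]]) \
    .set_window_size(100) \
    .set_learning_rate(0.0000001) \
    .set_batch_size(500) \
    .set_model_dir('mining_models/model1.h5')
base_mining_model.train(prep_dataset, epochs=25)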

Here, an LSTM model was created using few or no stacked layers. Most of the 
v4 models are not stacked, as unstacked models generally performed better. 
This could very likely change as more feature inputs are added (this 
is being worked on as part of the open source infra in TSPS). A window size of 
100 gave the best predictions. BaseMiningModel is defined in 
mining_objects/base_mining_model.py.

Understanding Training Decisions

Each training step used the previous 601 rows of data as input. This is because 
500 rows were used per batch, and we are looking to predict 100 rows into the future 
(the challenge presented in the Time Series Prediction Subnet). Measures were taken 
to ensure that every row of the training data was trained on.

Each set of 601 rows was trained on 25 times, inside another loop that iterated over 
the entire dataset from June 2022 to June 2023 50 times. This let the model pick up 
granular details without overfitting to any single set of rows. A multi-layered 
looping structure was therefore used to derive the models:

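# Outer loop: 50 passes over the entire dataset
for x in range(50):
    # Inner loop: train 25 times on each set of 601 rows
    for i in range(25):
        train_model()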
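
The looping structure above can be sketched in a little more detail. This is a minimal illustration, not the code in runnable/miner_training.py: the helper names `iter_training_sets` and `count_training_calls` are hypothetical, and the way leftover rows are handled (a final set aligned to the end of the data) is only an assumption about how "all data was trained on" might be ensured.

```python
def iter_training_sets(num_rows, set_size=601):
    """Yield (start, end) row ranges of `set_size` rows that together cover every row."""
    if num_rows < set_size:
        raise ValueError("dataset is smaller than one training set")
    start = 0
    while start + set_size <= num_rows:
        yield start, start + set_size
        start += set_size
    # Assumption: a final set aligned to the end picks up any leftover rows.
    if start < num_rows:
        yield num_rows - set_size, num_rows


def count_training_calls(num_rows, passes=50, epochs_per_set=25):
    """Count how many single training calls the nested loops would make."""
    calls = 0
    for _ in range(passes):                       # full passes over the dataset
        for _start, _end in iter_training_sets(num_rows):
            for _ in range(epochs_per_set):       # repeated training on one set
                calls += 1                        # stands in for train_model()
    return calls
```

With a dataset of 1300 rows, for example, `iter_training_sets` produces three sets, the last of which overlaps the second so that the final rows are not dropped.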

Strategy to Predict

The strategy to predict 100 closes into the future was a single-step 
methodology: predict only the close 100 intervals into the future, then connect 
the last known close to that prediction with a straight line. This way, the model 
only has to learn to predict a single step rather than all 100, where loss could 
compound with each misstep.
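
As a rough sketch of the line-drawing step, assuming the model has already produced one predicted close 100 intervals ahead, the intermediate closes can be filled in by linear interpolation. The function name `connect_with_line` is hypothetical and not part of TSPS.

```python
def connect_with_line(last_close, predicted_close, steps=100):
    """Return `steps` closes on a straight line ending at `predicted_close`."""
    step_size = (predicted_close - last_close) / steps
    # Step k is k/steps of the way from the last close to the prediction.
    return [last_close + step_size * k for k in range(1, steps + 1)]
```

For example, `connect_with_line(100.0, 200.0)` returns 100 values rising by 1.0 per step, with the final value equal to the predicted close.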

Model V5

Recommendations on how to perform better than V4 and what Model V5 will look like 
are outlined below:

1. Concentrate on more difficult moves
2. Get more granular data (1m)
3. Get more data sources
4. Use more predicted steps

-- Concentrate on more difficult moves

The Time Series Prediction Subnet rewards models that can predict more 
"difficult" movements in the market more than those that predict less difficult 
ones. Training your model on larger-magnitude movements is therefore worth 
considering. Additional details on how difficulty is calculated will be released 
soon, but it is a combination of the magnitude of the movement and the standard 
deviation of the movement in the predicted interval.
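
One possible way to focus training on larger moves is to measure each candidate window and keep only those above a threshold. The sketch below is an assumption-heavy illustration: the exact difficulty formula has not been released, so it only computes the two ingredients named above (magnitude and standard deviation) and filters on magnitude. The function names and the `min_magnitude` threshold are hypothetical.

```python
from statistics import pstdev


def movement_stats(closes, start, horizon=100):
    """Return (magnitude, std_dev) for the `horizon` closes that follow row `start`."""
    future = closes[start + 1 : start + 1 + horizon]
    # Magnitude: fractional change from the current close to the last future close.
    magnitude = abs(future[-1] - closes[start]) / closes[start]
    return magnitude, pstdev(future)


def select_large_moves(closes, horizon=100, min_magnitude=0.01):
    """Return start indices whose following move is at least `min_magnitude`."""
    last_start = len(closes) - horizon - 1
    return [
        start
        for start in range(last_start + 1)
        if movement_stats(closes, start, horizon)[0] >= min_magnitude
    ]
```

The selected start indices could then be used to choose which windows to emphasize during training; how the standard deviation should be weighted against magnitude is left open until the difficulty calculation is published.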

-- Get more granular data (1m)

To capture these larger-magnitude movements, more granular data is 
recommended. Training on 1m data rather than 5m data would help the models 
predict them better.

-- Get more data sources

Beyond financial market indicators like RSI, MACD, and Bollinger Bands, the 
TSPS open source infra will gather additional information to help miners train 
their models.

The TSPS infrastructure will be adding data scrapers and using those data scrapers 
to automatically gather information for you. The following pieces of information will 
be gathered & accessible through the open source infra:

- Bitcoin open interest
- Bitcoin OHLCV data
- Bitcoin funding rate
- DXY OHLCV data
- Gold OHLCV data
- S&P 500 OHLCV data
- Bitcoin dominance
- Historical news data (sentiment analysis)

Using this information will help models better predict prices, as markets 
correlate in movement and Bitcoin responds to other markets.

-- Use more predicted steps

Rather than only predicting a single step at the 100th predicted close in the future, 
predict more steps. This can be achieved by training multiple models, for example, 
10 models each at 10 closes into the future (10, 20, 30, 40, 50, 60, 70, 80, 90, 100), 
or by using a multi-step model with 10 steps. Both approaches give more granular 
predictions and can therefore achieve a much better (lower) RMSE.
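
A minimal sketch of how several predicted steps could be turned into a full 100-close path, assuming predictions are available at every 10th close (however they were produced) and that straight segments are drawn between them. The names `interpolate_path` and `rmse` are hypothetical helpers, not TSPS functions.

```python
def interpolate_path(last_close, anchor_predictions, steps_per_anchor=10):
    """Join predictions made every `steps_per_anchor` closes with straight segments."""
    path = []
    previous = last_close
    for target in anchor_predictions:
        step_size = (target - previous) / steps_per_anchor
        path.extend(previous + step_size * k for k in range(1, steps_per_anchor + 1))
        previous = target
    return path


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences (lower is better)."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return (sum(squared) / len(squared)) ** 0.5
```

Passing 10 anchor predictions yields a 100-close path that can bend at each anchor, whereas a single anchor with `steps_per_anchor=100` reduces to the single straight line used in V4.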