This repository holds the GPT-2 models for the curriculum training experiment, based on the paper https://arxiv.org/pdf/2310.09518.pdf