A 1B-parameter dense causal language model begins to "saturate" in terms of accuracy at around 5 epochs on 1.2T tokens.
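
To put that claim in numbers, here is a rough back-of-the-envelope sketch. It assumes "1.2T tokens" is the size of the dataset for one epoch (so 5 epochs means 5 passes over it) and that "1b" means 1e9 parameters; neither is spelled out in the post, and the function names are just illustrative.

```python
# Back-of-the-envelope arithmetic for the saturation point described above.
# Assumptions (not stated in the post): "1.2T tokens" is the dataset size
# for ONE epoch, and "1b" means 1e9 trainable parameters.

def tokens_seen(dataset_tokens: float, epochs: float) -> float:
    """Total tokens processed after `epochs` full passes over the dataset."""
    return dataset_tokens * epochs


def tokens_per_parameter(total_tokens: float, n_params: float) -> float:
    """Ratio of training tokens seen to model parameters."""
    return total_tokens / n_params


N_PARAMS = 1e9           # assumed reading of "1b"
DATASET_TOKENS = 1.2e12  # assumed tokens per epoch
EPOCHS = 5               # epoch count where accuracy starts to saturate

total = tokens_seen(DATASET_TOKENS, EPOCHS)
ratio = tokens_per_parameter(total, N_PARAMS)

print(f"{total:.1e} tokens seen, {ratio:,.0f} tokens per parameter")
```

Under those assumptions, saturation would set in after roughly 6T tokens seen, or about 6,000 tokens per parameter. If the 1.2T figure is instead the total across all 5 epochs, both numbers shrink by a factor of 5.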