---
datasets:
- HuggingFaceFW/fineweb
base_model:
- openai-community/gpt2
---

# NanoGPT Speedrun

Following https://github.com/KellerJordan/modded-nanogpt for fun (learning).

## Run Info

**baseline/**

- Run on Lightning cloud, using one L40S
- Batch size set to 32
- VRAM usage: 26.95 GB (25698 MiB reported in `nvidia-smi`)
- 4 seconds per step, 3200 steps total
- Checkpoint saved every 320 steps

## Training loss

To experimentally check the neural scaling law:

![baseline/analysis/loss_plot2.png](baseline/analysis/loss_plot2.png)

(Fitted line: `log y = -0.11 * log x + 0.9`, where x is the step (0 to 3200) and y is the training loss.)

## Demo

Available at https://huggingface.co/spaces/lemonteaa/nanogpt-speedrun-demo (WIP)
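The fitted line above can be reproduced with an ordinary least-squares fit in log-log space. A minimal sketch (the synthetic `loss` array here is a stand-in generated from the fitted coefficients, since the raw loss values are not included in this card; natural log is assumed, though the slope is the same for any log base):

```python
import numpy as np

# Synthetic stand-in for the training-loss curve, generated from the
# fitted coefficients a = -0.11, b = 0.9 (i.e. y = e^b * x^a).
steps = np.arange(1, 3201)              # start at 1: log(0) is undefined
loss = np.exp(0.9) * steps ** (-0.11)

# Linear fit of log(loss) against log(step) recovers slope a and intercept b.
a, b = np.polyfit(np.log(steps), np.log(loss), deg=1)
print(f"log y = {a:.2f} * log x + {b:.2f}")
```

On a real run, `steps` and `loss` would instead be read from the logged training curve (e.g. the data behind `loss_plot2.png`).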