
A modified GPT-2 architecture with 25M non-embedding parameters: no biases, LayerNorm applied to the embeddings, scaled sinusoidal position embeddings, and a modification that runs the model's transformer stack over the sequence four times before passing the result to the language-modelling head.
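The "run the transformer four times" idea can be sketched as a weight-shared recurrence: the same block stack is applied repeatedly to the hidden states before the LM head. The sketch below is a minimal illustration with a stand-in block (a residual `tanh` projection instead of real attention + MLP); the function names, shapes, and `n_loops` parameter are assumptions, not the repo's actual code.

```python
import numpy as np

def toy_block(h, W):
    # Stand-in for one transformer block (attention + MLP); here just a
    # residual nonlinear projection so the recurrence is easy to see.
    return h + np.tanh(h @ W)

def recurrent_forward(h, blocks, n_loops=4):
    # Apply the *same* stack of blocks n_loops times (weight sharing
    # across loops), then hand the final hidden states to the LM head.
    for _ in range(n_loops):
        for W in blocks:
            h = toy_block(h, W)
    return h

rng = np.random.default_rng(0)
h0 = rng.standard_normal((8, 16))                       # (seq_len, d_model)
blocks = [0.01 * rng.standard_normal((16, 16)) for _ in range(2)]
out = recurrent_forward(h0, blocks, n_loops=4)          # shape (8, 16)
```

Because the loops reuse the same weights, parameter count stays at 25M while effective depth quadruples, trading extra compute per token for capacity.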

| Model | Avg | ARC | HellaSwag | MMLU | TruthfulQA |
|---|---|---|---|---|---|
| horizon-25m-v0 | 30.625 | 20.22 | 26.23 | 25.9 | 50.15 |
| cramp-25m | 30.57 | 21.76 | 27.35 | 25.53 | 47.66 |
| gpt2 | 30.06 | 22.1 | 31.6 | 25.86 | 40.67 |
| pythia 70m deduped | 30.25 | 21.08 | 27.17 | 25.26 | 47.51 |
| pythia 70m | 30.46 | 21.59 | 27.29 | 25.9 | 47.06 |
| pythia 160m deduped | 31.16 | 24.06 | 30.34 | 24.95 | 44.34 |
| pythia 160m | 30.58 | 22.78 | 30.34 | 24.95 | 44.26 |

## Dataset (Horizon-v0)

| Source | Documents |
|---|---|
| arxiv | 8.78k |
| github | 8.82k |
| books | 10k |
| wiki | 14.67k |
| openwebtext v2 | 30.73k |