gla-1B-100B / README.md

bailin28

---
license: mit
datasets:
  - cerebras/SlimPajama-627B
language:
  - en
---

This is a checkpoint of the 1.3B GLA model used in the paper Gated Linear Attention. The model was trained on 100B tokens from the SlimPajama dataset, tokenized with the Llama 2 tokenizer.

See the model and loading script in this repo.
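
As a rough sketch of how one might fetch the files before running the loading script: the snippet below downloads the repo with `huggingface_hub` and lists the Python files it contains. The repo id `bailin28/gla-1B-100B` is an assumption inferred from the page header, and `find_python_files` and `download_repo` are hypothetical helper names. The name of the loading script itself is not stated here, so the sketch only lists candidates.

```python
"""Sketch: download the checkpoint repo and list its Python files."""
from pathlib import Path

# Assumption: repo id inferred from the page header
# (owner "bailin28", repo "gla-1B-100B"); adjust if it differs.
REPO_ID = "bailin28/gla-1B-100B"


def find_python_files(root):
    """Return the sorted relative paths of all .py files under root."""
    root = Path(root)
    return sorted(str(p.relative_to(root)) for p in root.rglob("*.py"))


def download_repo(repo_id=REPO_ID):
    """Download every file in the repo and return the local directory."""
    # Imported lazily so find_python_files works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)


if __name__ == "__main__":
    local_dir = download_repo()
    print("Downloaded to:", local_dir)
    print("Candidate loading scripts:", find_python_files(local_dir))
```

Running the file directly requires `pip install huggingface_hub` and network access. The model definition and loading logic still come from the script shipped in the repo.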