jacobfulano committed
Commit fcc434c
1 Parent(s): ba7abb1

Update README.md

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -158,6 +158,21 @@ When fine-tuned on downstream tasks (following the [finetuning details here](htt
 
  Note that this is averaged over n=5 pretraining seeds.
 
+ ## Collection of MosaicBERT-Base models trained using ALiBi on different sequence lengths
+
+ ALiBi allows a model trained on a sequence length of n to easily extrapolate to sequence lengths >2n during finetuning. For more details, see [Train Short, Test Long: Attention with Linear
+ Biases Enables Input Length Extrapolation (Press et al. 2022)](https://arxiv.org/abs/2108.12409).
+
+ This model is part of the **family of MosaicBERT-Base models** trained using ALiBi on different sequence lengths:
+
+ * mosaic-bert-base (trained on a sequence length of 128 tokens)
+ * [mosaic-bert-base-seqlen-256](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-256)
+ * [mosaic-bert-base-seqlen-512](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-512)
+ * [mosaic-bert-base-seqlen-1024](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-1024)
+ * [mosaic-bert-base-seqlen-2048](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-2048)
+
+ The primary use case of these models is for research on efficient pretraining and finetuning for long context embeddings.
+
  ## Intended uses & limitations
 
  This model is intended to be finetuned on downstream tasks.
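
For reference, a minimal usage sketch of loading one of the checkpoints listed in the new section. This is not part of the diff itself: the repo id is taken from the list above, and it assumes MosaicBERT's custom modeling code on the Hub requires `trust_remote_code=True` and that the standard `bert-base-uncased` tokenizer/vocabulary is reused.

```python
# Hedged sketch (not from the diff): load a MosaicBERT-Base (ALiBi) checkpoint for
# masked-LM inference or finetuning. Assumptions: trust_remote_code=True is needed
# for the custom MosaicBERT modeling code, and the bert-base-uncased tokenizer applies.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo_id = "mosaicml/mosaic-bert-base-seqlen-2048"  # any checkpoint from the list above

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# ALiBi replaces learned position embeddings with linear attention biases, which is
# what allows finetuning on sequences longer than the pretraining length (per the README).
text = "MosaicBERT uses ALiBi to handle [MASK] sequence lengths."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Report the top prediction for the masked token.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode(logits[0, mask_pos].argmax().item()))
```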