jacobfulano committed
Commit fcc434c
1 Parent(s): ba7abb1

Update README.md

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -158,6 +158,21 @@ When fine-tuned on downstream tasks (following the [finetuning details here](htt
 
  Note that this is averaged over n=5 pretraining seeds.
 
+ ## Collection of MosaicBERT-Base models trained using ALiBi on different sequence lengths
+
+ ALiBi allows a model trained on a sequence length of n to easily extrapolate to sequence lengths >2n during finetuning. For more details, see [Train Short, Test Long: Attention with Linear
+ Biases Enables Input Length Extrapolation (Press et al. 2022)](https://arxiv.org/abs/2108.12409).
+
+ This model is part of the **family of MosaicBERT-Base models** trained using ALiBi on different sequence lengths:
+
+ * mosaic-bert-base (trained on a sequence length of 128 tokens)
+ * [mosaic-bert-base-seqlen-256](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-256)
+ * [mosaic-bert-base-seqlen-512](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-512)
+ * [mosaic-bert-base-seqlen-1024](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-1024)
+ * [mosaic-bert-base-seqlen-2048](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-2048)
+
+ The primary use case of these models is for research on efficient pretraining and finetuning for long context embeddings.
+
  ## Intended uses & limitations
 
  This model is intended to be finetuned on downstream tasks.
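
For reference, a minimal usage sketch of loading one of the checkpoints listed in the new section. This is not part of the diff itself: the repo id is taken from the list above, and it assumes MosaicBERT's custom modeling code on the Hub requires `trust_remote_code=True` and that the standard `bert-base-uncased` tokenizer/vocabulary is reused.

```python
# Hedged sketch (not from the diff): load a MosaicBERT-Base (ALiBi) checkpoint for
# masked-LM inference or finetuning. Assumptions: trust_remote_code=True is needed
# for the custom MosaicBERT modeling code, and the bert-base-uncased tokenizer applies.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo_id = "mosaicml/mosaic-bert-base-seqlen-2048"  # any checkpoint from the list above

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# ALiBi replaces learned position embeddings with linear attention biases, which is
# what allows finetuning on sequences longer than the pretraining length (per the README).
text = "MosaicBERT uses ALiBi to handle [MASK] sequence lengths."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Report the top prediction for the masked token.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode(logits[0, mask_pos].argmax().item()))
```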