jacobfulano committed
Commit fcc434c
1 Parent(s): ba7abb1
Update README.md
README.md CHANGED
@@ -158,6 +158,21 @@ When fine-tuned on downstream tasks (following the [finetuning details here](htt
 
 Note that this is averaged over n=5 pretraining seeds.
 
+## Collection of MosaicBERT-Base models trained using ALiBi on different sequence lengths
+
+ALiBi allows a model trained with a sequence length n to extrapolate to sequence lengths greater than 2n during finetuning. For more details, see [Train Short, Test Long: Attention with Linear
+Biases Enables Input Length Extrapolation (Press et al. 2022)](https://arxiv.org/abs/2108.12409).
+
+This model is part of the **family of MosaicBERT-Base models** trained using ALiBi on different sequence lengths:
+
+* mosaic-bert-base (trained on a sequence length of 128 tokens)
+* [mosaic-bert-base-seqlen-256](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-256)
+* [mosaic-bert-base-seqlen-512](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-512)
+* [mosaic-bert-base-seqlen-1024](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-1024)
+* [mosaic-bert-base-seqlen-2048](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-2048)
+
+The primary use case of these models is for research on efficient pretraining and finetuning for long-context embeddings.
+
 ## Intended uses & limitations
 
 This model is intended to be finetuned on downstream tasks.
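The section added in this commit attributes the length extrapolation to ALiBi (Press et al. 2022). As a rough illustration of that mechanism, here is a minimal PyTorch sketch of the bidirectional ALiBi bias following the slope schedule described in that paper; the helper names `alibi_slopes` and `alibi_bias` are placeholders, not the MosaicBERT implementation.

```python
import math
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    """Head-specific slopes from Press et al. (2022).

    For a power-of-two head count the slopes form a geometric sequence
    2^(-8/n), 2^(-16/n), ...; for other head counts the paper interleaves
    slopes from the two nearest powers of two.
    """
    def slopes_power_of_2(n):
        start = 2.0 ** (-8.0 / n)
        return [start ** (i + 1) for i in range(n)]

    if math.log2(n_heads).is_integer():
        slopes = slopes_power_of_2(n_heads)
    else:
        closest = 2 ** math.floor(math.log2(n_heads))
        slopes = (slopes_power_of_2(closest)
                  + slopes_power_of_2(2 * closest)[0::2][: n_heads - closest])
    return torch.tensor(slopes)

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Symmetric (bidirectional) ALiBi bias of shape (n_heads, seq_len, seq_len).

    bias[h, i, j] = -slope[h] * |i - j|, added to the raw attention scores
    before softmax. It has no learned parameters and depends only on relative
    distance, so it can be rebuilt for any sequence length.
    """
    positions = torch.arange(seq_len)
    distance = (positions[None, :] - positions[:, None]).abs()      # (L, L)
    return -alibi_slopes(n_heads)[:, None, None] * distance[None]   # (H, L, L)

# Example: a model pretrained at 128 tokens can have its bias rebuilt at 512.
bias_128 = alibi_bias(n_heads=12, seq_len=128)
bias_512 = alibi_bias(n_heads=12, seq_len=512)
print(bias_128.shape, bias_512.shape)  # torch.Size([12, 128, 128]) torch.Size([12, 512, 512])
```

Because the bias is a deterministic function of head index and token distance, it can be recomputed for any sequence length at finetuning time, which is what the new README section means by extrapolating beyond the pretraining sequence length.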
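For completeness, here is a hedged sketch of how one of the listed checkpoints might be loaded with Hugging Face `transformers`. It assumes the MosaicBERT repos ship custom modeling code (hence `trust_remote_code=True`) and reuse the standard `bert-base-uncased` tokenizer; the seqlen-1024 checkpoint below is just one entry from the list above.

```python
from transformers import AutoModelForMaskedLM, BertTokenizer

# MosaicBERT reuses the standard bert-base-uncased WordPiece tokenizer.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Load one of the longer-sequence-length checkpoints from the collection above.
# trust_remote_code=True is assumed to pull in MosaicBERT's custom modeling code.
model = AutoModelForMaskedLM.from_pretrained(
    "mosaicml/mosaic-bert-base-seqlen-1024",
    trust_remote_code=True,
)

inputs = tokenizer("MosaicBERT uses [MASK] to handle long sequences.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, seq_len, vocab_size)
```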