Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/nyu-mll/roberta-base-1B-3/README.md
README.md
ADDED
@@ -0,0 +1,49 @@
# RoBERTa Pretrained on Smaller Datasets

We pretrain RoBERTa on smaller datasets (1M, 10M, 100M, and 1B tokens). For each pretraining data size, we release the 3 models with the lowest validation perplexities out of 25 runs (or 10 runs in the case of 1B tokens). The pretraining data reproduces that of BERT: we combine English Wikipedia and a reproduction of BookCorpus built from Smashwords texts, in a ratio of approximately 3:1.
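
As a minimal sketch of how one of the released checkpoints might be loaded, the snippet below builds a Hub repository id following the `nyu-mll/roberta-<model size>-<training size>-<run>` pattern used by the links in this card and passes it to the `fill-mask` pipeline of the `transformers` library. The helper names `repo_id` and `load_fill_mask` are illustrative assumptions, not part of any released code, and loading requires `transformers` to be installed plus network access to the Hub.

```python
ORG = "nyu-mll"  # Hub organization, taken from the links in this card


def repo_id(model_size: str, training_size: str, run: int) -> str:
    """Build a Hub repo id such as 'nyu-mll/roberta-base-1B-3' (hypothetical helper)."""
    return f"{ORG}/roberta-{model_size}-{training_size}-{run}"


def load_fill_mask(model_size: str = "base", training_size: str = "1B", run: int = 3):
    """Return a fill-mask pipeline for the chosen checkpoint (hypothetical helper)."""
    # Imported lazily so that repo_id() stays usable without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=repo_id(model_size, training_size, run))
```

Calling `load_fill_mask("med-small", "1M", 1)` would download that checkpoint on first use. When building an input sentence, it is probably safest to insert `fill_mask.tokenizer.mask_token` rather than hard-coding the mask string.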

### Hyperparameters and Validation Perplexity
The hyperparameters and validation perplexities corresponding to each model are as follows:

| Model Name | Training Size (tokens) | Model Size | Max Steps | Batch Size | Validation Perplexity |
|--------------------------|---------------|------------|-----------|------------|-----------------------|
| [roberta-base-1B-1][link-roberta-base-1B-1] | 1B | BASE | 100K | 512 | 3.93 |
| [roberta-base-1B-2][link-roberta-base-1B-2] | 1B | BASE | 31K | 1024 | 4.25 |
| [roberta-base-1B-3][link-roberta-base-1B-3] | 1B | BASE | 31K | 4096 | 3.84 |
| [roberta-base-100M-1][link-roberta-base-100M-1] | 100M | BASE | 100K | 512 | 4.99 |
| [roberta-base-100M-2][link-roberta-base-100M-2] | 100M | BASE | 31K | 1024 | 4.61 |
| [roberta-base-100M-3][link-roberta-base-100M-3] | 100M | BASE | 31K | 512 | 5.02 |
| [roberta-base-10M-1][link-roberta-base-10M-1] | 10M | BASE | 10K | 1024 | 11.31 |
| [roberta-base-10M-2][link-roberta-base-10M-2] | 10M | BASE | 10K | 512 | 10.78 |
| [roberta-base-10M-3][link-roberta-base-10M-3] | 10M | BASE | 31K | 512 | 11.58 |
| [roberta-med-small-1M-1][link-roberta-med-small-1M-1] | 1M | MED-SMALL | 100K | 512 | 153.38 |
| [roberta-med-small-1M-2][link-roberta-med-small-1M-2] | 1M | MED-SMALL | 10K | 512 | 134.18 |
| [roberta-med-small-1M-3][link-roberta-med-small-1M-3] | 1M | MED-SMALL | 31K | 512 | 139.39 |

The hyperparameters corresponding to the model sizes mentioned above are as follows:

| Model Size | L | AH | HS | FFN | P |
|------------|----|----|-----|------|------|
| BASE | 12 | 12 | 768 | 3072 | 125M |
| MED-SMALL | 6 | 8 | 512 | 2048 | 45M |

(L = number of layers; AH = number of attention heads; HS = hidden size; FFN = feedforward network dimension; P = number of parameters.)
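
To make the relationship between L, HS, FFN, and P concrete, here is a rough parameter-count sketch. It assumes the standard RoBERTa vocabulary size (50265), 514 position embeddings, one token-type embedding, and a masked-LM head whose output projection shares weights with the token embeddings; none of these values are stated in this card, so treat the result as an estimate. The names `ModelSize` and `count_parameters` are illustrative.

```python
from dataclasses import dataclass

# Assumed values (not stated in this card): standard RoBERTa vocabulary size,
# number of position embeddings, and a single token-type embedding.
VOCAB_SIZE = 50265
MAX_POSITIONS = 514
TYPE_VOCAB_SIZE = 1


@dataclass(frozen=True)
class ModelSize:
    layers: int  # L
    heads: int   # AH (splits the hidden size; does not change the count)
    hidden: int  # HS
    ffn: int     # FFN


def layer_norm(hidden: int) -> int:
    """Parameters of one layer norm: a scale and a bias per hidden unit."""
    return 2 * hidden


def count_parameters(size: ModelSize) -> int:
    h, f = size.hidden, size.ffn
    embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB_SIZE) * h + layer_norm(h)
    # Query, key, value, and output projections (weights + biases), plus layer norm.
    attention = 4 * (h * h + h) + layer_norm(h)
    # Two linear layers (weights + biases), plus layer norm.
    feed_forward = (h * f + f) + (f * h + h) + layer_norm(h)
    # Masked-LM head: dense layer, layer norm, and an output bias; the output
    # projection is assumed to be tied to the token embeddings.
    lm_head = (h * h + h) + layer_norm(h) + VOCAB_SIZE
    return embeddings + size.layers * (attention + feed_forward) + lm_head


BASE = ModelSize(layers=12, heads=12, hidden=768, ffn=3072)
MED_SMALL = ModelSize(layers=6, heads=8, hidden=512, ffn=2048)

for name, size in (("BASE", BASE), ("MED-SMALL", MED_SMALL)):
    print(f"{name}: {count_parameters(size) / 1e6:.1f}M parameters")
```

Under these assumptions the estimates round to the 125M and 45M reported in the table.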

For the other hyperparameters, we select:
- Peak learning rate: 5e-4
- Warmup steps: 6% of max steps
- Dropout: 0.1

[link-roberta-med-small-1M-1]: https://huggingface.co/nyu-mll/roberta-med-small-1M-1
[link-roberta-med-small-1M-2]: https://huggingface.co/nyu-mll/roberta-med-small-1M-2
[link-roberta-med-small-1M-3]: https://huggingface.co/nyu-mll/roberta-med-small-1M-3
[link-roberta-base-10M-1]: https://huggingface.co/nyu-mll/roberta-base-10M-1
[link-roberta-base-10M-2]: https://huggingface.co/nyu-mll/roberta-base-10M-2
[link-roberta-base-10M-3]: https://huggingface.co/nyu-mll/roberta-base-10M-3
[link-roberta-base-100M-1]: https://huggingface.co/nyu-mll/roberta-base-100M-1
[link-roberta-base-100M-2]: https://huggingface.co/nyu-mll/roberta-base-100M-2
[link-roberta-base-100M-3]: https://huggingface.co/nyu-mll/roberta-base-100M-3
[link-roberta-base-1B-1]: https://huggingface.co/nyu-mll/roberta-base-1B-1
[link-roberta-base-1B-2]: https://huggingface.co/nyu-mll/roberta-base-1B-2
[link-roberta-base-1B-3]: https://huggingface.co/nyu-mll/roberta-base-1B-3
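
The peak learning rate and warmup fraction listed above can be turned into a per-step schedule roughly as follows. This is only a sketch: the card does not say how the learning rate changes after warmup, so the linear decay to zero used here is an assumption, and the function name `learning_rate` is illustrative.

```python
PEAK_LR = 5e-4          # peak learning rate, from the list above
WARMUP_FRACTION = 0.06  # warmup steps are 6% of max steps, from the list above


def learning_rate(step: int, max_steps: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero.

    The decay shape is an assumption; the card only gives the peak value
    and the warmup length.
    """
    warmup_steps = max(1, round(WARMUP_FRACTION * max_steps))
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    decay_steps = max(1, max_steps - warmup_steps)
    remaining = max(0, max_steps - step)
    return PEAK_LR * remaining / decay_steps
```

For a run with 100K max steps, warmup would last 6,000 steps, so `learning_rate(3_000, 100_000)` is half the peak value and `learning_rate(100_000, 100_000)` is zero.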