Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/google/bert_uncased_L-4_H-256_A-4/README.md

---
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---

BERT Miniatures
===

This is the set of 24 BERT models referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) (English only, uncased, trained with WordPiece masking).

We have shown that the standard BERT recipe (including model architecture and training objective) is effective on a wide range of model sizes, beyond BERT-Base and BERT-Large. The smaller BERT models are intended for environments with restricted computational resources. They can be fine-tuned in the same manner as the original BERT models. However, they are most effective in the context of knowledge distillation, where the fine-tuning labels are produced by a larger and more accurate teacher.
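
As an illustration only (not part of the original card, and not the paper's exact procedure), the following is a minimal sketch of that distillation setup using PyTorch and the `transformers` library: a larger, already fine-tuned teacher supplies soft targets for a compact student such as BERT-Mini. The teacher checkpoint name, temperature, and loss weighting are placeholder assumptions.

```python
# Illustrative sketch only: distill a fine-tuned teacher into a compact student.
# The teacher checkpoint, temperature, and alpha below are assumptions,
# not settings from the paper or this release.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

student_name = "google/bert_uncased_L-4_H-256_A-4"  # BERT-Mini (this repository)
teacher_name = "bert-base-uncased"                  # placeholder: use a teacher fine-tuned on your task

tokenizer = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForSequenceClassification.from_pretrained(student_name, num_labels=2)
teacher = AutoModelForSequenceClassification.from_pretrained(teacher_name, num_labels=2)
teacher.eval()

def distillation_loss(batch_texts, labels, temperature=2.0, alpha=0.5):
    """Combine soft teacher targets with the usual hard-label cross-entropy.

    `labels` is a LongTensor of gold class ids for the batch.
    """
    inputs = tokenizer(batch_texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        teacher_logits = teacher(**inputs).logits
    student_logits = student(**inputs).logits
    # KL divergence between temperature-softened distributions (soft targets).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Standard cross-entropy on the hard labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```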

Our goal is to enable research in institutions with fewer computational resources and encourage the community to seek directions of innovation alternative to increasing model capacity.

You can download the 24 BERT miniatures either from the [official BERT Github page](https://github.com/google-research/bert/), or via HuggingFace from the links below:

| |H=128|H=256|H=512|H=768|
|---|:---:|:---:|:---:|:---:|
| **L=2** |[**2/128 (BERT-Tiny)**][2_128]|[2/256][2_256]|[2/512][2_512]|[2/768][2_768]|
| **L=4** |[4/128][4_128]|[**4/256 (BERT-Mini)**][4_256]|[**4/512 (BERT-Small)**][4_512]|[4/768][4_768]|
| **L=6** |[6/128][6_128]|[6/256][6_256]|[6/512][6_512]|[6/768][6_768]|
| **L=8** |[8/128][8_128]|[8/256][8_256]|[**8/512 (BERT-Medium)**][8_512]|[8/768][8_768]|
| **L=10** |[10/128][10_128]|[10/256][10_256]|[10/512][10_512]|[10/768][10_768]|
| **L=12** |[12/128][12_128]|[12/256][12_256]|[12/512][12_512]|[**12/768 (BERT-Base)**][12_768]|
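
As a quick usage sketch (not part of the original card), any of these checkpoints can be loaded with the `transformers` library; the snippet below uses this repository's checkpoint, `google/bert_uncased_L-4_H-256_A-4` (BERT-Mini).

```python
# Minimal usage sketch: load this checkpoint (BERT-Mini, L=4, H=256, A=4).
from transformers import AutoModel, AutoTokenizer

model_name = "google/bert_uncased_L-4_H-256_A-4"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("BERT miniatures are compact and easy to fine-tune.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 256)
```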

Note that the BERT-Base model in this release is included for completeness only; it was re-trained under the same regime as the original model.

Here are the corresponding GLUE scores on the test set:

|Model|Score|CoLA|SST-2|MRPC|STS-B|QQP|MNLI-m|MNLI-mm|QNLI(v2)|RTE|WNLI|AX|
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|BERT-Tiny|64.2|0.0|83.2|81.1/71.1|74.3/73.6|62.2/83.4|70.2|70.3|81.5|57.2|62.3|21.0|
|BERT-Mini|65.8|0.0|85.9|81.1/71.8|75.4/73.3|66.4/86.2|74.8|74.3|84.1|57.9|62.3|26.1|
|BERT-Small|71.2|27.8|89.7|83.4/76.2|78.8/77.0|68.1/87.0|77.6|77.0|86.4|61.8|62.3|28.6|
|BERT-Medium|73.5|38.0|89.6|86.6/81.6|80.4/78.4|69.6/87.9|80.0|79.1|87.7|62.2|62.3|30.5|

For each task, we selected the best fine-tuning hyperparameters from the lists below, and trained for 4 epochs:

- batch sizes: 8, 16, 32, 64, 128
- learning rates: 3e-4, 1e-4, 5e-5, 3e-5
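
For reference only, a hedged sketch of such a grid search is shown below, assuming PyTorch, the `datasets` library, and the `transformers` Trainer API. SST-2 and dev-set accuracy here stand in for the paper's full GLUE test-server evaluation, so this is not the exact pipeline used to produce the scores above.

```python
# Illustrative sketch of the hyperparameter sweep described above
# (4 epochs, grid over batch sizes and learning rates).
# Dataset, metric, and output paths are placeholders, not the paper's setup.
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "google/bert_uncased_L-4_H-256_A-4"
tokenizer = AutoTokenizer.from_pretrained(model_name)
raw = load_dataset("glue", "sst2")
encoded = raw.map(lambda ex: tokenizer(ex["sentence"], truncation=True), batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

best = None
for batch_size in (8, 16, 32, 64, 128):
    for lr in (3e-4, 1e-4, 5e-5, 3e-5):
        # Re-initialize the classification head for each configuration.
        model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
        args = TrainingArguments(
            output_dir=f"sweep_bs{batch_size}_lr{lr}",
            num_train_epochs=4,
            per_device_train_batch_size=batch_size,
            learning_rate=lr,
        )
        trainer = Trainer(model=model, args=args,
                          train_dataset=encoded["train"],
                          eval_dataset=encoded["validation"],
                          tokenizer=tokenizer,
                          compute_metrics=accuracy)
        trainer.train()
        score = trainer.evaluate()["eval_accuracy"]
        if best is None or score > best[0]:
            best = (score, batch_size, lr)

print(f"best accuracy={best[0]:.4f} with batch_size={best[1]}, lr={best[2]}")
```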

If you use these models, please cite the following paper:

```
@article{turc2019,
  title={Well-Read Students Learn Better: On the Importance of Pre-training Compact Models},
  author={Turc, Iulia and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1908.08962v2},
  year={2019}
}
```

[2_128]: https://huggingface.co/google/bert_uncased_L-2_H-128_A-2
[2_256]: https://huggingface.co/google/bert_uncased_L-2_H-256_A-4
[2_512]: https://huggingface.co/google/bert_uncased_L-2_H-512_A-8
[2_768]: https://huggingface.co/google/bert_uncased_L-2_H-768_A-12
[4_128]: https://huggingface.co/google/bert_uncased_L-4_H-128_A-2
[4_256]: https://huggingface.co/google/bert_uncased_L-4_H-256_A-4
[4_512]: https://huggingface.co/google/bert_uncased_L-4_H-512_A-8
[4_768]: https://huggingface.co/google/bert_uncased_L-4_H-768_A-12
[6_128]: https://huggingface.co/google/bert_uncased_L-6_H-128_A-2
[6_256]: https://huggingface.co/google/bert_uncased_L-6_H-256_A-4
[6_512]: https://huggingface.co/google/bert_uncased_L-6_H-512_A-8
[6_768]: https://huggingface.co/google/bert_uncased_L-6_H-768_A-12
[8_128]: https://huggingface.co/google/bert_uncased_L-8_H-128_A-2
[8_256]: https://huggingface.co/google/bert_uncased_L-8_H-256_A-4
[8_512]: https://huggingface.co/google/bert_uncased_L-8_H-512_A-8
[8_768]: https://huggingface.co/google/bert_uncased_L-8_H-768_A-12
[10_128]: https://huggingface.co/google/bert_uncased_L-10_H-128_A-2
[10_256]: https://huggingface.co/google/bert_uncased_L-10_H-256_A-4
[10_512]: https://huggingface.co/google/bert_uncased_L-10_H-512_A-8
[10_768]: https://huggingface.co/google/bert_uncased_L-10_H-768_A-12
[12_128]: https://huggingface.co/google/bert_uncased_L-12_H-128_A-2
[12_256]: https://huggingface.co/google/bert_uncased_L-12_H-256_A-4
[12_512]: https://huggingface.co/google/bert_uncased_L-12_H-512_A-8
[12_768]: https://huggingface.co/google/bert_uncased_L-12_H-768_A-12