Update README.md
#6 opened by zmadscientist

README.md CHANGED
@@ -1,3 +1,6 @@
+---
+license: mit
+---
 # Model Details: QuaLA-MiniLM
 The article discusses the challenge of making transformer-based models efficient enough for practical use, given their size and computational requirements. The authors propose a new approach called **QuaLA-MiniLM**, which combines knowledge distillation, the length-adaptive transformer (LAT) technique, and low-bit quantization. We expand the Dynamic-TinyBERT approach. This approach trains a single model that can adapt to any inference scenario with a given computational budget, achieving a superior accuracy-efficiency trade-off on the SQuAD1.1 dataset. The authors compare their approach to other efficient methods and find that it achieves up to an **x8.8 speedup with less than 1% accuracy loss**. They also provide their code publicly on GitHub. The article also discusses other related work in the field, including dynamic transformers and other knowledge distillation approaches.
 
@@ -5,7 +8,7 @@ The model card has been written in combination by Intel.
 
 ### QuaLA-MiniLM training process
 Figure showing QuaLA-MiniLM training process. To run the model with the best accuracy-efficiency tradeoff per a specific computational budget, we set the length configuration to the best setting found by an evolutionary search to match our computational constraint.
-
+[ArchitecureQuaLA-MiniLM.jpg](ArchitecureQuaLA-MiniLM.jpg)
 
 ### Model license
 Licensed under MIT license.
@@ -38,7 +41,6 @@ import ...
 
 ```
 
-For more code examples, refer to the GitHub Repo.
 
 ### Metrics (Model Performance):
 
@@ -86,4 +88,4 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 | comments: | In this version we added reference to the source code in the abstract. arXiv admin note: text overlap with arXiv:2111.09645 |
 | Subjects: | Computation and Language (cs.CL) |
 | Cite as: | arXiv:2210.17114 [cs.CL]|
-| - | (or arXiv:2210.17114v2 [cs.CL] for this version) https://doi.org/10.48550/arXiv.2210.17114|
+| - | (or arXiv:2210.17114v2 [cs.CL] for this version) https://doi.org/10.48550/arXiv.2210.17114|