Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -1,3 +1,6 @@
 
 
 
1
  # Model Details: QuaLA-MiniLM
2
  The article discusses the challenge of making transformer-based models efficient enough for practical use, given their size and computational requirements. The authors propose a new approach called **QuaLA-MiniLM**, which combines knowledge distillation, the length-adaptive transformer (LAT) technique, and low-bit quantization. We expand the Dynamic-TinyBERT approach. This approach trains a single model that can adapt to any inference scenario with a given computational budget, achieving a superior accuracy-efficiency trade-off on the SQuAD1.1 dataset. The authors compare their approach to other efficient methods and find that it achieves up to an **x8.8 speedup with less than 1% accuracy loss**. They also provide their code publicly on GitHub. The article also discusses other related work in the field, including dynamic transformers and other knowledge distillation approaches.
3
 
@@ -5,7 +8,7 @@ The model card has been written in combination by Intel.
5
 
6
  ### QuaLA-MiniLM training process
7
  Figure showing QuaLA-MiniLM training process. To run the model with the best accuracy-efficiency tradeoff per a specific computational budget, we set the length configuration to the best setting found by an evolutionary search to match our computational constraint.
8
- ![ArchitecureQuaLA-MiniLM.jpg](ArchitecureQuaLA-MiniLM.jpg)
9
 
10
  ### Model license
11
  Licensed under MIT license.
@@ -38,7 +41,6 @@ import ...
38
 
39
  ```
40
 
41
- For more code examples, refer to the GitHub Repo.
42
 
43
  ### Metrics (Model Performance):
44
 
@@ -86,4 +88,4 @@ Users (both direct and downstream) should be made aware of the risks, biases and
86
  | comments: | In this version we added reference to the source code in the abstract. arXiv admin note: text overlap with arXiv:2111.09645 |
87
  | Subjects: | Computation and Language (cs.CL) |
88
  | Cite as: | arXiv:2210.17114 [cs.CL]|
89
- | - | (or arXiv:2210.17114v2 [cs.CL] for this version) https://doi.org/10.48550/arXiv.2210.17114|
 
1
+ ---
2
+ license: mit
3
+ ---
4
  # Model Details: QuaLA-MiniLM
5
  The article discusses the challenge of making transformer-based models efficient enough for practical use, given their size and computational requirements. The authors propose a new approach called **QuaLA-MiniLM**, which combines knowledge distillation, the length-adaptive transformer (LAT) technique, and low-bit quantization. We expand the Dynamic-TinyBERT approach. This approach trains a single model that can adapt to any inference scenario with a given computational budget, achieving a superior accuracy-efficiency trade-off on the SQuAD1.1 dataset. The authors compare their approach to other efficient methods and find that it achieves up to an **x8.8 speedup with less than 1% accuracy loss**. They also provide their code publicly on GitHub. The article also discusses other related work in the field, including dynamic transformers and other knowledge distillation approaches.
6
 
 
8
 
9
  ### QuaLA-MiniLM training process
10
  Figure showing QuaLA-MiniLM training process. To run the model with the best accuracy-efficiency tradeoff per a specific computational budget, we set the length configuration to the best setting found by an evolutionary search to match our computational constraint.
11
+ [ArchitecureQuaLA-MiniLM.jpg](ArchitecureQuaLA-MiniLM.jpg)
12
 
13
  ### Model license
14
  Licensed under MIT license.
 
41
 
42
  ```
43
 
 
44
 
45
  ### Metrics (Model Performance):
46
 
 
88
  | comments: | In this version we added reference to the source code in the abstract. arXiv admin note: text overlap with arXiv:2111.09645 |
89
  | Subjects: | Computation and Language (cs.CL) |
90
  | Cite as: | arXiv:2210.17114 [cs.CL]|
91
+ | - | (or arXiv:2210.17114v2 [cs.CL] for this version) https://doi.org/10.48550/arXiv.2210.17114|