Intel / dynamic-minilmv2-L6-H384-squad1.1-int8-static

@@ -1,3 +1,6 @@
 # Model Details: QuaLA-MiniLM
 The article discusses the challenge of making transformer-based models efficient enough for practical use, given their size and computational requirements. The authors propose a new approach called **QuaLA-MiniLM**, which combines knowledge distillation, the length-adaptive transformer (LAT) technique, and low-bit quantization, extending the Dynamic-TinyBERT approach. This approach trains a single model that can adapt to any inference scenario with a given computational budget, achieving a superior accuracy-efficiency trade-off on the SQuAD1.1 dataset. The authors compare their approach to other efficient methods and find that it achieves up to an **8.8x speedup with less than 1% accuracy loss**. They also make their code publicly available on GitHub. The article also discusses related work in the field, including dynamic transformers and other knowledge distillation approaches.
@@ -5,7 +8,7 @@ The model card has been written in combination by Intel.
 ### QuaLA-MiniLM training process
 The figure shows the QuaLA-MiniLM training process. To run the model with the best accuracy-efficiency trade-off for a specific computational budget, we set the length configuration to the best setting that an evolutionary search found within that budget.
-![ArchitecureQuaLA-MiniLM.jpg](ArchitecureQuaLA-MiniLM.jpg)
 ### Model license
 Licensed under the MIT license.
@@ -38,7 +41,6 @@ import ...
 ```
-For more code examples, refer to the GitHub Repo.
 ### Metrics (Model Performance):
@@ -86,4 +88,4 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 | comments: | In this version we added reference to the source code in the abstract. arXiv admin note: text overlap with arXiv:2111.09645 |
 | Subjects: | Computation and Language (cs.CL) |
 | Cite as: | arXiv:2210.17114 [cs.CL]|
-| - | (or arXiv:2210.17114v2 [cs.CL] for this version) https://doi.org/10.48550/arXiv.2210.17114|
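The repository name's `-int8-static` suffix and the card's mention of low-bit quantization suggest static post-training int8 quantization, where scales are fixed offline from calibration data instead of being recomputed for each input at run time. The sketch below shows only the arithmetic of symmetric per-tensor int8 quantization with max-abs calibration. It is a generic textbook scheme, not Intel's actual quantization recipe, and `calibrate_scale`, `quantize`, and `dequantize` are hypothetical helper names.

```python
from typing import List, Sequence


def calibrate_scale(calibration_batches: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper: derive one fixed scale from calibration data.

    Symmetric max-abs calibration maps [-max_abs, max_abs] onto [-127, 127].
    """
    max_abs = max(abs(x) for batch in calibration_batches for x in batch)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values: Sequence[float], scale: float) -> List[int]:
    """Round each value to the nearest int8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate real values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    calibration = [[0.5, -1.27, 0.9], [1.0, -0.3, 0.2]]
    scale = calibrate_scale(calibration)  # computed once, offline ("static")
    q = quantize([0.5, -2.0, 0.013], scale)
    print(q)  # [50, -127, 1]; -2.0 lies outside the calibrated range and is clamped
    print(dequantize(q, scale))
```

Values outside the calibrated range saturate, which is why the choice of calibration data matters for static quantization.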

+---
+license: mit
+---
 # Model Details: QuaLA-MiniLM
 The article discusses the challenge of making transformer-based models efficient enough for practical use, given their size and computational requirements. The authors propose a new approach called **QuaLA-MiniLM**, which combines knowledge distillation, the length-adaptive transformer (LAT) technique, and low-bit quantization, extending the Dynamic-TinyBERT approach. This approach trains a single model that can adapt to any inference scenario with a given computational budget, achieving a superior accuracy-efficiency trade-off on the SQuAD1.1 dataset. The authors compare their approach to other efficient methods and find that it achieves up to an **8.8x speedup with less than 1% accuracy loss**. They also make their code publicly available on GitHub. The article also discusses related work in the field, including dynamic transformers and other knowledge distillation approaches.
 ### QuaLA-MiniLM training process
 The figure shows the QuaLA-MiniLM training process. To run the model with the best accuracy-efficiency trade-off for a specific computational budget, we set the length configuration to the best setting that an evolutionary search found within that budget.
+![ArchitecureQuaLA-MiniLM.jpg](ArchitecureQuaLA-MiniLM.jpg)
 ### Model license
 Licensed under the MIT license.
 ```
 ### Metrics (Model Performance):
 | comments: | In this version we added reference to the source code in the abstract. arXiv admin note: text overlap with arXiv:2111.09645 |
 | Subjects: | Computation and Language (cs.CL) |
 | Cite as: | arXiv:2210.17114 [cs.CL]|
+| - | (or arXiv:2210.17114v2 [cs.CL] for this version) https://doi.org/10.48550/arXiv.2210.17114|
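The training-process paragraph says the deployed length configuration (how many tokens each layer keeps) is the best one an evolutionary search finds within a computational budget. The following is a toy sketch of that idea, not the authors' implementation: the linear `cost` model and `toy_score` function are hypothetical stand-ins (in practice the score would be something like SQuAD1.1 dev-set F1 of the trained length-adaptive model), and the mutation and crossover operators are simple illustrative choices.

```python
import math
import random
from typing import Callable, List, Tuple

# A length configuration: number of tokens kept at each transformer layer.
LengthConfig = Tuple[int, ...]


def cost(config: LengthConfig) -> int:
    """Hypothetical cost model: compute assumed linear in tokens per layer.

    Real FLOPs also include a term quadratic in length from self-attention.
    """
    return sum(config)


def _monotone(lengths: List[int]) -> LengthConfig:
    # Tokens dropped at one layer stay dropped, so lengths never grow with depth.
    for i in range(1, len(lengths)):
        lengths[i] = min(lengths[i], lengths[i - 1])
    return tuple(lengths)


def mutate(config: LengthConfig, max_len: int, rng: random.Random, prob: float = 0.3) -> LengthConfig:
    # Resample each layer's length with probability `prob`.
    return _monotone([rng.randint(1, max_len) if rng.random() < prob else length for length in config])


def crossover(a: LengthConfig, b: LengthConfig, rng: random.Random) -> LengthConfig:
    # Take each layer's length from one of the two parents at random.
    return _monotone([rng.choice(pair) for pair in zip(a, b)])


def evolutionary_search(
    score: Callable[[LengthConfig], float],
    num_layers: int,
    max_len: int,
    budget: int,
    population_size: int = 20,
    generations: int = 30,
    seed: int = 0,
) -> LengthConfig:
    """Return the highest-scoring length configuration whose cost fits the budget."""
    assert budget >= num_layers, "budget must allow at least one token per layer"
    rng = random.Random(seed)
    # Feasible starting point: the same length at every layer.
    base = tuple([min(max_len, budget // num_layers)] * num_layers)
    population = [base]
    while len(population) < population_size:
        cand = mutate(base, max_len, rng)
        population.append(cand if cost(cand) <= budget else base)

    for _ in range(generations):
        # Elitism: the best configurations survive unchanged into the next generation.
        parents = sorted(set(population), key=score, reverse=True)[: max(2, population_size // 4)]
        children: List[LengthConfig] = []
        while len(children) < population_size - len(parents):
            if rng.random() < 0.5:
                child = mutate(rng.choice(parents), max_len, rng)
            else:
                child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if cost(child) <= budget:  # discard configurations over budget
                children.append(child)
        population = parents + children
    return max(population, key=score)


def toy_score(config: LengthConfig) -> float:
    # Hypothetical stand-in for dev-set F1: earlier layers weigh more, with diminishing returns.
    return sum(math.log1p(length) / (i + 1) for i, length in enumerate(config))


if __name__ == "__main__":
    best = evolutionary_search(toy_score, num_layers=6, max_len=384, budget=1200)
    print(best, cost(best))
```

Because the top configurations are carried over each generation, the returned configuration never scores worse than the uniform starting point; running the search once per budget gives one configuration per inference scenario from the same trained model.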