Quantized-distilbert-banking77
This model is a statically quantized version of optimum/distilbert-base-uncased-finetuned-banking77 on the banking77
dataset.
The model was created using the optimum-static-quantization notebook.
It achieves the following results on the evaluation set:
Accuracy
- Vanilla model: 92.5%
- Quantized model: 92.24%
The quantized model achieves 99.72% accuracy of the fp32 model
Latency
Payload sequence length: 128
Instance type: AWS c6i.xlarge
latency | vanilla transformers | quantized optimum model | improvement |
---|---|---|---|
p95 | 75.69ms | 26.75ms | 2.83x |
avg | 57.52ms | 24.86ms | 2.31x |
How to use
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import pipeline, AutoTokenizer
model = ORTModelForSequenceClassification.from_pretrained("philschmid/quantized-distilbert-banking77")
tokenizer = AutoTokenizer.from_pretrained("philschmid/quantized-distilbert-banking77")
remote_clx = pipeline("text-classification",model=model, tokenizer=tokenizer)
remote_clx("What is the exchange rate like on this app?")
- Downloads last month
- 16
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.