---
language:
- en
tags:
- generated_from_trainer
datasets:
- glue
metrics:
- accuracy
model-index:
- name: yujiepan/bert-base-uncased-sst2-int8-unstructured80-30epoch
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: GLUE SST2
      type: glue
      config: sst2
      split: validation
      args: sst2
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9139908256880734
---
# Joint magnitude pruning, quantization and distillation on BERT-base/SST-2

This model was fine-tuned on the GLUE SST-2 dataset with unstructured magnitude pruning, INT8 quantization, and distillation applied jointly during training.
It achieves the following results on the evaluation set:
- Torch loss: 0.4116
- Torch accuracy: 0.9140
- OpenVINO IR accuracy: 0.9106
- Sparsity in transformer block linear layers: 0.80
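
To make the sparsity figure above concrete, here is a minimal pure-Python sketch of one-shot unstructured magnitude pruning: the 80% of weights with the smallest absolute value are set to zero, regardless of where they sit in the matrix. It is an illustration only, not NNCF's implementation; the helper names `magnitude_prune` and `sparsity_of` and the random 64x64 toy layer are assumptions made for this example, and the actual run applies pruning during fine-tuning as configured in `nncf_config.json`.

```python
import random


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest |w|."""
    flat = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(flat) * sparsity)
    if n_prune == 0:
        return [row[:] for row in weights]
    threshold = flat[n_prune - 1]  # largest magnitude that still gets pruned
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def sparsity_of(weights):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


random.seed(0)
# Toy stand-in for one linear layer (hypothetical data, not the real model).
layer = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(64)]
pruned = magnitude_prune(layer, sparsity=0.80)
print(f"sparsity: {sparsity_of(pruned):.2f}")
```

Because the zeros are scattered rather than grouped into rows or blocks, the tensor shapes stay unchanged; any speedup depends on a runtime that can exploit sparse weights.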

## Setup

```
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
git clone https://github.com/yujiepan-work/optimum-intel.git
cd optimum-intel
# Check out the pinned commit on a new local branch (must be run inside the clone)
git checkout -b "magnitude-pruning" 01927af543eaea8678671bf8f4eb78fdb29f8930
pip install -e ".[openvino,nncf]"

cd examples/openvino/text-classification/
pip install -r requirements.txt
pip install wandb  # optional
```

## NNCF config

See `nncf_config.json` in this repo.

## Run

Training was run on a single GPU.

```
# Path to the NNCF compression config (see `nncf_config.json` in this repo)
NNCFCFG=/path/to/nncf/config
python run_glue.py \
  --lr_scheduler_type cosine_with_restarts \
  --cosine_cycle_ratios 8,6,4,4,4,4 \
  --cosine_cycle_decays 1,1,1,1,1,1 \
  --save_best_model_after_epoch -1 \
  --save_best_model_after_sparsity 0.7999 \
  --model_name_or_path textattack/bert-base-uncased-SST-2 \
  --teacher_model_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
  --distillation_temperature 2 \
  --task_name sst2 \
  --nncf_compression_config "$NNCFCFG" \
  --distillation_weight 0.95 \
  --output_dir /tmp/bert-base-uncased-sst2-int8-unstructured80-30epoch \
  --run_name bert-base-uncased-sst2-int8-unstructured80-30epoch \
  --overwrite_output_dir \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --learning_rate 5e-05 \
  --optim adamw_torch \
  --num_train_epochs 30 \
  --logging_steps 1 \
  --evaluation_strategy steps \
  --eval_steps 250 \
  --save_strategy steps \
  --save_steps 250 \
  --save_total_limit 1 \
  --fp16 \
  --seed 1
```
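
The `--distillation_temperature 2` and `--distillation_weight 0.95` flags suggest a standard knowledge-distillation objective. The sketch below shows the usual formulation in pure Python, under the assumption that the training script mixes the losses as `weight * distillation + (1 - weight) * task` and scales the KL term by the squared temperature; the function names and the toy logits are hypothetical and are not taken from the training code.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Task loss: negative log-probability of the true label."""
    return -math.log(softmax(logits)[label])


def kd_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student))
    return kl * temperature ** 2


def combined_loss(student_logits, teacher_logits, label,
                  temperature=2.0, distillation_weight=0.95):
    """Assumed mixing rule: weight * distillation + (1 - weight) * task."""
    task = cross_entropy(student_logits, label)
    distill = kd_loss(student_logits, teacher_logits, temperature)
    return distillation_weight * distill + (1.0 - distillation_weight) * task


# Toy two-class logits (negative, positive); made up for illustration.
student = [0.2, 1.1]
teacher = [-1.5, 2.5]
print(f"combined loss: {combined_loss(student, teacher, label=1):.4f}")
```

With a weight of 0.95, the student is trained mostly to match the teacher's softened predictions, and the ground-truth labels contribute only 5% of the loss.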

The best checkpoint is saved in the `best_model` folder. Only that folder and a few configuration files are uploaded to this repository.
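
Judging by their names, the `--save_best_model_after_sparsity 0.7999` and `--save_best_model_after_epoch -1` flags restrict which evaluations may be recorded as the best model, so that an early, still-dense checkpoint with higher accuracy cannot win. The following sketch illustrates that selection rule under this assumed interpretation; the function `pick_best_checkpoint`, the record fields, and the evaluation log are hypothetical and do not come from the actual run.

```python
def pick_best_checkpoint(evals, min_sparsity=0.7999, min_epoch=-1):
    """Return the eligible eval record with the highest accuracy, or None.

    A record is eligible only if the model has reached `min_sparsity`
    and training has passed `min_epoch` (-1 means no epoch restriction).
    """
    best = None
    for record in evals:
        eligible = (record["sparsity"] >= min_sparsity
                    and record["epoch"] > min_epoch)
        if eligible and (best is None or record["accuracy"] > best["accuracy"]):
            best = record
    return best


# Made-up evaluation log for illustration only, not results from this model.
evals = [
    {"tag": "early-dense", "epoch": 1.0, "sparsity": 0.10, "accuracy": 0.93},
    {"tag": "sparse-a", "epoch": 20.0, "sparsity": 0.80, "accuracy": 0.90},
    {"tag": "sparse-b", "epoch": 28.0, "sparsity": 0.80, "accuracy": 0.91},
]
best = pick_best_checkpoint(evals)
print(best["tag"])
```

In this toy log the dense checkpoint has the highest accuracy but is skipped because its sparsity is below the threshold.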

### Framework versions

- Transformers 4.26.0
- PyTorch 1.13.1+cu116
- Datasets 2.8.0
- Tokenizers 0.13.2

For a full description of the environment, please refer to `pip-requirements.txt` and `conda-requirements.txt`.