---
language:
- en
tags:
- generated_from_trainer
datasets:
- glue
metrics:
- accuracy
model-index:
- name: yujiepan/bert-base-uncased-sst2-int8-unstructured80-30epoch
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: GLUE SST2
      type: glue
      config: sst2
      split: validation
      args: sst2
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9139908256880734
---
# Joint magnitude pruning, quantization and distillation on BERT-base/SST-2
This model was produced by applying unstructured magnitude pruning, quantization, and distillation jointly while fine-tuning on the GLUE SST-2 dataset.
It achieves the following results on the evaluation set:
- Torch loss: 0.4116
- Torch accuracy: 0.9140
- OpenVINO IR accuracy: 0.9106
- Sparsity in transformer block linear layers: 0.80
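The 0.80 sparsity figure above means that 80% of the weights in the affected linear layers are zero. As a rough illustration of the idea behind unstructured magnitude pruning, here is a minimal pure-Python sketch on a toy weight matrix. The function names and the example weights are hypothetical, and this is not how NNCF implements it (NNCF applies masks and ramps sparsity up gradually during training).

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest magnitude.

    Toy illustration only: ties at the threshold are all pruned, so the
    resulting sparsity can slightly exceed the target when values repeat.
    """
    magnitudes = sorted(abs(w) for row in weights for w in row)
    n_prune = int(round(sparsity * len(magnitudes)))
    if n_prune == 0:
        return [list(row) for row in weights]
    threshold = magnitudes[n_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def sparsity_of(weights):
    """Fraction of entries that are exactly zero."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w == 0.0) / len(flat)


# Hypothetical 2x5 weight matrix standing in for one linear layer.
weights = [
    [0.9, -0.05, 0.3, -0.7, 0.02],
    [-0.1, 0.6, -0.01, 0.2, -0.4],
]
pruned = magnitude_prune(weights, 0.8)
print(pruned)
print(sparsity_of(pruned))
```

Only the two largest-magnitude weights survive here. "Unstructured" means individual weights are removed wherever they are small, with no constraint on rows, columns, or blocks.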
## Setup
```
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
git clone https://github.com/yujiepan-work/optimum-intel.git
cd optimum-intel
git checkout -b "magnitude-pruning" 01927af543eaea8678671bf8f4eb78fdb29f8930
pip install -e ".[openvino,nncf]"
cd examples/openvino/text-classification/
pip install -r requirements.txt
pip install wandb # optional
```
## NNCF config
See `nncf_config.json` in this repo.
## Run
Training uses a single GPU.
```
NNCFCFG=/path/to/nncf/config
python run_glue.py \
--lr_scheduler_type cosine_with_restarts \
--cosine_cycle_ratios 8,6,4,4,4,4 \
--cosine_cycle_decays 1,1,1,1,1,1 \
--save_best_model_after_epoch -1 \
--save_best_model_after_sparsity 0.7999 \
--model_name_or_path textattack/bert-base-uncased-SST-2 \
--teacher_model_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
--distillation_temperature 2 \
--task_name sst2 \
--nncf_compression_config $NNCFCFG \
--distillation_weight 0.95 \
--output_dir /tmp/bert-base-uncased-sst2-int8-unstructured80-30epoch \
--run_name bert-base-uncased-sst2-int8-unstructured80-30epoch \
--overwrite_output_dir \
--do_train \
--do_eval \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 32 \
--learning_rate 5e-05 \
--optim adamw_torch \
--num_train_epochs 30 \
--logging_steps 1 \
--evaluation_strategy steps \
--eval_steps 250 \
--save_strategy steps \
--save_steps 250 \
--save_total_limit 1 \
--fp16 \
--seed 1
```
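To give a feel for what `--distillation_temperature 2` and `--distillation_weight 0.95` control, the following is a minimal sketch of a commonly used distillation loss: a temperature-softened KL term between teacher and student, blended with the ordinary cross-entropy on the hard label. The exact formulation used by `run_glue.py` in this fork is not stated here, so the weighting and the `T^2` scaling below are assumptions, and all function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, weight=0.95):
    """Blend of soft-target KL divergence and hard-label cross-entropy.

    Assumed form: weight * T^2 * KL(teacher || student) + (1 - weight) * CE.
    """
    # Hard-label cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-target term: both distributions are softened by the temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return weight * temperature ** 2 * kl + (1.0 - weight) * ce


# Hypothetical logits for one SST-2 example (two classes), true label 1.
student = [0.2, 1.1]
teacher = [-1.5, 2.5]
print(distillation_loss(student, teacher, label=1))
```

With a weight of 0.95, the student is driven almost entirely by the teacher's softened predictions, and the hard labels contribute only 5% of the loss under this assumed form.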
The best model checkpoint is stored in the `best_model` folder. Only that checkpoint folder and some config files are uploaded to this repo.
### Framework versions
- Transformers 4.26.0
- Pytorch 1.13.1+cu116
- Datasets 2.8.0
- Tokenizers 0.13.2
For a full description of the environment, please refer to `pip-requirements.txt` and `conda-requirements.txt`.