# Fine-tuning details
For each task (GLUE and PAWS), we perform a hyperparameter search for each model and report the mean and standard deviation across 5 seeds of the best configuration. First, get the datasets by following the instructions in the [RoBERTa fine-tuning README](../roberta/README.glue.md). Alternatively, you can use [huggingface datasets](https://huggingface.co/docs/datasets/) to get the task data:
```python
from datasets import load_dataset
import pandas as pd
from pathlib import Path
key2file = {
    "paws": {
        "loc": "paws_data",
        "columns": ["id", "sentence1", "sentence2", "label"],
        "train": "train.tsv",
        "validation": "dev.tsv",
        "test": "test.tsv",
    }
}

task_data = load_dataset("paws", "labeled_final")
task_config = key2file["paws"]
save_path = Path(task_config["loc"])
save_path.mkdir(exist_ok=True, parents=True)

for key, fl in task_config.items():
    # Only the "train"/"validation"/"test" entries map to output files.
    if key in ["loc", "columns"]:
        continue
    print(f"Reading {key}")
    columns = task_config["columns"]
    df = pd.DataFrame(task_data[key])
    print(df.columns)
    df = df[columns]
    print(f"Got {len(df)} records")
    save_loc = save_path / fl
    print(f"Saving to: {save_loc}")
    # Write tab-separated files without a header row.
    df.to_csv(save_loc, sep="\t", header=None, index=None)
```
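As a quick sanity check before preprocessing, you can reload one of the saved files and confirm the column layout (a minimal sketch; the paths and column names follow the example above):

```python
import pandas as pd

# The files above are written without a header row, so supply the column names explicitly.
columns = ["id", "sentence1", "sentence2", "label"]
dev = pd.read_csv("paws_data/dev.tsv", sep="\t", header=None, names=columns)
print(dev.head())
print(dev["label"].value_counts())  # PAWS labels are 0/1
```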
- Preprocess with the RoBERTa GLUE preprocessing script, keeping in mind the column numbers for `sentence1`, `sentence2`, and `label` (columns 1, 2, and 3 if you save the data as in the example above, since column 0 is `id`).
- Fine-tuning is then performed similarly to RoBERTa (for example, in the case of RTE):
```bash
TOTAL_NUM_UPDATES=30875 # Total number of training updates; recompute from dataset size, epochs, and batch size (see below).
WARMUP_UPDATES=1852 # 6 percent of the number of updates
LR=2e-05 # Peak LR for polynomial LR scheduler.
NUM_CLASSES=2
MAX_SENTENCES=16 # Batch size.
SHUFFLED_ROBERTA_PATH=/path/to/shuffled_roberta/model.pt
CUDA_VISIBLE_DEVICES=0 fairseq-train RTE-bin/ \
--restore-file $SHUFFLED_ROBERTA_PATH \
--max-positions 512 \
--batch-size $MAX_SENTENCES \
--max-tokens 4400 \
--task sentence_prediction \
--reset-optimizer --reset-dataloader --reset-meters \
--required-batch-size-multiple 1 \
--init-token 0 --separator-token 2 \
--arch roberta_large \
--criterion sentence_prediction \
--num-classes $NUM_CLASSES \
--dropout 0.1 --attention-dropout 0.1 \
--weight-decay 0.1 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \
--clip-norm 0.0 \
--lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
--fp16 --fp16-init-scale 4 --threshold-loss-scale 1 --fp16-scale-window 128 \
--max-epoch 10 \
--find-unused-parameters \
--best-checkpoint-metric accuracy --maximize-best-checkpoint-metric;
```
- `TOTAL_NUM_UPDATES` is computed from the `--batch-size` value and the dataset size (see the sketch below).
- `WARMUP_UPDATES` is computed as 6% of `TOTAL_NUM_UPDATES`.
- The best hyperparameters for `--lr` and `--batch-size` are reported below:
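For a different dataset, batch size, or epoch count, the two update counts can be recomputed as in this minimal sketch (the dataset size is a made-up example, and the exact rounding is an assumption, not taken from the paper):

```python
import math

def compute_schedule(num_train_examples: int, batch_size: int, epochs: int, warmup_frac: float = 0.06):
    """Derive TOTAL_NUM_UPDATES and WARMUP_UPDATES for the fairseq-train command above."""
    updates_per_epoch = math.ceil(num_train_examples / batch_size)
    total_num_updates = updates_per_epoch * epochs
    warmup_updates = int(warmup_frac * total_num_updates)
    return total_num_updates, warmup_updates

# Example: a hypothetical task with 50,000 training examples, batch size 16, 10 epochs.
print(compute_schedule(50_000, 16, 10))  # (31250, 1875)
```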
## `--lr`
| | name | RTE | MRPC | SST-2 | CoLA | QQP | QNLI | MNLI | PAWS |
| --: | :----------- | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: |
| 0 | original | 2e-05 | 2e-05 | 1e-05 | 2e-05 | 1e-05 | 1e-05 | 1e-05 | 2e-05 |
| 1 | n_1 | 2e-05 | 1e-05 | 1e-05 | 1e-05 | 3e-05 | 1e-05 | 2e-05 | 2e-05 |
| 2 | n_2 | 2e-05 | 2e-05 | 1e-05 | 1e-05 | 2e-05 | 1e-05 | 1e-05 | 3e-05 |
| 3 | n_3 | 3e-05 | 1e-05 | 2e-05 | 2e-05 | 3e-05 | 1e-05 | 1e-05 | 2e-05 |
| 4 | n_4 | 3e-05 | 1e-05 | 2e-05 | 2e-05 | 2e-05 | 1e-05 | 1e-05 | 2e-05 |
| 5 | r512 | 1e-05 | 3e-05 | 2e-05 | 2e-05 | 3e-05 | 2e-05 | 3e-05 | 2e-05 |
| 6 | rand_corpus | 2e-05 | 1e-05 | 3e-05 | 1e-05 | 3e-05 | 3e-05 | 3e-05 | 2e-05 |
| 7 | rand_uniform | 2e-05 | 1e-05 | 3e-05 | 2e-05 | 3e-05 | 3e-05 | 3e-05 | 1e-05 |
| 8 | rand_init | 1e-05 | 1e-05 | 3e-05 | 1e-05 | 1e-05 | 1e-05 | 2e-05 | 1e-05 |
| 9 | no_pos | 1e-05 | 3e-05 | 2e-05 | 1e-05 | 1e-05 | 1e-05 | 1e-05 | 1e-05 |
## `--batch-size`
| | name | RTE | MRPC | SST-2 | CoLA | QQP | QNLI | MNLI | PAWS |
| --: | :----------- | --: | ---: | ----: | ---: | --: | ---: | ---: | ---: |
| 0 | orig | 16 | 16 | 32 | 16 | 16 | 32 | 32 | 16 |
| 1 | n_1 | 32 | 32 | 16 | 32 | 32 | 16 | 32 | 16 |
| 2 | n_2 | 32 | 16 | 32 | 16 | 32 | 32 | 16 | 32 |
| 3 | n_3 | 32 | 32 | 16 | 32 | 32 | 16 | 32 | 32 |
| 4 | n_4 | 32 | 16 | 32 | 16 | 32 | 32 | 32 | 32 |
| 5 | r512 | 32 | 16 | 16 | 32 | 32 | 16 | 16 | 16 |
| 6 | rand_corpus | 16 | 16 | 16 | 16 | 32 | 16 | 16 | 32 |
| 7 | rand_uniform | 16 | 32 | 16 | 16 | 32 | 16 | 16 | 16 |
| 8 | rand_init | 16 | 16 | 32 | 16 | 16 | 16 | 32 | 16 |
| 9 | no_pos | 16 | 32 | 16 | 16 | 32 | 16 | 16 | 16 |
- Inference is also performed similarly to RoBERTa:
```python
from fairseq.models.roberta import RobertaModel

roberta = RobertaModel.from_pretrained(
    'checkpoints/',
    checkpoint_file='checkpoint_best.pt',
    data_name_or_path='PAWS-bin'
)
label_fn = lambda label: roberta.task.label_dictionary.string(
    [label + roberta.task.label_dictionary.nspecial]
)
ncorrect, nsamples = 0, 0
roberta.cuda()
roberta.eval()
with open('paws_data/dev.tsv') as fin:
    for index, line in enumerate(fin):
        tokens = line.strip().split('\t')
        # Columns are id, sentence1, sentence2, label (the files written above have no header row).
        sent1, sent2, target = tokens[1], tokens[2], tokens[3]
        tokens = roberta.encode(sent1, sent2)
        prediction = roberta.predict('sentence_classification_head', tokens).argmax().item()
        prediction_label = label_fn(prediction)
        ncorrect += int(prediction_label == target)
        nsamples += 1
print('| Accuracy: ', float(ncorrect) / float(nsamples))
```
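Since the reported numbers are the mean and standard deviation over 5 seeds, the per-seed dev accuracies can be aggregated with a small helper (a sketch; the accuracy values below are placeholders, not results from the paper):

```python
from statistics import mean, stdev

# Hypothetical dev-set accuracies from 5 fine-tuning runs with different random seeds.
seed_accuracies = [0.912, 0.908, 0.915, 0.910, 0.907]
print(f"{mean(seed_accuracies):.3f} +/- {stdev(seed_accuracies):.3f}")
```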