---
license: mit
tags:
- generated_from_trainer
datasets:
- squad_v2
base_model: microsoft/deberta-v3-large
model-index:
- name: deberta-v3-large-finetuned-squadv2
results:
- task:
type: question-answering
name: Extractive Question Answering
dataset:
name: SQuAD2.0
type: squad_v2
split: validation[:11873]
metrics:
- type: exact
value: 88.69704371262529
name: eval_exact
- type: f1
value: 91.51550564529175
name: eval_f1
- type: HasAns_exact
value: 83.70445344129554
name: HasAns_exact
- type: HasAns_f1
value: 89.34945994037624
name: HasAns_f1
- type: HasAns_total
value: 5928
name: HasAns_total
- type: NoAns_exact
value: 93.6753574432296
name: NoAns_exact
- type: NoAns_f1
value: 93.6753574432296
name: NoAns_f1
- type: NoAns_total
value: 5945
name: NoAns_total
---
# deberta-v3-large-finetuned-squadv2
This model is a version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) fine-tuned on the SQuAD 2.0 dataset.
Fine-tuning and evaluation on an NVIDIA Titan RTX (24 GB) GPU took 15 hours.
## Results from the 2023 ICLR paper, "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing", by Pengcheng He et al.
- EM : 89.0
- F1 : 91.5
## Results calculated with:
```python
import evaluate

# SQuAD 2.0 metric from the `evaluate` library
metrics = evaluate.load("squad_v2")
squad_v2_metrics = metrics.compute(predictions=formatted_predictions, references=references)
```
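The `formatted_predictions` and `references` objects above follow the input format expected by the `squad_v2` metric. The sketch below only illustrates those shapes; the id, texts, and offsets are made-up values, not taken from this evaluation.

```python
# Illustrative structures for metrics.compute(); all values below are made up.
formatted_predictions = [
    {
        "id": "example-id-0",
        "prediction_text": "Normandy",
        "no_answer_probability": 0.0,  # SQuAD 2.0 allows "no answer" predictions
    },
]
references = [
    {
        "id": "example-id-0",
        "answers": {"text": ["Normandy"], "answer_start": [159]},
    },
]
```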
## For this fine-tuning:
- exact : 88.70
- f1 : 91.52
- total : 11873
- HasAns_exact : 83.70
- HasAns_f1 : 89.35
- HasAns_total : 5928
- NoAns_exact : 93.68
- NoAns_f1 : 93.68
- NoAns_total : 5945
- best_exact : 88.70
- best_exact_thresh : 0.0
- best_f1 : 91.52
- best_f1_thresh : 0.0
## Model description
For the authors' models, code, and detailed information, see: https://github.com/microsoft/DeBERTa
## Intended uses
Extractive question answering on a given context. A minimal usage sketch is shown below.
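The snippet below is a sketch of this intended use with the `transformers` question-answering pipeline; the repository id is a placeholder for wherever this checkpoint is hosted, and the question and context are made up.

```python
from transformers import pipeline

# Placeholder repository id; substitute the actual Hub id or a local path
# containing this fine-tuned checkpoint.
model_id = "your-namespace/deberta-v3-large-finetuned-squadv2"

qa = pipeline("question-answering", model=model_id, tokenizer=model_id)

result = qa(
    question="What dataset was the model fine-tuned on?",
    context="The model was fine-tuned on the SQuAD 2.0 dataset for extractive question answering.",
    handle_impossible_answer=True,  # SQuAD 2.0 contains unanswerable questions
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```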
### Fine-tuning hyperparameters
The following hyperparameters, as suggested by the 2023 ICLR paper noted above, were used during fine-tuning (a sketch of the corresponding `TrainingArguments` follows the list):
- learning_rate : 1e-05
- train_batch_size : 8
- eval_batch_size : 8
- seed : 42
- gradient_accumulation_steps : 8
- total_train_batch_size : 64
- optimizer : Adam with betas = (0.9, 0.999) and epsilon = 1e-06
- lr_scheduler_type : linear
- lr_scheduler_warmup_steps : 1000
- training_steps : 5200
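
As an illustration only, the hyperparameters above map onto the Hugging Face `TrainingArguments` roughly as in the sketch below; the output directory name is a placeholder and this is not the exact training script used.

```python
from transformers import TrainingArguments

# Rough mapping of the hyperparameters listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="deberta-v3-large-finetuned-squadv2",
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # effective train batch size of 64
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    max_steps=5200,
)
```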
### Framework versions
- Transformers : 4.35.0.dev0
- Pytorch : 2.1.0+cu121
- Datasets : 2.14.5
- Tokenizers : 0.14.0
### System
- CPU : Intel(R) Core(TM) i9-9900K - 32GB RAM
- Python version : 3.11.5 [GCC 11.2.0] (64-bit runtime)
- Python platform : Linux-5.15.0-86-generic-x86_64-with-glibc2.35
- GPU : NVIDIA TITAN RTX - 24GB Memory
- CUDA runtime version : 12.1.105
- Nvidia driver version : 535.113.01
### Fine-tuning (Training) results before/after the best model (Step 3620)
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.5323 | 1.72 | 3500 | 0.5860 |
| 0.5129 | 1.73 | 3520 | 0.5656 |
| 0.5441 | 1.74 | 3540 | 0.5642 |
| 0.5624 | 1.75 | 3560 | 0.5873 |
| 0.4645 | 1.76 | 3580 | 0.5891 |
| 0.5577 | 1.77 | 3600 | 0.5816 |
| 0.5199 | 1.78 | 3620 | 0.5579 |
| 0.5061 | 1.79 | 3640 | 0.5837 |
| 0.484 | 1.79 | 3660 | 0.5721 |
| 0.5095 | 1.8 | 3680 | 0.5821 |
| 0.5342        | 1.81  | 3700 | 0.5602          |