---
license: mit
base_model: microsoft/deberta-v3-large
tags:
- generated_from_trainer
datasets:
- squad_v2
model-index:
- name: deberta-v3-large-finetuned-squadv2
  results: []
---
# deberta-v3-large-finetuned-squadv2
This model is a version of microsoft/deberta-v3-large fine-tuned on the SQuAD version 2.0 dataset.
Results reported in the ICLR 2023 paper "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing" by Pengcheng He et al.:
- EM: 89.0
- F1: 91.5
Results from this fine-tuning (a sketch of how such metrics can be computed follows the list):
- exact: 88.70
- f1: 91.52
- total: 11873
- HasAns_exact: 83.70
- HasAns_f1: 89.35
- HasAns_total: 5928
- NoAns_exact: 93.68
- NoAns_f1: 93.68
- NoAns_total: 5945
- best_exact: 88.70
- best_exact_thresh: 0.0
- best_f1: 91.52
- best_f1_thresh: 0.0
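These fields match the output of the `squad_v2` metric in the `evaluate` library. A minimal sketch with placeholder ids and texts (illustrative only, not drawn from the actual evaluation run):

```python
import evaluate

# The squad_v2 metric returns the fields listed above (exact, f1, total,
# HasAns_*, NoAns_*, best_*) when both answerable and unanswerable
# examples are present.
metric = evaluate.load("squad_v2")

# Placeholder examples; ids only need to match across the two lists.
predictions = [
    {"id": "has-ans-0",
     "prediction_text": "ELECTRA-style pre-training",
     "no_answer_probability": 0.0},
    {"id": "no-ans-0",
     "prediction_text": "",
     "no_answer_probability": 1.0},
]
references = [
    {"id": "has-ans-0",
     "answers": {"text": ["ELECTRA-style pre-training"], "answer_start": [25]}},
    {"id": "no-ans-0",
     "answers": {"text": [], "answer_start": []}},  # unanswerable question
]

print(metric.compute(predictions=predictions, references=references))
```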
## Model description
For the authors' models, code, and detailed information, see https://github.com/microsoft/DeBERTa.
## Intended uses
Extractive question answering from a given context. A minimal usage sketch follows.
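A sketch using the `transformers` pipeline API; the model id below is a placeholder for this repository's checkpoint path:

```python
from transformers import pipeline

# Placeholder model id; substitute the hub id or local path of this checkpoint.
qa = pipeline("question-answering", model="deberta-v3-large-finetuned-squadv2")

result = qa(
    question="What pre-training style does DeBERTaV3 use?",
    context=(
        "DeBERTaV3 improves DeBERTa using ELECTRA-style pre-training "
        "with gradient-disentangled embedding sharing."
    ),
    handle_impossible_answer=True,  # SQuAD v2 includes unanswerable questions
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```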
## Training hyperparameters
The following hyperparameters, as suggested by the ICLR 2023 paper noted above, were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- training_steps: 5200
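As a hedged sketch, here is how these settings map onto `transformers.TrainingArguments`; `output_dir` is a placeholder, all other values come from the list above:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deberta-v3-large-finetuned-squadv2",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 8 per device x 8 steps = 64 effective
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    max_steps=5200,
)
```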
## Framework versions
- Transformers 4.35.0.dev0
- Pytorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.14.0
## System
- CPU: Intel(R) Core(TM) i9-9900K - 32GB RAM
- Python version: 3.11.5 [GCC 11.2.0] (64-bit runtime)
- Python platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.35
- GPU: NVIDIA TITAN RTX - 24GB Memory
- CUDA runtime version: 12.1.105
- Nvidia driver version: 535.113.01
## Training results before/after the best model (Step 3620)
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.5323 | 1.72 | 3500 | 0.5860 |
| 0.5129 | 1.73 | 3520 | 0.5656 |
| 0.5441 | 1.74 | 3540 | 0.5642 |
| 0.5624 | 1.75 | 3560 | 0.5873 |
| 0.4645 | 1.76 | 3580 | 0.5891 |
| 0.5577 | 1.77 | 3600 | 0.5816 |
| 0.5199 | 1.78 | 3620 | 0.5579 |
| 0.5061 | 1.79 | 3640 | 0.5837 |
| 0.484 | 1.79 | 3660 | 0.5721 |
| 0.5095 | 1.8 | 3680 | 0.5821 |
| 0.5342 | 1.81 | 3700 | 0.5602 |