---
license: mit
base_model: microsoft/deberta-v3-large
tags:
  - generated_from_trainer
datasets:
  - squad_v2
model-index:
  - name: deberta-v3-large-finetuned-squadv2
    results: []
---

# deberta-v3-large-finetuned-squadv2

This model is a version of microsoft/deberta-v3-large fine-tuned on the SQuAD version 2.0 dataset.

Results from the ICLR 2023 paper, "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing", by Pengcheng He et al.:

  • EM: 89.0
  • F1: 91.5

Results from this fine-tuning:

  • exact: 88.70
  • f1: 91.52
  • total: 11873
  • HasAns_exact: 83.70
  • HasAns_f1: 89.35
  • HasAns_total: 5928
  • NoAns_exact: 93.68
  • NoAns_f1: 93.68
  • NoAns_total: 5945
  • best_exact: 88.70
  • best_exact_thresh: 0.0
  • best_f1: 91.52
  • best_f1_thresh: 0.0
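
These fields match the output shape of the SQuAD 2.0 metric in the `evaluate` library. Below is a minimal sketch of computing metrics in that shape, not the exact evaluation script used for this card; the ids, prediction texts, and references are invented purely for illustration.

```python
import evaluate

# Hedged sketch: compute SQuAD 2.0 metrics in the shape reported above.
squad_v2 = evaluate.load("squad_v2")

# Invented examples: one answerable, one unanswerable question.
predictions = [
    {"id": "ex-1", "prediction_text": "Normandy", "no_answer_probability": 0.0},
    {"id": "ex-2", "prediction_text": "", "no_answer_probability": 1.0},
]
references = [
    {"id": "ex-1", "answers": {"text": ["Normandy"], "answer_start": [159]}},
    {"id": "ex-2", "answers": {"text": [], "answer_start": []}},  # unanswerable
]

results = squad_v2.compute(predictions=predictions, references=references)
print(results)  # keys include 'exact', 'f1', 'total', 'HasAns_*', 'NoAns_*', 'best_*'
```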

## Model description

For the authors' models, code, and detailed information, see https://github.com/microsoft/DeBERTa

## Intended uses

Extractive question answering from a given context.
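
A minimal usage sketch with the 🤗 Transformers `question-answering` pipeline. The repository id `ahotrod/deberta-v3-large-finetuned-squadv2` and the question/context pair are assumptions for illustration only.

```python
from transformers import pipeline

# Hypothetical repository id; adjust if the checkpoint lives under a different name.
qa = pipeline(
    "question-answering",
    model="ahotrod/deberta-v3-large-finetuned-squadv2",
)

# Invented question/context pair for illustration.
result = qa(
    question="Which dataset was the model fine-tuned on?",
    context="This checkpoint is DeBERTa-v3-large fine-tuned on SQuAD 2.0 for "
            "extractive question answering.",
    handle_impossible_answer=True,  # SQuAD 2.0 models can predict "no answer"
)
print(result)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': 'SQuAD 2.0'}
```

Because the model is trained on SQuAD 2.0, passing `handle_impossible_answer=True` lets the pipeline return an empty answer when the context does not contain one.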

## Training hyperparameters

The following hyperparameters, as suggested by the 2023 ICLR paper noted above, were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • training_steps: 5200
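
For reproduction, here is a minimal sketch of `TrainingArguments` mirroring the values above. It assumes a standard 🤗 Transformers `Trainer`-based QA script (e.g. `run_qa.py`); it is not the exact configuration used for this fine-tune, and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Hedged sketch only: the exact training script is not part of this card.
training_args = TrainingArguments(
    output_dir="deberta-v3-large-finetuned-squadv2",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=8,   # with 8 accumulation steps -> effective batch 64
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    max_steps=5200,
)
```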

## Framework versions

  • Transformers 4.35.0.dev0
  • Pytorch 2.1.0+cu121
  • Datasets 2.14.5
  • Tokenizers 0.14.0

## System

  • CPU: Intel(R) Core(TM) i9-9900K - 32GB RAM
  • Python version: 3.11.5 [GCC 11.2.0] (64-bit runtime)
  • Python platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.35
  • GPU: NVIDIA TITAN RTX - 24GB Memory
  • CUDA runtime version: 12.1.105
  • Nvidia driver version: 535.113.01

## Training results before/after the best model (step 3620)

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.5323        | 1.72  | 3500 | 0.5860          |
| 0.5129        | 1.73  | 3520 | 0.5656          |
| 0.5441        | 1.74  | 3540 | 0.5642          |
| 0.5624        | 1.75  | 3560 | 0.5873          |
| 0.4645        | 1.76  | 3580 | 0.5891          |
| 0.5577        | 1.77  | 3600 | 0.5816          |
| 0.5199        | 1.78  | 3620 | 0.5579          |
| 0.5061        | 1.79  | 3640 | 0.5837          |
| 0.484         | 1.79  | 3660 | 0.5721          |
| 0.5095        | 1.8   | 3680 | 0.5821          |
| 0.5342        | 1.81  | 3700 | 0.5602          |