ahotrod committed on
Commit de8c8a3
1 Parent(s): c9fd291

Update README.md

Files changed (1)
  1. README.md +35 -35
README.md CHANGED
@@ -28,23 +28,23 @@ This model is a version of [microsoft/deberta-v3-large](https://huggingface.co/m
 Fine-tuning & detailed evaluation on a NVIDIA Titan RTX - 24GB GPU took 15 hours.

 ## Results from 2023 ICLR paper, "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing", by Pengcheng He, et. al.
- - 'EM': 89.0
- - 'F1': 91.5
+ - 'EM' : 89.0
+ - 'F1' : 91.5

 ## Results from this fine-tuning:
- - 'exact': 88.70,
- - 'f1': 91.52,
- - 'total': 11873,
- - 'HasAns_exact': 83.70,
- - 'HasAns_f1': 89.35,
- - 'HasAns_total': 5928,
- - 'NoAns_exact': 93.68,
- - 'NoAns_f1': 93.68,
- - 'NoAns_total': 5945,
- - 'best_exact': 88.70,
- - 'best_exact_thresh': 0.0,
- - 'best_f1': 91.52,
- - 'best_f1_thresh': 0.0}
+ - 'exact' : 88.70,
+ - 'f1' : 91.52,
+ - 'total' : 11873,
+ - 'HasAns_exact' : 83.70,
+ - 'HasAns_f1' : 89.35,
+ - 'HasAns_total' : 5928,
+ - 'NoAns_exact' : 93.68,
+ - 'NoAns_f1' : 93.68,
+ - 'NoAns_total' : 5945,
+ - 'best_exact' : 88.70,
+ - 'best_exact_thresh' : 0.0,
+ - 'best_f1' : 91.52,
+ - 'best_f1_thresh' : 0.0}

 ## Model description
 For the authors' models, code & detailed information see: https://github.com/microsoft/DeBERTa
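
Note: the result keys in the hunk above ('exact', 'f1', 'HasAns_*', 'NoAns_*', 'best_*') are the standard SQuAD 2.0 metric names. As a hedged illustration (not part of this commit), such a results dict is typically produced with the Hugging Face `evaluate` library's "squad_v2" metric; the ids, spans, and offsets below are made up, not real SQuAD 2.0 data:

```python
# Sketch only, not from this commit: computing SQuAD 2.0-style metrics
# with the `evaluate` library's "squad_v2" metric.
import evaluate

squad_v2 = evaluate.load("squad_v2")

predictions = [
    # answerable example: extracted span plus a low no-answer probability (illustrative)
    {"id": "q1", "prediction_text": "15 hours", "no_answer_probability": 0.1},
    # unanswerable example: empty span plus a high no-answer probability (illustrative)
    {"id": "q2", "prediction_text": "", "no_answer_probability": 0.9},
]
references = [
    {"id": "q1", "answers": {"text": ["15 hours"], "answer_start": [70]}},
    {"id": "q2", "answers": {"text": [], "answer_start": []}},  # no gold answer
]

results = squad_v2.compute(predictions=predictions, references=references)
print(results)  # dict with 'exact', 'f1', 'total', 'HasAns_*', 'NoAns_*', 'best_*' keys
```
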
@@ -54,30 +54,30 @@ Extractive question answering on a given context

 ### Fine-tuning hyperparameters
 The following hyperparameters, as suggested by the 2023 ICLR paper noted above, were used during fine-tuning:
- - learning_rate: 1e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 64
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 1000
- - training_steps: 5200
+ - learning_rate : 1e-05
+ - train_batch_size : 8
+ - eval_batch_size : 8
+ - seed : 42
+ - gradient_accumulation_steps : 8
+ - total_train_batch_size : 64
+ - optimizer : Adam with betas = (0.9, 0.999) and epsilon = 1e-06
+ - lr_scheduler_type : linear
+ - lr_scheduler_warmup_steps : 1000
+ - training_steps : 5200

 ### Framework versions
- - Transformers 4.35.0.dev0
- - Pytorch 2.1.0+cu121
- - Datasets 2.14.5
- - Tokenizers 0.14.0
+ - Transformers : 4.35.0.dev0
+ - Pytorch : 2.1.0+cu121
+ - Datasets : 2.14.5
+ - Tokenizers : 0.14.0

 ### System
- - CPU: Intel(R) Core(TM) i9-9900K - 32GB RAM
- - Python version: 3.11.5 [GCC 11.2.0] (64-bit runtime)
- - Python platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.35
- - GPU: NVIDIA TITAN RTX - 24GB Memory
- - CUDA runtime version: 12.1.105
- - Nvidia driver version: 535.113.01
+ - CPU : Intel(R) Core(TM) i9-9900K - 32GB RAM
+ - Python version : 3.11.5 [GCC 11.2.0] (64-bit runtime)
+ - Python platform : Linux-5.15.0-86-generic-x86_64-with-glibc2.35
+ - GPU : NVIDIA TITAN RTX - 24GB Memory
+ - CUDA runtime version : 12.1.105
+ - Nvidia driver version : 535.113.01

 ### Fine-tuning (Training) results before/after the best model (Step 3620)
 | Training Loss | Epoch | Step | Validation Loss |
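
Note: the training script itself is not shown in this commit. As a minimal sketch (an assumption, not the author's actual setup), the hyperparameters listed in the hunk above map onto `transformers.TrainingArguments` roughly as follows; `output_dir` is a placeholder:

```python
# Sketch only: expressing the listed hyperparameters with transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deberta-v3-large-squad2",   # placeholder path, not from the commit
    learning_rate=1e-5,
    per_device_train_batch_size=8,           # "train_batch_size: 8" in the card
    per_device_eval_batch_size=8,            # "eval_batch_size: 8"
    gradient_accumulation_steps=8,            # effective (total) train batch size: 8 * 8 = 64
    max_steps=5200,                           # "training_steps: 5200"
    lr_scheduler_type="linear",
    warmup_steps=1000,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    seed=42,
)
```
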
 
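Since the card's intended use is extractive question answering on a given context, here is a hedged inference sketch with the `transformers` question-answering pipeline. The fine-tuned checkpoint's Hub repo id is not stated in these hunks, so the base model id from the card is used purely as a stand-in:

```python
# Sketch only: extractive QA inference via the transformers pipeline API.
# Replace `model_id` with this fine-tuned checkpoint's actual Hub repo id;
# "microsoft/deberta-v3-large" is just the base model named in the card,
# not the SQuAD 2.0 fine-tune described here.
from transformers import pipeline

model_id = "microsoft/deberta-v3-large"  # placeholder
qa = pipeline("question-answering", model=model_id, tokenizer=model_id)

result = qa(
    question="How long did fine-tuning take?",
    context="Fine-tuning & detailed evaluation on a NVIDIA Titan RTX - 24GB GPU took 15 hours.",
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```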