ahotrod committed on
Commit de8c8a3
1 Parent(s): c9fd291

Update README.md

Files changed (1)
  1. README.md +35 -35
README.md CHANGED
@@ -28,23 +28,23 @@ This model is a version of [microsoft/deberta-v3-large](https://huggingface.co/m
 Fine-tuning & detailed evaluation on a NVIDIA Titan RTX - 24GB GPU took 15 hours.

 ## Results from 2023 ICLR paper, "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing", by Pengcheng He, et. al.
- - 'EM': 89.0
- - 'F1': 91.5
+ - 'EM' : 89.0
+ - 'F1' : 91.5

 ## Results from this fine-tuning:
- - 'exact': 88.70,
- - 'f1': 91.52,
- - 'total': 11873,
- - 'HasAns_exact': 83.70,
- - 'HasAns_f1': 89.35,
- - 'HasAns_total': 5928,
- - 'NoAns_exact': 93.68,
- - 'NoAns_f1': 93.68,
- - 'NoAns_total': 5945,
- - 'best_exact': 88.70,
- - 'best_exact_thresh': 0.0,
- - 'best_f1': 91.52,
- - 'best_f1_thresh': 0.0}
+ - 'exact' : 88.70,
+ - 'f1' : 91.52,
+ - 'total' : 11873,
+ - 'HasAns_exact' : 83.70,
+ - 'HasAns_f1' : 89.35,
+ - 'HasAns_total' : 5928,
+ - 'NoAns_exact' : 93.68,
+ - 'NoAns_f1' : 93.68,
+ - 'NoAns_total' : 5945,
+ - 'best_exact' : 88.70,
+ - 'best_exact_thresh' : 0.0,
+ - 'best_f1' : 91.52,
+ - 'best_f1_thresh' : 0.0}

 ## Model description
 For the authors' models, code & detailed information see: https://github.com/microsoft/DeBERTa
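
Note: the result keys in the hunk above ('exact', 'f1', 'HasAns_*', 'NoAns_*', 'best_*') are the standard SQuAD 2.0 metric names. As a hedged illustration (not part of this commit), such a results dict is typically produced with the Hugging Face `evaluate` library's "squad_v2" metric; the ids, spans, and offsets below are made up, not real SQuAD 2.0 data:

```python
# Sketch only, not from this commit: computing SQuAD 2.0-style metrics
# with the `evaluate` library's "squad_v2" metric.
import evaluate

squad_v2 = evaluate.load("squad_v2")

predictions = [
    # answerable example: extracted span plus a low no-answer probability (illustrative)
    {"id": "q1", "prediction_text": "15 hours", "no_answer_probability": 0.1},
    # unanswerable example: empty span plus a high no-answer probability (illustrative)
    {"id": "q2", "prediction_text": "", "no_answer_probability": 0.9},
]
references = [
    {"id": "q1", "answers": {"text": ["15 hours"], "answer_start": [70]}},
    {"id": "q2", "answers": {"text": [], "answer_start": []}},  # no gold answer
]

results = squad_v2.compute(predictions=predictions, references=references)
print(results)  # dict with 'exact', 'f1', 'total', 'HasAns_*', 'NoAns_*', 'best_*' keys
```
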
@@ -54,30 +54,30 @@ Extractive question answering on a given context

 ### Fine-tuning hyperparameters
 The following hyperparameters, as suggested by the 2023 ICLR paper noted above, were used during fine-tuning:
- - learning_rate: 1e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 64
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 1000
- - training_steps: 5200
+ - learning_rate : 1e-05
+ - train_batch_size : 8
+ - eval_batch_size : 8
+ - seed : 42
+ - gradient_accumulation_steps : 8
+ - total_train_batch_size : 64
+ - optimizer : Adam with betas = (0.9, 0.999) and epsilon = 1e-06
+ - lr_scheduler_type : linear
+ - lr_scheduler_warmup_steps : 1000
+ - training_steps : 5200

 ### Framework versions
- - Transformers 4.35.0.dev0
- - Pytorch 2.1.0+cu121
- - Datasets 2.14.5
- - Tokenizers 0.14.0
+ - Transformers : 4.35.0.dev0
+ - Pytorch : 2.1.0+cu121
+ - Datasets : 2.14.5
+ - Tokenizers : 0.14.0

 ### System
- - CPU: Intel(R) Core(TM) i9-9900K - 32GB RAM
- - Python version: 3.11.5 [GCC 11.2.0] (64-bit runtime)
- - Python platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.35
- - GPU: NVIDIA TITAN RTX - 24GB Memory
- - CUDA runtime version: 12.1.105
- - Nvidia driver version: 535.113.01
+ - CPU : Intel(R) Core(TM) i9-9900K - 32GB RAM
+ - Python version : 3.11.5 [GCC 11.2.0] (64-bit runtime)
+ - Python platform : Linux-5.15.0-86-generic-x86_64-with-glibc2.35
+ - GPU : NVIDIA TITAN RTX - 24GB Memory
+ - CUDA runtime version : 12.1.105
+ - Nvidia driver version : 535.113.01

 ### Fine-tuning (Training) results before/after the best model (Step 3620)
 | Training Loss | Epoch | Step | Validation Loss |
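
Note: the training script itself is not shown in this commit. As a minimal sketch (an assumption, not the author's actual setup), the hyperparameters listed in the hunk above map onto `transformers.TrainingArguments` roughly as follows; `output_dir` is a placeholder:

```python
# Sketch only: expressing the listed hyperparameters with transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deberta-v3-large-squad2",   # placeholder path, not from the commit
    learning_rate=1e-5,
    per_device_train_batch_size=8,           # "train_batch_size: 8" in the card
    per_device_eval_batch_size=8,            # "eval_batch_size: 8"
    gradient_accumulation_steps=8,            # effective (total) train batch size: 8 * 8 = 64
    max_steps=5200,                           # "training_steps: 5200"
    lr_scheduler_type="linear",
    warmup_steps=1000,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    seed=42,
)
```
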
 
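Since the card's intended use is extractive question answering on a given context, here is a hedged inference sketch with the `transformers` question-answering pipeline. The fine-tuned checkpoint's Hub repo id is not stated in these hunks, so the base model id from the card is used purely as a stand-in:

```python
# Sketch only: extractive QA inference via the transformers pipeline API.
# Replace `model_id` with this fine-tuned checkpoint's actual Hub repo id;
# "microsoft/deberta-v3-large" is just the base model named in the card,
# not the SQuAD 2.0 fine-tune described here.
from transformers import pipeline

model_id = "microsoft/deberta-v3-large"  # placeholder
qa = pipeline("question-answering", model=model_id, tokenizer=model_id)

result = qa(
    question="How long did fine-tuning take?",
    context="Fine-tuning & detailed evaluation on a NVIDIA Titan RTX - 24GB GPU took 15 hours.",
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```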