sultan committed
Commit b9a145a
1 Parent(s): bfe4c75

Initial commit

Files changed (4)
  1. README.md +128 -0
  2. config.json +28 -0
  3. pytorch_model.bin +3 -0
  4. vocab.txt +0 -0
README.md ADDED
@@ -0,0 +1,128 @@
+ # BioM-Transformers: Building Large Biomedical Language Models with BERT, ALBERT and ELECTRA
+
+ # Abstract
+
+ The impact of design choices on the performance
+ of biomedical language models recently
+ has been a subject for investigation. In
+ this paper, we empirically study biomedical
+ domain adaptation with large transformer models
+ using different design choices. We evaluate
+ the performance of our pretrained models
+ against other existing biomedical language
+ models in the literature. Our results show that
+ we achieve state-of-the-art results on several
+ biomedical domain tasks despite using similar
+ or less computational cost compared to other
+ models in the literature. Our findings highlight
+ the significant effect of design choices on
+ improving the performance of biomedical language
+ models.
+
+ # Model Description
+
+ This model is fine-tuned on the SQuAD2.0 dataset. Fine-tuning the biomedical language model on SQuAD helps improve its score on the BioASQ challenge. If you plan to work with BioASQ or other biomedical QA tasks, use this model rather than BioM-ELECTRA-Base.
+
+ The Hugging Face Transformers library does not implement layer-wise learning-rate decay, which affects performance on the SQuAD task. The result reported for BioM-ELECTRA-Base-SQuAD in our paper is 84.4 (F1) because we used the open-source ELECTRA code with the TensorFlow checkpoint, which applies layer-wise decay.
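To make the layer-wise decay mentioned above concrete, here is a minimal sketch of the general technique: layers closer to the input get a smaller learning rate than layers closer to the task head. The base rate of 4e-5 and the 12 layers come from the training command and config.json in this commit; the decay factor of 0.8 and the depth numbering are illustrative assumptions, and the exact scheme in the ELECTRA code may differ.

```python
def layerwise_learning_rates(base_lr, num_layers, decay):
    """Map each depth to a learning rate.

    Depth 0 is the embedding layer, depths 1..num_layers are the encoder
    layers, and depth num_layers + 1 is the task head (assumed numbering).
    """
    top = num_layers + 1
    return {depth: base_lr * decay ** (top - depth) for depth in range(top + 1)}


# decay=0.8 is an assumed value for illustration, not taken from this commit.
rates = layerwise_learning_rates(base_lr=4e-5, num_layers=12, decay=0.8)

for depth in (0, 1, 12, 13):
    print(f"depth {depth:2d}: lr = {rates[depth]:.3e}")
```

Under these assumptions the task head trains at the full base rate, while the embeddings train at a rate more than ten times smaller.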
+
+ Evaluation results on the SQuAD2.0 dev set:
+ ```
+ eval_HasAns_exact = 79.2679
+ eval_HasAns_f1 = 86.5416
+ eval_HasAns_total = 5928
+ eval_NoAns_exact = 75.8789
+ eval_NoAns_f1 = 75.8789
+ eval_NoAns_total = 5945
+ eval_best_exact = 77.571
+ eval_best_exact_thresh = 0.0
+ eval_best_f1 = 81.2026
+ eval_best_f1_thresh = 0.0
+ eval_exact = 77.571
+ eval_f1 = 81.2026
+ eval_samples = 11979
+ eval_total = 11873
+ ```
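As a quick sanity check on the log above, the overall scores should be the question-count-weighted average of the HasAns and NoAns subsets. The sketch below uses only the numbers printed in the log; the helper name `combined` is introduced here for illustration.

```python
# Values copied from the evaluation log above.
has_ans = {"exact": 79.2679, "f1": 86.5416, "total": 5928}
no_ans = {"exact": 75.8789, "f1": 75.8789, "total": 5945}


def combined(metric):
    """Average a per-subset score, weighting each subset by its question count."""
    total = has_ans["total"] + no_ans["total"]
    weighted = has_ans[metric] * has_ans["total"] + no_ans[metric] * no_ans["total"]
    return weighted / total


overall_exact = combined("exact")
overall_f1 = combined("f1")
print(f"exact = {overall_exact:.3f}, f1 = {overall_f1:.3f}")
```

The two values agree with `eval_exact` and `eval_f1` to within rounding, and the subset totals add up to `eval_total`. For unanswerable questions, exact match and F1 coincide because a prediction is either empty or not, which is why the two NoAns scores are identical.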
+
+ - First, install the required libraries on Google Colab and make sure a GPU runtime is enabled:
+
+ ```python
+ !git clone https://github.com/huggingface/transformers
+ !pip3 install -e transformers
+ !pip3 install sentencepiece
+ !pip3 install -r /content/transformers/examples/pytorch/question-answering/requirements.txt
+ ```
+
+ - Training script:
+
+ ```bash
+ python3 transformers/examples/pytorch/question-answering/run_qa.py --model_name_or_path sultan/BioM-ELECTRA-Base-Discriminator \
+ --dataset_name squad_v2 \
+ --do_train \
+ --do_eval \
+ --dataloader_num_workers 20 \
+ --preprocessing_num_workers 20 \
+ --version_2_with_negative \
+ --num_train_epochs 3 \
+ --learning_rate 4e-5 \
+ --max_seq_length 512 \
+ --doc_stride 128 \
+ --per_device_train_batch_size 8 \
+ --gradient_accumulation_steps 3 \
+ --per_device_eval_batch_size 128 \
+ --fp16 \
+ --fp16_opt_level O1 \
+ --logging_steps 50 \
+ --save_steps 5000 \
+ --overwrite_output_dir \
+ --output_dir out
+ ```
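The training command above combines a per-device batch size of 8 with 3 gradient-accumulation steps, so each optimizer update sees more examples than a single forward pass. A small sketch of that arithmetic follows; the function name and the `num_devices` parameter are introduced here for illustration, and a single GPU (as on Colab) is an assumption.

```python
def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return per_device_batch * accumulation_steps * num_devices


# Flags from the training command: batch size 8, accumulation 3.
single_gpu = effective_batch_size(per_device_batch=8, accumulation_steps=3)
print(f"effective batch size on one device: {single_gpu}")

# Hypothetical multi-GPU run, shown only to illustrate the scaling.
four_gpus = effective_batch_size(8, 3, num_devices=4)
print(f"effective batch size on four devices: {four_gpus}")
```

When reproducing the run on different hardware, keeping this product constant is one way to stay close to the original optimization setup.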
+
+ - Reproduce the results without training (evaluation only):
+
+ ```bash
+ python transformers/examples/pytorch/question-answering/run_qa.py --model_name_or_path sultan/BioM-ELECTRA-Base-SQuAD2 \
+ --do_eval \
+ --version_2_with_negative \
+ --per_device_eval_batch_size 8 \
+ --dataset_name squad_v2 \
+ --overwrite_output_dir \
+ --fp16 \
+ --output_dir out
+ ```
+
+ - You don't need to download the SQuAD2.0 dataset manually; the script downloads it from the Hugging Face Datasets hub.
+
+ - Check our GitHub repo at https://github.com/salrowili/BioM-Transformers for TensorFlow and GluonNLP checkpoints.
+
+
+ # Acknowledgment
+
+ We would like to acknowledge the support of the TensorFlow Research Cloud (TFRC) team, which granted us access to TPUv3 units.
+
+ # Citation
+
+ ```bibtex
+ @inproceedings{alrowili-shanker-2021-biom,
+     title = "{B}io{M}-Transformers: Building Large Biomedical Language Models with {BERT}, {ALBERT} and {ELECTRA}",
+     author = "Alrowili, Sultan and
+       Shanker, Vijay",
+     booktitle = "Proceedings of the 20th Workshop on Biomedical Language Processing",
+     month = jun,
+     year = "2021",
+     address = "Online",
+     publisher = "Association for Computational Linguistics",
+     url = "https://www.aclweb.org/anthology/2021.bionlp-1.24",
+     pages = "221--227",
+     abstract = "The impact of design choices on the performance of biomedical language models recently has been a subject for investigation. In this paper, we empirically study biomedical domain adaptation with large transformer models using different design choices. We evaluate the performance of our pretrained models against other existing biomedical language models in the literature. Our results show that we achieve state-of-the-art results on several biomedical domain tasks despite using similar or less computational cost compared to other models in the literature. Our findings highlight the significant effect of design choices on improving the performance of biomedical language models.",
+ }
+ ```
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "sultan/BioM-ELECTRA-Base-Discriminator",
+   "architectures": [
+     "ElectraForQuestionAnswering"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "embedding_size": 768,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "electra",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "summary_activation": "gelu",
+   "summary_last_dropout": 0.1,
+   "summary_type": "first",
+   "summary_use_proj": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.9.0.dev0",
+   "type_vocab_size": 2,
+   "vocab_size": 28895
+ }
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ee2b15237a556feb1fb59d9739363a37a5aa0eda6d10cae7d0fc3ddf199dab6
+ size 430659249
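The size recorded in the LFS pointer can be cross-checked against config.json with a rough parameter count. The sketch below assumes a standard BERT-style layout (word, position, and token-type embeddings with one LayerNorm; per layer, four attention projections, two feed-forward projections, and two LayerNorms) plus a single linear QA head with two outputs. Because `embedding_size` equals `hidden_size`, no embedding projection is counted. This layout is an assumption for the estimate, not something read from the checkpoint.

```python
# Values copied from config.json above.
vocab_size, hidden, intermediate = 28895, 768, 3072
layers, max_positions, type_vocab = 12, 512, 2

layer_norm = 2 * hidden  # one weight vector and one bias vector


def linear(n_in, n_out):
    """Parameters in a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
encoder_layer = (
    4 * linear(hidden, hidden)        # query, key, value, attention output
    + layer_norm                      # post-attention LayerNorm
    + linear(hidden, intermediate)    # feed-forward expansion
    + linear(intermediate, hidden)    # feed-forward projection
    + layer_norm                      # post-feed-forward LayerNorm
)
qa_head = linear(hidden, 2)           # start and end logits (assumed head shape)

total_params = embeddings + layers * encoder_layer + qa_head
approx_bytes = 4 * total_params       # float32, per "torch_dtype"

print(f"parameters: {total_params:,}")
print(f"approximate float32 size: {approx_bytes:,} bytes")
```

Under these assumptions the estimate lands at roughly 107.6 million parameters, and the float32 byte count falls within a fraction of a percent of the 430659249 bytes in the pointer; the remainder is plausibly serialization overhead and non-parameter buffers.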
vocab.txt ADDED
The diff for this file is too large to render. See raw diff