# Turkish QNLI Model

I fine-tuned the Turkish BERT model for the question-answering problem with TQuAD, the Turkish version of SQuAD. The base model is:

https://huggingface.co/dbmdz/bert-base-turkish-uncased
# Data: TQuAD

I used the following TQuAD dataset:

https://github.com/TQuad/turkish-nlp-qa-dataset
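For reference, TQuAD follows the SQuAD v1 JSON layout; the conversion below relies only on the fields sketched here (values elided):

```
{
  "data": [
    {
      "title": "...",
      "paragraphs": [
        {
          "context": "...",
          "qas": [
            {
              "question": "...",
              "answers": [ { "text": "..." } ]
            }
          ]
        }
      ]
    }
  ]
}
```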
I converted the dataset into the transformers GLUE data format for QNLI (SQuAD -> QNLI) with the following script:
```
import json

# Pick the split to convert: "dev-v0.1.json" or "train-v0.1.json"
ff = "train-v0.1.json"
dataset = json.load(open(ff))

i = 0
for article in dataset["data"]:
    title = article["title"]
    for p in article["paragraphs"]:
        context = p["context"]
        for qa in p["qas"]:
            answer = qa["answers"][0]["text"]
            # Every distinct answer in the paragraph serves as a candidate;
            # the true answer is the positive pair, the rest are negatives.
            all_other_answers = list(set(e["answers"][0]["text"] for e in p["qas"]))
            all_other_answers.remove(answer)
            i += 1
            print(i, qa["question"].replace(";", ":"), answer.replace(";", ":"), "entailment", sep="\t")
            for other in all_other_answers:
                i += 1
                print(i, qa["question"].replace(";", ":"), other.replace(";", ":"), "not_entailment", sep="\t")
```
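The pairing logic can be sanity-checked on a tiny in-memory sample; the paragraph, questions, and answers below are invented for illustration:

```python
# Minimal in-memory sample mimicking the TQuAD structure (invented data).
sample = {
    "data": [
        {
            "title": "Example",
            "paragraphs": [
                {
                    "context": "Ankara is the capital of Turkey.",
                    "qas": [
                        {"question": "What is the capital of Turkey?",
                         "answers": [{"text": "Ankara"}]},
                        {"question": "Which country is Ankara in?",
                         "answers": [{"text": "Turkey"}]},
                    ],
                }
            ],
        }
    ]
}

rows = []
i = 0
for article in sample["data"]:
    for p in article["paragraphs"]:
        for qa in p["qas"]:
            answer = qa["answers"][0]["text"]
            others = list(set(e["answers"][0]["text"] for e in p["qas"]))
            others.remove(answer)
            i += 1
            rows.append((i, qa["question"], answer, "entailment"))
            for other in others:
                i += 1
                rows.append((i, qa["question"], other, "not_entailment"))

# Two questions, two distinct answers -> one positive and one negative each.
print(len(rows))  # 4
```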
Under the QNLI folder there are dev and test sets.
Training data looks like this:

> 613 II.Friedrich’in bilginler arasındaki en önemli şahsiyet olarak belirttiği kişi kimdir? filozof, kimyacı, astrolog ve çevirmen not_entailment
> 614 II.Friedrich’in bilginler arasındaki en önemli şahsiyet olarak belirttiği kişi kimdir? kişisel eğilimi ve özel temaslar nedeniyle not_entailment
> 615 Michael Scotus’un mesleği nedir? filozof, kimyacı, astrolog ve çevirmen entailment
> 616 Michael Scotus’un mesleği nedir? Palermo’ya not_entailment
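Each line carries four tab-separated fields: index, question, candidate answer, label. A minimal parsing sketch (the line below is copied from the sample above):

```python
line = "615\tMichael Scotus’un mesleği nedir?\tfilozof, kimyacı, astrolog ve çevirmen\tentailment"

# Fields are tab-separated; the conversion script already replaced ";"
# with ":" inside questions and answers, so each line has exactly four fields.
index, question, candidate, label = line.split("\t")
print(index, label)  # 615 entailment
```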
# Training

I trained the model with the following environment:

```
export GLUE_DIR=./glue/glue_dataTR/QNLI
export TASK_NAME=QNLI
```
```
python3 run_glue.py \
  --model_type bert \
  --model_name_or_path dbmdz/bert-base-turkish-uncased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --data_dir $GLUE_DIR \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/$TASK_NAME/
```
# Evaluation Results

| Metric | Value |
| ------ | ----- |
| acc    | 0.9124060613527165 |
| loss   | 0.21582801340189717 |
> See all my models:
> https://huggingface.co/savasy