BorisAlbar committed
Commit 7c9e0ff · 1 parent: b9be56c
Upload model to v2

- README.md +37 -11
- config.json +1 -1
- pytorch_model.bin +2 -2
- tokenizer.json +2 -16
README.md CHANGED
@@ -39,25 +39,51 @@ This represents a total of over **138 061 questions/answers pairs used to finet
 | [PIAFv1.2](https://www.data.gouv.fr/en/datasets/piaf-le-dataset-francophone-de-questions-reponses/)| SQuAD v1 | 9 225 Q & A | X | X |
 | [FQuADv1.0](https://fquad.illuin.tech/)| SQuAD v1 | 20 731 Q & A | 3 188 Q & A (not used in training because it serves as a test dataset) | 2 189 Q & A (not used in our work because not freely available)|
 | [lincoln/newsquadfr](https://huggingface.co/datasets/lincoln/newsquadfr) | SQuAD v1 | 1 650 Q & A | 455 Q & A (not used in our work) | 415 Q & A (not used in our work) |
-| [pragnakalp/squad_v2_french_translated](https://huggingface.co/datasets/pragnakalp/squad_v2_french_translated)| SQuAD v2 | 79 069 Q & A | X | X |
-| [Mfa]()♪ | SQuAD v2 | 27 386 Q & A | X | X |
-
-♪ this fifth data set will be added soon.
+| [pragnakalp/squad_v2_french_translated](https://huggingface.co/datasets/pragnakalp/squad_v2_french_translated)| SQuAD v2 | 79 069 Q & A | X | X |
 
 ## Evaluation results
-### FQuAD v1.0 Evaluation
-```shell
-{"f1": 80.75789384679857, "exact_match": 57.214554579673774}
-```
 
+The evaluation was carried out using the [**evaluate**](https://pypi.org/project/evaluate/) python package.
+
+### FQuaD 1.0 (validation)
+
+The metric used is Squad v1.
+
+| Model | Exact_match | F1-score |
+| ----------- | ----------- | ----------- |
+| [etalab-ia/camembert-base-squadFR-fquad-piaf](https://huggingface.co/etalab-ia/camembert-base-squadFR-fquad-piaf) | 53.60 | 78.09 |
+| QAmembert (previous version) | 54.26 | 77.87 |
+| QAmembert (this version) | 53.98 | 78.00 |
+| QAmembert-large ♪ | **55.95** | **81.05** |
+| [fT0](https://huggingface.co/CATIE-AQ/frenchT0) | 41.15 | 65.79 |
+
+♪ this model is available on demand only.
+
+### qwant/squad_fr (validation)
 
-
+The metric used is Squad v1.
 
 | Model | Exact_match | F1-score |
 | ----------- | ----------- | ----------- |
-| [etalab-ia/camembert-base-squadFR-fquad-piaf](https://huggingface.co/etalab-ia/camembert-base-squadFR-fquad-piaf) |
-| QAmembert |
+| [etalab-ia/camembert-base-squadFR-fquad-piaf](https://huggingface.co/etalab-ia/camembert-base-squadFR-fquad-piaf) | 60.17 | 78.27 |
+| QAmembert (previous version) | 60.40 | 77.27 |
+| QAmembert (this version) | 60.95 | 77.30 |
+| QAmembert-large ♪ | **65.58** | **81.74** |
+
+♪ this model is available on demand only.
+
+### frenchQA
+
+This dataset includes question with no answers in the context. The metric used is Squad v2.
+
+| Model | Exact_match | F1-score | Answer_f1 | NoAnswer_f1 |
+| ----------- | ----------- | ----------- | ----------- | ----------- |
+| [etalab-ia/camembert-base-squadFR-fquad-piaf](https://huggingface.co/etalab-ia/camembert-base-squadFR-fquad-piaf) | n/a | n/a | n/a | n/a |
+| QAmembert (previous version) | 60.28 | 71.29 | 75.92 | 66.65
+| QAmembert (this version) | **77.14** | 86.88 | 75.66 | 98.11
+| QAmembert-large ♪ | **77.14** | **88.74** | **78.83** | **98.65**
 
+♪ this model is available on demand only.
 
 ## Usage
 ### Example with answer in the context
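The updated README scores each model with the SQuAD v1 metric (FQuAD 1.0, qwant/squad_fr) or the SQuAD v2 metric (frenchQA) from the evaluate package. The stdlib sketch below is not that package's code. It follows the normalization and token-overlap F1 of the official SQuAD evaluation scripts: lowercasing, stripping ASCII punctuation, dropping the English articles a/an/the (French articles such as "le" or "la" are kept), and collapsing whitespace. It also assumes that an unanswerable question counts as correct only when the prediction is empty. The function names and the `answer_f1` / `noanswer_f1` keys are hypothetical stand-ins for the README's Answer_f1 and NoAnswer_f1 columns.

```python
import re
import string
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

PUNCTUATION = set(string.punctuation)  # ASCII punctuation only


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse spaces."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Empty answers (the no-answer case) only match other empty answers.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def _percent(scores: List[float]) -> Optional[float]:
    return 100.0 * sum(scores) / len(scores) if scores else None


def squad_scores(
    examples: Sequence[Tuple[str, Sequence[str]]]
) -> Dict[str, Optional[float]]:
    """examples: (prediction, gold_answers) pairs; an empty gold list marks an
    unanswerable question. Each question keeps its best score over the golds."""
    em: List[float] = []
    f1: List[float] = []
    has_answer_f1: List[float] = []
    no_answer_f1: List[float] = []
    for prediction, golds in examples:
        answerable = [g for g in golds if normalize_answer(g)]
        candidates = answerable or [""]
        em.append(max(exact_match(prediction, g) for g in candidates))
        f1.append(max(token_f1(prediction, g) for g in candidates))
        (has_answer_f1 if answerable else no_answer_f1).append(f1[-1])
    return {
        "exact_match": _percent(em),
        "f1": _percent(f1),
        "answer_f1": _percent(has_answer_f1),  # hypothetical key name
        "noanswer_f1": _percent(no_answer_f1),  # hypothetical key name
    }
```

Because the overall F1 in this sketch is a per-question mean, it equals the answerable and unanswerable F1 averaged with weights given by their question counts. If the frenchQA figures were computed the same way, the "QAmembert (this version)" row implies that roughly half of the evaluation questions are unanswerable: (86.88 - 75.66) / (98.11 - 75.66) ≈ 0.50.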
config.json CHANGED
@@ -21,7 +21,7 @@
   "pad_token_id": 1,
   "position_embedding_type": "absolute",
   "torch_dtype": "float32",
-  "transformers_version": "4.
+  "transformers_version": "4.26.1",
   "type_vocab_size": 1,
   "use_cache": true,
   "vocab_size": 32005
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:36796fd3145baf67e83b7878ce5793998e26115a4dac47d9a5a8fee831a214d7
+size 440204333
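The pytorch_model.bin entry is a Git LFS pointer, not the weights themselves. It consists of three `key value` lines: the spec version, the object's SHA-256 (`oid sha256:<hex digest>`), and its size in bytes. Below is a minimal stdlib sketch for checking a downloaded file against such a pointer. The function names are hypothetical. The file is streamed in chunks, so the new checkpoint (440204333 bytes according to the pointer) is never held in memory all at once.

```python
import hashlib
from typing import BinaryIO, Dict, Iterable, Iterator


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse Git LFS pointer lines of the form 'key value' into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.strip().partition(" ")
            fields[key] = value
    return fields


def read_chunks(stream: BinaryIO, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a binary file's contents in fixed-size chunks until EOF."""
    return iter(lambda: stream.read(chunk_size), b"")


def matches_pointer(pointer_text: str, chunks: Iterable[bytes]) -> bool:
    """True if the streamed bytes have the pointer's size and SHA-256 digest."""
    fields = parse_lfs_pointer(pointer_text)
    algorithm, _, expected_hex = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    digest = hashlib.sha256()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    return size == int(fields["size"]) and digest.hexdigest() == expected_hex


# Usage (assumes the downloaded weights sit in the working directory and
# `pointer_text` holds the three pointer lines from the new version):
# with open("pytorch_model.bin", "rb") as f:
#     ok = matches_pointer(pointer_text, read_chunks(f))
```

Checking the size before hashing would be a cheap early exit. Here both are compared at the end so that one pass over the data suffices.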
tokenizer.json CHANGED
@@ -1,21 +1,7 @@
 {
   "version": "1.0",
-  "truncation":
-
-    "max_length": 512,
-    "strategy": "OnlySecond",
-    "stride": 128
-  },
-  "padding": {
-    "strategy": {
-      "Fixed": 512
-    },
-    "direction": "Right",
-    "pad_to_multiple_of": null,
-    "pad_id": 1,
-    "pad_type_id": 0,
-    "pad_token": "<pad>"
-  },
+  "truncation": null,
+  "padding": null,
   "added_tokens": [
     {
       "id": 0,
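The tokenizer.json change sets both `truncation` and `padding` to null. The previous file had baked in truncation with max_length 512, strategy OnlySecond and stride 128, plus fixed right-side padding to 512 tokens with pad_id 1. As a result, a tokenizer loaded from this file presumably no longer applies those settings by default, and a caller that needs the sliding-window behaviour has to request it. Below is a minimal sketch of what the removed settings describe, working on plain token-id lists. Only the second sequence (the context) is cut, consecutive windows share `stride` tokens, and each sequence can be right-padded to a fixed length. The function names are hypothetical. `num_special_tokens=4` is an assumption based on a RoBERTa-style pair template (`<s> A </s></s> B </s>`), which CamemBERT is assumed to follow. The tokenizers library's own implementation may differ in details.

```python
from typing import List, Sequence


def only_second_windows(
    question: Sequence[int],
    context: Sequence[int],
    max_length: int = 512,
    stride: int = 128,
    num_special_tokens: int = 4,  # assumption: <s> A </s></s> B </s>
) -> List[List[int]]:
    """Cut only the context so each (question, window) pair fits max_length;
    consecutive windows overlap by `stride` tokens."""
    budget = max_length - len(question) - num_special_tokens
    if budget <= stride:
        raise ValueError("question too long for this max_length and stride")
    step = budget - stride
    windows: List[List[int]] = []
    for start in range(0, max(len(context), 1), step):
        stop = min(start + budget, len(context))
        windows.append(list(context[start:stop]))
        if stop == len(context):  # this window reaches the end of the context
            break
    return windows


def pad_right(ids: Sequence[int], length: int = 512, pad_id: int = 1) -> List[int]:
    """Right-pad to a fixed length, as the removed "Fixed": 512 setting did."""
    if len(ids) > length:
        raise ValueError("sequence longer than the fixed padding length")
    return list(ids) + [pad_id] * (length - len(ids))
```

For example, with a 20-token question each window holds at most 512 - 20 - 4 = 488 context tokens, and a new window starts every 488 - 128 = 360 tokens. In transformers, the same behaviour is normally requested per call with `truncation="only_second"`, `max_length=512`, `stride=128`, `return_overflowing_tokens=True` and `padding="max_length"`.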