Commit 2ca67a2 (1 parent: 9bf3620), committed by MikkoLipsanen
Update README.md

README.md CHANGED
@@ -60,9 +60,9 @@ print(predictions)
 
 Some of the entities (for instance WORK_OF_ART, LAW, MONEY) that have been annotated in the [Turku OntoNotes Entities Corpus](https://github.com/TurkuNLP/turku-one)
 dataset were filtered out from the dataset used for training the model. On the other hand, entities that were missing from the [NewsEye dataset](https://zenodo.org/record/4573313)
-were added during the annotation process. The different data sources used in model training are listed below:
+were added during the annotation process. The different data sources used in model training, validation and testing are listed below:
 
-Dataset|Period covered by the texts|Text type|Percentage of the data
+Dataset|Period covered by the texts|Text type|Percentage of the total data
 -|-|-|-
 [Turku OntoNotes Entities Corpus](https://github.com/TurkuNLP/turku-one)|2000s|Online texts|23%
 [NewsEye dataset](https://zenodo.org/record/4573313)|1850-1950|OCR'd digitized newspaper articles|3%
@@ -77,9 +77,9 @@ entity classes contained in training, validation and test datasets are listed be
 ### Number of entity types in the data
 Dataset|PERSON|ORG|LOC|GPE|PRODUCT|EVENT|DATE|JON|FIBC|NORP
 -|-|-|-|-|-|-|-|-|-|-
-Train|
-Val|
-Test|
+Train|20211|45722|1321|19387|9571|1616|23642|2460|2384|2529
+Val|2525|5517|130|2512|1217|240|3047|306|247|283
+Test|2414|5577|179|2445|1097|183|2838|272|374|356
 
 ## Training procedure
 
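As a quick sanity check on the counts this commit adds to the entity table, the per-split totals and each split's share of all annotated entities can be computed directly from the three rows. The sketch below is illustrative only: the names `ENTITY_TYPES`, `COUNTS`, `split_totals` and `split_shares` are hypothetical and not part of the repository, and the numbers are copied by hand from the Train, Val and Test rows above.

```python
# Entity counts copied from the Train/Val/Test rows added in this commit.
# All names below are illustrative assumptions, not part of the model repository.
ENTITY_TYPES = ["PERSON", "ORG", "LOC", "GPE", "PRODUCT",
                "EVENT", "DATE", "JON", "FIBC", "NORP"]

COUNTS = {
    "Train": [20211, 45722, 1321, 19387, 9571, 1616, 23642, 2460, 2384, 2529],
    "Val": [2525, 5517, 130, 2512, 1217, 240, 3047, 306, 247, 283],
    "Test": [2414, 5577, 179, 2445, 1097, 183, 2838, 272, 374, 356],
}


def split_totals(counts):
    """Return the total number of annotated entities in each split."""
    return {split: sum(values) for split, values in counts.items()}


def split_shares(counts):
    """Return each split's fraction of all annotated entities."""
    totals = split_totals(counts)
    grand_total = sum(totals.values())
    return {split: total / grand_total for split, total in totals.items()}


if __name__ == "__main__":
    totals = split_totals(COUNTS)
    shares = split_shares(COUNTS)
    for split in COUNTS:
        print(f"{split}: {totals[split]} entities ({shares[split]:.1%})")
```

With the figures from the table, the three splits come out at roughly 80%, 10% and 10% of the annotated entities.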