MikkoLipsanen committed
Commit 2ca67a2
1 Parent(s): 9bf3620

Update README.md

Files changed (1): README.md +5 -5
@@ -60,9 +60,9 @@ print(predictions)
 
  Some of the entities (for instance WORK_OF_ART, LAW, MONEY) that have been annotated in the [Turku OntoNotes Entities Corpus](https://github.com/TurkuNLP/turku-one)
  dataset were filtered out from the dataset used for training the model. On the other hand, entities that were missing from the [NewsEye dataset](https://zenodo.org/record/4573313)
- were added during the annotation process. The different data sources used in model training are listed below:
+ were added during the annotation process. The different data sources used in model training, validation and testing are listed below:
 
- Dataset|Period covered by the texts|Text type|Percentage of the data
+ Dataset|Period covered by the texts|Text type|Percentage of the total data
  -|-|-|-
  [Turku OntoNotes Entities Corpus](https://github.com/TurkuNLP/turku-one)|2000s|Online texts|23%
  [NewsEye dataset](https://zenodo.org/record/4573313)|1850-1950|OCR'd digitized newspaper articles|3%
@@ -77,9 +77,9 @@ entity classes contained in training, validation and test datasets are listed be
  ### Number of entity types in the data
  Dataset|PERSON|ORG|LOC|GPE|PRODUCT|EVENT|DATE|JON|FIBC|NORP
  -|-|-|-|-|-|-|-|-|-|-
- Train|11691|30026|868|12999|7473|1184|14918|1360|1879|2068
- Val|2478|5360|127|2428|1202|234|2898|308|235|282
- Test|2377|5470|178|2334|1098|185|2782|273|354|355
+ Train|20211|45722|1321|19387|9571|1616|23642|2460|2384|2529
+ Val|2525|5517|130|2512|1217|240|3047|306|247|283
+ Test|2414|5577|179|2445|1097|183|2838|272|374|356
 
  ## Training procedure
 
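
For readers who want to sanity-check the updated entity counts, the following is a minimal sketch that sums the "+" rows of the table and derives each split's share of all annotated entities. The names `COUNTS`, `split_totals` and `split_shares` are illustrative and do not come from the model repository; the numbers are copied from the updated Train, Val and Test rows of this commit.

```python
# Entity counts copied from the updated ("+") rows of the README table.
# Column order: PERSON, ORG, LOC, GPE, PRODUCT, EVENT, DATE, JON, FIBC, NORP
COUNTS = {
    "Train": [20211, 45722, 1321, 19387, 9571, 1616, 23642, 2460, 2384, 2529],
    "Val": [2525, 5517, 130, 2512, 1217, 240, 3047, 306, 247, 283],
    "Test": [2414, 5577, 179, 2445, 1097, 183, 2838, 272, 374, 356],
}


def split_totals(counts):
    """Return the total number of annotated entities in each split."""
    return {split: sum(values) for split, values in counts.items()}


def split_shares(counts):
    """Return each split's fraction of all annotated entities."""
    totals = split_totals(counts)
    grand_total = sum(totals.values())
    return {split: total / grand_total for split, total in totals.items()}


if __name__ == "__main__":
    totals = split_totals(COUNTS)
    for split, share in split_shares(COUNTS).items():
        print(f"{split}: {totals[split]} entities ({share:.1%} of all entities)")
```

These shares are computed over entity mentions only, so they need not match the split proportions measured in sentences or tokens.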