starride-teklia
committed on
Commit 20bf3bf • 1 Parent(s): f2f6080
Add PyLaia model trained on NorHand v3 (#1)
- Add PyLaia model trained on NorHand v3 (748a8a0a6aee6fce2eb298c3d4bfbd78dbdd2c0a)
- README.md +41 -0
- language_model.arpa.gz +3 -0
- lexicon.txt +169 -0
- model +0 -0
- syms.txt +169 -0
- tokens.txt +169 -0
- weights.ckpt +3 -0
README.md
CHANGED
@@ -1,3 +1,44 @@
 ---
+library_name: PyLaia
 license: mit
+tags:
+- PyLaia
+- PyTorch
+- Handwritten text recognition
+metrics:
+- CER
+- WER
+language:
+- 'no'
 ---
+
+# NorHand v3 handwritten text recognition
+
+This model performs Handwritten Text Recognition in Norwegian on historical documents.
+
+## Model description
+
+The model was trained using the PyLaia library on the [NorHand v3 dataset](https://zenodo.org/records/10255840).
+
+For training, text-lines were resized with a fixed height of 128 pixels, keeping the original aspect ratio. Vertical lines are discarded.
+
+| split | N lines | N horizontal lines |
+| ----- | ------: | -----------------: |
+| train | 224,173 | 223,971 |
+| val | 22,828 | 22,811 |
+| test | 1,573 | 1,573 |
+
+An external 6-gram character language model can be used to improve recognition. The language model is trained on the text from the NorHand v3 training set.
+
+## Evaluation results
+
+The model achieves the following results:
+
+| set | Language model | CER (%) | WER (%) | N lines |
+|:------|:---------------|:----------:|:-------:|----------:|
+| test | no | 7.52 | 22.99 | 1,573 |
+| test | yes | 6.36 | 18.11 | 1,573 |
+
+## How to use
+
+Please refer to the [documentation](https://atr.pages.teklia.com/pylaia/).
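The README added in this commit states that text lines are resized to a fixed height of 128 pixels with the original aspect ratio preserved, and that vertical lines are discarded. A minimal sketch of that size computation follows. The function names, the rounding rule, and the `height > width` test for a vertical line are assumptions made for illustration; they are not taken from PyLaia or from the NorHand preprocessing code.

```python
def resized_size(width, height, target_height=128):
    """Return (new_width, new_height) for a fixed-height, aspect-preserving resize.

    Assumption: the new width is rounded to the nearest pixel and is at least 1.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target_height / height
    return max(1, round(width * scale)), target_height


def is_vertical(width, height):
    """Hypothetical criterion: treat a line image as vertical if it is taller than wide."""
    return height > width


# Example line-image sizes as (width, height); the values are made up.
line_sizes = [(1000, 64), (900, 256), (50, 400)]

# Keep only horizontal lines, then compute their training-time size.
kept = [size for size in line_sizes if not is_vertical(*size)]
resized = [resized_size(w, h) for (w, h) in kept]
```

An image library would then perform the actual resampling to the computed size; only the arithmetic is shown here.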
language_model.arpa.gz
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:43b6a1f11ec090c604566d86058ced36507510b1ec74fb9d1e0c11c5d10a0bae
+size 54823896
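The three lines added for `language_model.arpa.gz` are a Git LFS pointer rather than the file itself: a spec version, a SHA-256 object ID, and the size in bytes. As a rough sketch, assuming the pointer has exactly these three `key value` lines, it could be parsed and checked against downloaded bytes as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dictionary."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest named by the pointer."""
    if pointer["algorithm"] != "sha256":
        raise ValueError("only sha256 object IDs are handled in this sketch")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:43b6a1f11ec090c604566d86058ced36507510b1ec74fb9d1e0c11c5d10a0bae
size 54823896
"""
pointer = parse_lfs_pointer(pointer_text)
```

The same parsing applies to the `weights.ckpt` pointer in this commit, which uses the identical three-line layout.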
lexicon.txt
ADDED
@@ -0,0 +1,169 @@
+<ctc> <ctc>
+! !
+" "
+# #
+$ $
+% %
+& &
+' '
+( (
+) )
+* *
++ +
+, ,
+- -
+. .
+/ /
+0 0
+1 1
+2 2
+3 3
+4 4
+5 5
+6 6
+7 7
+8 8
+9 9
+: :
+; ;
+< <
+= =
+> >
+? ?
+A A
+B B
+C C
+D D
+E E
+F F
+G G
+H H
+I I
+J J
+K K
+L L
+M M
+N N
+O O
+P P
+Q Q
+R R
+S S
+T T
+U U
+V V
+W W
+X X
+Y Y
+Z Z
+[ [
+\ \
+] ]
+_ _
+` `
+a a
+b b
+c c
+d d
+e e
+f f
+g g
+h h
+i i
+j j
+k k
+l l
+m m
+n n
+o o
+p p
+q q
+r r
+s s
+t t
+u u
+v v
+w w
+x x
+y y
+z z
+{ {
+| |
+} }
+£ £
+§ §
+« «
+¬ ¬
+° °
+´ ´
+¹ ¹
+º º
+» »
+¼ ¼
+½ ½
+¾ ¾
+Á Á
+Ä Ä
+Å Å
+Æ Æ
+É É
+Ö Ö
+Ø Ø
+Ü Ü
+Þ Þ
+ß ß
+à à
+á á
+â â
+ä ä
+å å
+æ æ
+ç ç
+è è
+é é
+ê ê
+ë ë
+í í
+ï ï
+ð ð
+ñ ñ
+ò ò
+ó ó
+ô ô
+ö ö
+÷ ÷
+ø ø
+ù ù
+ú ú
+û û
+ü ü
+ý ý
+þ þ
+œ œ
+ɔ ɔ
+ː ː
+˚ ˚
+̄ ̄
+Ֆ Ֆ
+ẞ ẞ
+– –
+— —
+‘ ‘
+’ ’
+“ “
+” ”
+„ „
+… …
+⁄ ⁄
+⁰ ⁰
+⁴ ⁴
+⁶ ⁶
+₀ ₀
+₁ ₁
+₅ ₅
+⅓ ⅓
+⅛ ⅛
+♂ ♂
+
+<unk> <unk>
+<space> <space>
model
ADDED
Binary file (1.52 kB).
syms.txt
ADDED
@@ -0,0 +1,169 @@
+<ctc> 0
+! 1
+" 2
+# 3
+$ 4
+% 5
+& 6
+' 7
+( 8
+) 9
+* 10
++ 11
+, 12
+- 13
+. 14
+/ 15
+0 16
+1 17
+2 18
+3 19
+4 20
+5 21
+6 22
+7 23
+8 24
+9 25
+: 26
+; 27
+< 28
+= 29
+> 30
+? 31
+A 32
+B 33
+C 34
+D 35
+E 36
+F 37
+G 38
+H 39
+I 40
+J 41
+K 42
+L 43
+M 44
+N 45
+O 46
+P 47
+Q 48
+R 49
+S 50
+T 51
+U 52
+V 53
+W 54
+X 55
+Y 56
+Z 57
+[ 58
+\ 59
+] 60
+_ 61
+` 62
+a 63
+b 64
+c 65
+d 66
+e 67
+f 68
+g 69
+h 70
+i 71
+j 72
+k 73
+l 74
+m 75
+n 76
+o 77
+p 78
+q 79
+r 80
+s 81
+t 82
+u 83
+v 84
+w 85
+x 86
+y 87
+z 88
+{ 89
+| 90
+} 91
+£ 92
+§ 93
+« 94
+¬ 95
+° 96
+´ 97
+¹ 98
+º 99
+» 100
+¼ 101
+½ 102
+¾ 103
+Á 104
+Ä 105
+Å 106
+Æ 107
+É 108
+Ö 109
+Ø 110
+Ü 111
+Þ 112
+ß 113
+à 114
+á 115
+â 116
+ä 117
+å 118
+æ 119
+ç 120
+è 121
+é 122
+ê 123
+ë 124
+í 125
+ï 126
+ð 127
+ñ 128
+ò 129
+ó 130
+ô 131
+ö 132
+÷ 133
+ø 134
+ù 135
+ú 136
+û 137
+ü 138
+ý 139
+þ 140
+œ 141
+ɔ 142
+ː 143
+˚ 144
+̄ 145
+Ֆ 146
+ẞ 147
+– 148
+— 149
+‘ 150
+’ 151
+“ 152
+” 153
+„ 154
+… 155
+⁄ 156
+⁰ 157
+⁴ 158
+⁶ 159
+₀ 160
+₁ 161
+₅ 162
+⅓ 163
+⅛ 164
+♂ 165
+ 166
+<unk> 167
+<space> 168
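`syms.txt` maps each symbol to an integer index, with `<ctc>` at index 0 and `<space>` at index 168. One common way such a table is used is greedy CTC decoding: collapse consecutive repeated indices, drop the blank, and map the remaining indices back to characters. The sketch below illustrates that general technique on a four-line excerpt of the table. It is not PyLaia's decoder, and the function names and the sample frame sequence are hypothetical.

```python
def load_symbols(lines):
    """Parse 'symbol index' lines into an index -> symbol dictionary.

    The split is on the last space so that symbols which are themselves
    whitespace-like are not lost.
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        symbol, _, index = line.rpartition(" ")
        table[int(index)] = symbol
    return table


def greedy_ctc_decode(frame_indices, table, blank="<ctc>", space="<space>"):
    """Collapse repeated indices, drop the CTC blank, and join the symbols."""
    output = []
    previous = None
    for index in frame_indices:
        if index != previous:
            symbol = table[index]
            if symbol != blank:
                output.append(" " if symbol == space else symbol)
        previous = index
    return "".join(output)


# Excerpt of syms.txt as added in this commit.
sample_lines = ["<ctc> 0", "a 63", "b 64", "<space> 168"]
table = load_symbols(sample_lines)

# Made-up per-frame argmax indices; the blank (0) separates the two 'a' symbols.
decoded = greedy_ctc_decode([63, 63, 0, 63, 168, 168, 0, 64], table)
```

Decoding with the external 6-gram language model mentioned in the README is a different procedure and is not covered by this sketch.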
tokens.txt
ADDED
@@ -0,0 +1,169 @@
+<ctc>
+!
+"
+#
+$
+%
+&
+'
+(
+)
+*
++
+,
+-
+.
+/
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+:
+;
+<
+=
+>
+?
+A
+B
+C
+D
+E
+F
+G
+H
+I
+J
+K
+L
+M
+N
+O
+P
+Q
+R
+S
+T
+U
+V
+W
+X
+Y
+Z
+[
+\
+]
+_
+`
+a
+b
+c
+d
+e
+f
+g
+h
+i
+j
+k
+l
+m
+n
+o
+p
+q
+r
+s
+t
+u
+v
+w
+x
+y
+z
+{
+|
+}
+£
+§
+«
+¬
+°
+´
+¹
+º
+»
+¼
+½
+¾
+Á
+Ä
+Å
+Æ
+É
+Ö
+Ø
+Ü
+Þ
+ß
+à
+á
+â
+ä
+å
+æ
+ç
+è
+é
+ê
+ë
+í
+ï
+ð
+ñ
+ò
+ó
+ô
+ö
+÷
+ø
+ù
+ú
+û
+ü
+ý
+þ
+œ
+ɔ
+ː
+˚
+̄
+Ֆ
+ẞ
+–
+—
+‘
+’
+“
+”
+„
+…
+⁄
+⁰
+⁴
+⁶
+₀
+₁
+₅
+⅓
+⅛
+♂
+
+<unk>
+<space>
weights.ckpt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4e49b4233c9bea4cfbe9a35e239e1c25421ceed9b194b07f6ec3d96e1b1a69fc
+size 43033372