1-800-BAD-CODE committed c96c930 (parent: 38ea57a): Update README.md
# Evaluation

In these metrics, keep in mind that

1. The data is noisy.
2. Sentence boundaries and true-casing are conditioned on predicted punctuation, which is the most difficult task and is sometimes incorrect. When conditioning on reference punctuation, true-casing and SBD are practically 100% for most languages.
3. Punctuation can be subjective, e.g.,

   `Hola mundo, ¿cómo estás?`

   or

   `Hola mundo. ¿Cómo estás?`

   When the sentences are longer and more practical, these ambiguities abound and affect all 3 analytics.

## Selected Language Evaluation Reports

Each test example was generated using the following procedure:

1. Concatenate 5 random sentences
2. Lower-case the concatenated sentence
3. Remove all punctuation

The data is a held-out portion of News Crawl, which has been deduplicated.
2,000 lines of data per language were used, generating 2,000 unique examples of 5 sentences each.
The last 4 sentences of each example were randomly sampled from the 2,000 and may be duplicated.

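The three steps above can be sketched as follows. This is a minimal illustration only; the helper name, the sampling interface, and the exact punctuation set stripped here are my assumptions, not the actual evaluation script.

```python
import random
import re

# Punctuation characters to strip (an assumed superset covering the
# pre/post punctuation labels used by this model).
PUNCT = r"[.,;:!?¿¡。、・।؟،።፣፧]"

def make_test_example(corpus, k=5, seed=None):
    """Build one evaluation input from a list of clean sentences."""
    rng = random.Random(seed)
    text = " ".join(rng.sample(corpus, k))  # 1. concatenate k random sentences
    text = text.lower()                     # 2. lower-case
    return re.sub(PUNCT, "", text)          # 3. remove all punctuation
```

Given such an input, the model must restore punctuation, true-casing, and sentence boundaries jointly.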
<details>
<summary>English</summary>

```
punct_post test report:
    label                    precision  recall  f1     support
    <NULL> (label_id: 0)     98.71      98.66   98.68  156605
    . (label_id: 1)          87.72      88.85   88.28  8752
    , (label_id: 2)          68.06      67.81   67.93  5216
    ? (label_id: 3)          79.38      77.20   78.27  693
    ? (label_id: 4)          0.00       0.00    0.00   0
    , (label_id: 5)          0.00       0.00    0.00   0
    。 (label_id: 6)         0.00       0.00    0.00   0
    、 (label_id: 7)         0.00       0.00    0.00   0
    ・ (label_id: 8)         0.00       0.00    0.00   0
    । (label_id: 9)          0.00       0.00    0.00   0
    ؟ (label_id: 10)         0.00       0.00    0.00   0
    ، (label_id: 11)         0.00       0.00    0.00   0
    ; (label_id: 12)         0.00       0.00    0.00   0
    ። (label_id: 13)         0.00       0.00    0.00   0
    ፣ (label_id: 14)         0.00       0.00    0.00   0
    ፧ (label_id: 15)         0.00       0.00    0.00   0
    -------------------
    micro avg                97.13      97.13   97.13  171266
    macro avg                83.46      83.13   83.29  171266
    weighted avg             97.13      97.13   97.13  171266

cap test report:
    label                    precision  recall  f1     support
    LOWER (label_id: 0)      99.63      99.49   99.56  526612
    UPPER (label_id: 1)      89.19      91.84   90.50  24161
    -------------------
    micro avg                99.15      99.15   99.15  550773
    macro avg                94.41      95.66   95.03  550773
    weighted avg             99.17      99.15   99.16  550773

seg test report:
    label                    precision  recall  f1     support
    NOSTOP (label_id: 0)     99.37      99.42   99.39  162044
    FULLSTOP (label_id: 1)   89.75      88.84   89.29  9222
    -------------------
    micro avg                98.85      98.85   98.85  171266
    macro avg                94.56      94.13   94.34  171266
    weighted avg             98.85      98.85   98.85  171266
```
</details>
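The summary rows in these reports can be sanity-checked from the per-label rows: the macro average is taken only over labels with nonzero support, which is why the many 0.00 rows do not drag it down, and the weighted average weights each label by its support. A minimal sketch (helper names are mine), using the English punct_post precision column:

```python
def macro_avg(scores, supports):
    """Unweighted mean over labels that actually occur (support > 0)."""
    kept = [s for s, n in zip(scores, supports) if n > 0]
    return sum(kept) / len(kept)

def weighted_avg(scores, supports):
    """Mean weighted by each label's support."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)

# English punct_post precision column and supports from the table above.
precision = [98.71, 87.72, 68.06, 79.38] + [0.0] * 12
support = [156605, 8752, 5216, 693] + [0] * 12

# Recovers the published macro avg (83.46) and weighted avg (97.13)
# up to rounding of the per-label figures.
macro = macro_avg(precision, support)
weighted = weighted_avg(precision, support)
```

(The micro average requires the raw token-level counts, so it cannot be recomputed from the rounded table alone.)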

<details>
<summary>Spanish</summary>

```
punct_pre test report:
    label                    precision  recall  f1     support
    <NULL> (label_id: 0)     99.94      99.92   99.93  185535
    ¿ (label_id: 1)          55.01      64.86   59.53  296
    -------------------
    micro avg                99.86      99.86   99.86  185831
    macro avg                77.48      82.39   79.73  185831
    weighted avg             99.87      99.86   99.87  185831

punct_post test report:
    label                    precision  recall  f1     support
    <NULL> (label_id: 0)     98.74      98.86   98.80  170282
    . (label_id: 1)          90.07      89.58   89.82  9959
    , (label_id: 2)          68.33      67.00   67.66  5300
    ? (label_id: 3)          70.25      58.62   63.91  290
    ? (label_id: 4)          0.00       0.00    0.00   0
    , (label_id: 5)          0.00       0.00    0.00   0
    。 (label_id: 6)         0.00       0.00    0.00   0
    、 (label_id: 7)         0.00       0.00    0.00   0
    ・ (label_id: 8)         0.00       0.00    0.00   0
    । (label_id: 9)          0.00       0.00    0.00   0
    ؟ (label_id: 10)         0.00       0.00    0.00   0
    ، (label_id: 11)         0.00       0.00    0.00   0
    ; (label_id: 12)         0.00       0.00    0.00   0
    ። (label_id: 13)         0.00       0.00    0.00   0
    ፣ (label_id: 14)         0.00       0.00    0.00   0
    ፧ (label_id: 15)         0.00       0.00    0.00   0
    -------------------
    micro avg                97.39      97.39   97.39  185831
    macro avg                81.84      78.51   80.05  185831
    weighted avg             97.36      97.39   97.37  185831

cap test report:
    label                    precision  recall  f1     support
    LOWER (label_id: 0)      99.62      99.60   99.61  555041
    UPPER (label_id: 1)      90.60      91.06   90.83  23538
    -------------------
    micro avg                99.25      99.25   99.25  578579
    macro avg                95.11      95.33   95.22  578579
    weighted avg             99.25      99.25   99.25  578579

seg test report:
    label                    precision  recall  f1     support
    NOSTOP (label_id: 0)     99.44      99.54   99.49  175908
    FULLSTOP (label_id: 1)   91.68      89.98   90.82  9923
    -------------------
    micro avg                99.03      99.03   99.03  185831
    macro avg                95.56      94.76   95.16  185831
    weighted avg             99.02      99.03   99.02  185831
```
</details>

<details>
<summary>Chinese</summary>

```
punct_post test report:
    label                    precision  recall  f1     support
    <NULL> (label_id: 0)     98.82      97.34   98.07  147920
    . (label_id: 1)          0.00       0.00    0.00   0
    , (label_id: 2)          0.00       0.00    0.00   0
    ? (label_id: 3)          0.00       0.00    0.00   0
    ? (label_id: 4)          85.77      80.71   83.16  560
    , (label_id: 5)          59.88      78.02   67.75  6901
    。 (label_id: 6)         92.50      93.92   93.20  10988
    、 (label_id: 7)         0.00       0.00    0.00   0
    ・ (label_id: 8)         0.00       0.00    0.00   0
    । (label_id: 9)          0.00       0.00    0.00   0
    ؟ (label_id: 10)         0.00       0.00    0.00   0
    ، (label_id: 11)         0.00       0.00    0.00   0
    ; (label_id: 12)         0.00       0.00    0.00   0
    ። (label_id: 13)         0.00       0.00    0.00   0
    ፣ (label_id: 14)         0.00       0.00    0.00   0
    ፧ (label_id: 15)         0.00       0.00    0.00   0
    -------------------
    micro avg                96.25      96.25   96.25  166369
    macro avg                84.24      87.50   85.55  166369
    weighted avg             96.75      96.25   96.45  166369

cap test report:
    label                    precision  recall  f1     support
    LOWER (label_id: 0)      97.07      92.39   94.67  394
    UPPER (label_id: 1)      70.59      86.75   77.84  83
    -------------------
    micro avg                91.40      91.40   91.40  477
    macro avg                83.83      89.57   86.25  477
    weighted avg             92.46      91.40   91.74  477

seg test report:
    label                    precision  recall  f1     support
    NOSTOP (label_id: 0)     99.58      99.53   99.56  156369
    FULLSTOP (label_id: 1)   92.77      93.50   93.13  10000
    -------------------
    micro avg                99.17      99.17   99.17  166369
    macro avg                96.18      96.52   96.35  166369
    weighted avg             99.17      99.17   99.17  166369
```
</details>

<details>
<summary>Hindi</summary>

```
punct_post test report:
    label                    precision  recall  f1     support
    <NULL> (label_id: 0)     99.58      99.59   99.59  176743
    . (label_id: 1)          0.00       0.00    0.00   0
    , (label_id: 2)          68.32      65.23   66.74  1815
    ? (label_id: 3)          60.27      44.90   51.46  98
    ? (label_id: 4)          0.00       0.00    0.00   0
    , (label_id: 5)          0.00       0.00    0.00   0
    。 (label_id: 6)         0.00       0.00    0.00   0
    、 (label_id: 7)         0.00       0.00    0.00   0
    ・ (label_id: 8)         0.00       0.00    0.00   0
    । (label_id: 9)          96.45      97.43   96.94  10136
    ؟ (label_id: 10)         0.00       0.00    0.00   0
    ، (label_id: 11)         0.00       0.00    0.00   0
    ; (label_id: 12)         0.00       0.00    0.00   0
    ። (label_id: 13)         0.00       0.00    0.00   0
    ፣ (label_id: 14)         0.00       0.00    0.00   0
    ፧ (label_id: 15)         0.00       0.00    0.00   0
    -------------------
    micro avg                99.11      99.11   99.11  188792
    macro avg                81.16      76.79   78.68  188792
    weighted avg             99.10      99.11   99.10  188792

cap test report:
    label                    precision  recall  f1     support
    LOWER (label_id: 0)      98.25      95.06   96.63  708
    UPPER (label_id: 1)      89.46      96.12   92.67  309
    -------------------
    micro avg                95.38      95.38   95.38  1017
    macro avg                93.85      95.59   94.65  1017
    weighted avg             95.58      95.38   95.42  1017

seg test report:
    label                    precision  recall  f1     support
    NOSTOP (label_id: 0)     99.87      99.85   99.86  178892
    FULLSTOP (label_id: 1)   97.38      97.58   97.48  9900
    -------------------
    micro avg                99.74      99.74   99.74  188792
    macro avg                98.62      98.72   98.67  188792
    weighted avg             99.74      99.74   99.74  188792
```
</details>

<details>
<summary>Amharic</summary>

```
punct_post test report:
    label                    precision  recall  f1     support
    <NULL> (label_id: 0)     99.58      99.42   99.50  236298
    . (label_id: 1)          0.00       0.00    0.00   0
    , (label_id: 2)          0.00       0.00    0.00   0
    ? (label_id: 3)          0.00       0.00    0.00   0
    ? (label_id: 4)          0.00       0.00    0.00   0
    , (label_id: 5)          0.00       0.00    0.00   0
    。 (label_id: 6)         0.00       0.00    0.00   0
    、 (label_id: 7)         0.00       0.00    0.00   0
    ・ (label_id: 8)         0.00       0.00    0.00   0
    । (label_id: 9)          0.00       0.00    0.00   0
    ؟ (label_id: 10)         0.00       0.00    0.00   0
    ، (label_id: 11)         0.00       0.00    0.00   0
    ; (label_id: 12)         0.00       0.00    0.00   0
    ። (label_id: 13)         89.79      95.24   92.44  9169
    ፣ (label_id: 14)         66.85      56.58   61.29  1504
    ፧ (label_id: 15)         67.67      83.72   74.84  215
    -------------------
    micro avg                98.99      98.99   98.99  247186
    macro avg                80.97      83.74   82.02  247186
    weighted avg             98.99      98.99   98.98  247186

cap test report:
    label                    precision  recall  f1     support
    LOWER (label_id: 0)      96.65      99.78   98.19  1360
    UPPER (label_id: 1)      98.90      85.13   91.50  316
    -------------------
    micro avg                97.02      97.02   97.02  1676
    macro avg                97.77      92.45   94.84  1676
    weighted avg             97.08      97.02   96.93  1676

seg test report:
    label                    precision  recall  f1     support
    NOSTOP (label_id: 0)     99.85      99.74   99.80  239845
    FULLSTOP (label_id: 1)   91.72      95.25   93.45  7341
    -------------------
    micro avg                99.60      99.60   99.60  247186
    macro avg                95.79      97.49   96.62  247186
    weighted avg             99.61      99.60   99.61  247186
```
</details>