1-800-BAD-CODE committed c96c930 (parent: 38ea57a): Update README.md
# Evaluation

In these metrics, keep in mind that

1. The data is noisy.
2. Sentence boundaries and true-casing are conditioned on predicted punctuation, which is the most difficult task and is sometimes incorrect. When conditioning on reference punctuation, true-casing and SBD are practically 100% for most languages.
3. Punctuation can be subjective, e.g.,

   `Hola mundo, ¿cómo estás?`

   or

   `Hola mundo. ¿Cómo estás?`

   When the sentences are longer and more practical, these ambiguities abound and affect all 3 analytics.

## Selected Language Evaluation Reports

Each test example was generated using the following procedure:

1. Concatenate 5 random sentences
2. Lower-case the concatenated sentence
3. Remove all punctuation

The data is a held-out portion of News Crawl, which has been deduplicated.
2,000 lines of data per language were used, generating 2,000 unique examples of 5 sentences each.
The last 4 sentences of each example were randomly sampled from the 2,000 and may be duplicated.

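The three steps above can be sketched as follows. This is a minimal illustration only; the helper name, the sampling interface, and the exact punctuation set stripped here are my assumptions, not the actual evaluation script.

```python
import random
import re

# Punctuation characters to strip (an assumed superset covering the
# pre/post punctuation labels used by this model).
PUNCT = r"[.,;:!?¿¡。、・।؟،።፣፧]"

def make_test_example(corpus, k=5, seed=None):
    """Build one evaluation input from a list of clean sentences."""
    rng = random.Random(seed)
    text = " ".join(rng.sample(corpus, k))  # 1. concatenate k random sentences
    text = text.lower()                     # 2. lower-case
    return re.sub(PUNCT, "", text)          # 3. remove all punctuation
```

Given such an input, the model must restore punctuation, true-casing, and sentence boundaries jointly.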
<details>
<summary>English</summary>

```
punct_post test report:
    label                    precision  recall  f1     support
    <NULL> (label_id: 0)     98.71      98.66   98.68  156605
    . (label_id: 1)          87.72      88.85   88.28  8752
    , (label_id: 2)          68.06      67.81   67.93  5216
    ? (label_id: 3)          79.38      77.20   78.27  693
    ? (label_id: 4)          0.00       0.00    0.00   0
    , (label_id: 5)          0.00       0.00    0.00   0
    。 (label_id: 6)         0.00       0.00    0.00   0
    、 (label_id: 7)         0.00       0.00    0.00   0
    ・ (label_id: 8)         0.00       0.00    0.00   0
    । (label_id: 9)          0.00       0.00    0.00   0
    ؟ (label_id: 10)         0.00       0.00    0.00   0
    ، (label_id: 11)         0.00       0.00    0.00   0
    ; (label_id: 12)         0.00       0.00    0.00   0
    ። (label_id: 13)         0.00       0.00    0.00   0
    ፣ (label_id: 14)         0.00       0.00    0.00   0
    ፧ (label_id: 15)         0.00       0.00    0.00   0
    -------------------
    micro avg                97.13      97.13   97.13  171266
    macro avg                83.46      83.13   83.29  171266
    weighted avg             97.13      97.13   97.13  171266

cap test report:
    label                    precision  recall  f1     support
    LOWER (label_id: 0)      99.63      99.49   99.56  526612
    UPPER (label_id: 1)      89.19      91.84   90.50  24161
    -------------------
    micro avg                99.15      99.15   99.15  550773
    macro avg                94.41      95.66   95.03  550773
    weighted avg             99.17      99.15   99.16  550773

seg test report:
    label                    precision  recall  f1     support
    NOSTOP (label_id: 0)     99.37      99.42   99.39  162044
    FULLSTOP (label_id: 1)   89.75      88.84   89.29  9222
    -------------------
    micro avg                98.85      98.85   98.85  171266
    macro avg                94.56      94.13   94.34  171266
    weighted avg             98.85      98.85   98.85  171266
```
</details>
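The summary rows in these reports can be sanity-checked from the per-label rows: the macro average is taken only over labels with nonzero support, which is why the many 0.00 rows do not drag it down, and the weighted average weights each label by its support. A minimal sketch (helper names are mine), using the English punct_post precision column:

```python
def macro_avg(scores, supports):
    """Unweighted mean over labels that actually occur (support > 0)."""
    kept = [s for s, n in zip(scores, supports) if n > 0]
    return sum(kept) / len(kept)

def weighted_avg(scores, supports):
    """Mean weighted by each label's support."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)

# English punct_post precision column and supports from the table above.
precision = [98.71, 87.72, 68.06, 79.38] + [0.0] * 12
support = [156605, 8752, 5216, 693] + [0] * 12

# Recovers the published macro avg (83.46) and weighted avg (97.13)
# up to rounding of the per-label figures.
macro = macro_avg(precision, support)
weighted = weighted_avg(precision, support)
```

(The micro average requires the raw token-level counts, so it cannot be recomputed from the rounded table alone.)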

<details>
<summary>Spanish</summary>

```
punct_pre test report:
    label                    precision  recall  f1     support
    <NULL> (label_id: 0)     99.94      99.92   99.93  185535
    ¿ (label_id: 1)          55.01      64.86   59.53  296
    -------------------
    micro avg                99.86      99.86   99.86  185831
    macro avg                77.48      82.39   79.73  185831
    weighted avg             99.87      99.86   99.87  185831

punct_post test report:
    label                    precision  recall  f1     support
    <NULL> (label_id: 0)     98.74      98.86   98.80  170282
    . (label_id: 1)          90.07      89.58   89.82  9959
    , (label_id: 2)          68.33      67.00   67.66  5300
    ? (label_id: 3)          70.25      58.62   63.91  290
    ? (label_id: 4)          0.00       0.00    0.00   0
    , (label_id: 5)          0.00       0.00    0.00   0
    。 (label_id: 6)         0.00       0.00    0.00   0
    、 (label_id: 7)         0.00       0.00    0.00   0
    ・ (label_id: 8)         0.00       0.00    0.00   0
    । (label_id: 9)          0.00       0.00    0.00   0
    ؟ (label_id: 10)         0.00       0.00    0.00   0
    ، (label_id: 11)         0.00       0.00    0.00   0
    ; (label_id: 12)         0.00       0.00    0.00   0
    ። (label_id: 13)         0.00       0.00    0.00   0
    ፣ (label_id: 14)         0.00       0.00    0.00   0
    ፧ (label_id: 15)         0.00       0.00    0.00   0
    -------------------
    micro avg                97.39      97.39   97.39  185831
    macro avg                81.84      78.51   80.05  185831
    weighted avg             97.36      97.39   97.37  185831

cap test report:
    label                    precision  recall  f1     support
    LOWER (label_id: 0)      99.62      99.60   99.61  555041
    UPPER (label_id: 1)      90.60      91.06   90.83  23538
    -------------------
    micro avg                99.25      99.25   99.25  578579
    macro avg                95.11      95.33   95.22  578579
    weighted avg             99.25      99.25   99.25  578579

seg test report:
    label                    precision  recall  f1     support
    NOSTOP (label_id: 0)     99.44      99.54   99.49  175908
    FULLSTOP (label_id: 1)   91.68      89.98   90.82  9923
    -------------------
    micro avg                99.03      99.03   99.03  185831
    macro avg                95.56      94.76   95.16  185831
    weighted avg             99.02      99.03   99.02  185831
```
</details>

<details>
<summary>Chinese</summary>

```
punct_post test report:
    label                    precision  recall  f1     support
    <NULL> (label_id: 0)     98.82      97.34   98.07  147920
    . (label_id: 1)          0.00       0.00    0.00   0
    , (label_id: 2)          0.00       0.00    0.00   0
    ? (label_id: 3)          0.00       0.00    0.00   0
    ? (label_id: 4)          85.77      80.71   83.16  560
    , (label_id: 5)          59.88      78.02   67.75  6901
    。 (label_id: 6)         92.50      93.92   93.20  10988
    、 (label_id: 7)         0.00       0.00    0.00   0
    ・ (label_id: 8)         0.00       0.00    0.00   0
    । (label_id: 9)          0.00       0.00    0.00   0
    ؟ (label_id: 10)         0.00       0.00    0.00   0
    ، (label_id: 11)         0.00       0.00    0.00   0
    ; (label_id: 12)         0.00       0.00    0.00   0
    ። (label_id: 13)         0.00       0.00    0.00   0
    ፣ (label_id: 14)         0.00       0.00    0.00   0
    ፧ (label_id: 15)         0.00       0.00    0.00   0
    -------------------
    micro avg                96.25      96.25   96.25  166369
    macro avg                84.24      87.50   85.55  166369
    weighted avg             96.75      96.25   96.45  166369

cap test report:
    label                    precision  recall  f1     support
    LOWER (label_id: 0)      97.07      92.39   94.67  394
    UPPER (label_id: 1)      70.59      86.75   77.84  83
    -------------------
    micro avg                91.40      91.40   91.40  477
    macro avg                83.83      89.57   86.25  477
    weighted avg             92.46      91.40   91.74  477

seg test report:
    label                    precision  recall  f1     support
    NOSTOP (label_id: 0)     99.58      99.53   99.56  156369
    FULLSTOP (label_id: 1)   92.77      93.50   93.13  10000
    -------------------
    micro avg                99.17      99.17   99.17  166369
    macro avg                96.18      96.52   96.35  166369
    weighted avg             99.17      99.17   99.17  166369
```
</details>

<details>
<summary>Hindi</summary>

```
punct_post test report:
    label                    precision  recall  f1     support
    <NULL> (label_id: 0)     99.58      99.59   99.59  176743
    . (label_id: 1)          0.00       0.00    0.00   0
    , (label_id: 2)          68.32      65.23   66.74  1815
    ? (label_id: 3)          60.27      44.90   51.46  98
    ? (label_id: 4)          0.00       0.00    0.00   0
    , (label_id: 5)          0.00       0.00    0.00   0
    。 (label_id: 6)         0.00       0.00    0.00   0
    、 (label_id: 7)         0.00       0.00    0.00   0
    ・ (label_id: 8)         0.00       0.00    0.00   0
    । (label_id: 9)          96.45      97.43   96.94  10136
    ؟ (label_id: 10)         0.00       0.00    0.00   0
    ، (label_id: 11)         0.00       0.00    0.00   0
    ; (label_id: 12)         0.00       0.00    0.00   0
    ። (label_id: 13)         0.00       0.00    0.00   0
    ፣ (label_id: 14)         0.00       0.00    0.00   0
    ፧ (label_id: 15)         0.00       0.00    0.00   0
    -------------------
    micro avg                99.11      99.11   99.11  188792
    macro avg                81.16      76.79   78.68  188792
    weighted avg             99.10      99.11   99.10  188792

cap test report:
    label                    precision  recall  f1     support
    LOWER (label_id: 0)      98.25      95.06   96.63  708
    UPPER (label_id: 1)      89.46      96.12   92.67  309
    -------------------
    micro avg                95.38      95.38   95.38  1017
    macro avg                93.85      95.59   94.65  1017
    weighted avg             95.58      95.38   95.42  1017

seg test report:
    label                    precision  recall  f1     support
    NOSTOP (label_id: 0)     99.87      99.85   99.86  178892
    FULLSTOP (label_id: 1)   97.38      97.58   97.48  9900
    -------------------
    micro avg                99.74      99.74   99.74  188792
    macro avg                98.62      98.72   98.67  188792
    weighted avg             99.74      99.74   99.74  188792
```
</details>

<details>
<summary>Amharic</summary>

```
punct_post test report:
    label                    precision  recall  f1     support
    <NULL> (label_id: 0)     99.58      99.42   99.50  236298
    . (label_id: 1)          0.00       0.00    0.00   0
    , (label_id: 2)          0.00       0.00    0.00   0
    ? (label_id: 3)          0.00       0.00    0.00   0
    ? (label_id: 4)          0.00       0.00    0.00   0
    , (label_id: 5)          0.00       0.00    0.00   0
    。 (label_id: 6)         0.00       0.00    0.00   0
    、 (label_id: 7)         0.00       0.00    0.00   0
    ・ (label_id: 8)         0.00       0.00    0.00   0
    । (label_id: 9)          0.00       0.00    0.00   0
    ؟ (label_id: 10)         0.00       0.00    0.00   0
    ، (label_id: 11)         0.00       0.00    0.00   0
    ; (label_id: 12)         0.00       0.00    0.00   0
    ። (label_id: 13)         89.79      95.24   92.44  9169
    ፣ (label_id: 14)         66.85      56.58   61.29  1504
    ፧ (label_id: 15)         67.67      83.72   74.84  215
    -------------------
    micro avg                98.99      98.99   98.99  247186
    macro avg                80.97      83.74   82.02  247186
    weighted avg             98.99      98.99   98.98  247186

cap test report:
    label                    precision  recall  f1     support
    LOWER (label_id: 0)      96.65      99.78   98.19  1360
    UPPER (label_id: 1)      98.90      85.13   91.50  316
    -------------------
    micro avg                97.02      97.02   97.02  1676
    macro avg                97.77      92.45   94.84  1676
    weighted avg             97.08      97.02   96.93  1676

seg test report:
    label                    precision  recall  f1     support
    NOSTOP (label_id: 0)     99.85      99.74   99.80  239845
    FULLSTOP (label_id: 1)   91.72      95.25   93.45  7341
    -------------------
    micro avg                99.60      99.60   99.60  247186
    macro avg                95.79      97.49   96.62  247186
    weighted avg             99.61      99.60   99.61  247186
```
</details>