ljvmiranda921/tl_calamancy_trf
Token Classification
Model collection for https://github.com/ljvmiranda921/calamanCy. You can find more information in each model (or dataset) card.
Note: Transformer-based pipeline using RoBERTa-Tagalog.
Note: Large-sized pipeline based on fastText (714k unique vectors, 300 dimensions; size: 455 MB).
Note: Medium-sized pipeline based on floret (50k unique vectors, 200 dimensions; size: 77 MB).
Note: Gold-standard Tagalog NER dataset. Inter-annotator agreement, measured by Cohen's kappa, is 0.81.
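
For readers unfamiliar with the agreement figure quoted for the dataset, the following is a minimal sketch of how Cohen's kappa can be computed for two annotators. The function name `cohens_kappa` and the label sequences are made up for illustration; they are not taken from the dataset, and this listing does not say at which level (token or span) the reported value was computed.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Degenerate case: both annotators used one identical label throughout.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Toy NER-style labels (hypothetical, for illustration only).
annotator_a = ["PER", "O", "ORG", "O", "LOC", "O", "PER", "O", "O", "ORG"]
annotator_b = ["PER", "O", "ORG", "O", "O", "O", "PER", "O", "LOC", "ORG"]

kappa = cohens_kappa(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement two annotators would reach by chance, so it is lower than the plain percentage of matching labels whenever one label (such as `O`) dominates.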