AsPOS: Pre-trained model for Assamese POS tagging

AsPOS is a pre-trained part-of-speech (POS) tagging model for the Assamese language. It is trained with a stacked embedding (MuRIL + FlairEmbedding) fed into a BiLSTM-CRF model. It achieves an F1-score of 74.62% on POS tagging with a tagset of 41 POS tags.
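The architecture described above can be sketched with Flair roughly as follows. This is only an illustration, not the authors' training script: the function name `build_aspos_like_tagger`, the `flair_lm_paths` argument (paths to Assamese Flair language models, which are not given in this README), the MuRIL checkpoint name `google/muril-base-cased`, the tag type name `pos`, and the hidden size are all assumptions.

```python
def build_aspos_like_tagger(tag_dictionary, flair_lm_paths, hidden_size=256):
    """Build an untrained BiLSTM-CRF tagger over stacked MuRIL + Flair embeddings.

    tag_dictionary: a Flair Dictionary of POS tags (e.g. built from a corpus).
    flair_lm_paths: hypothetical paths to Flair character language models.
    """
    # Imported lazily so the function can be defined without Flair installed.
    from flair.embeddings import (
        FlairEmbeddings,
        StackedEmbeddings,
        TransformerWordEmbeddings,
    )
    from flair.models import SequenceTagger

    # Stack the MuRIL transformer embedding with the Flair embeddings.
    # The checkpoint name is an assumption; the README does not name one.
    embeddings = StackedEmbeddings(
        [TransformerWordEmbeddings("google/muril-base-cased")]
        + [FlairEmbeddings(path) for path in flair_lm_paths]
    )

    # use_crf=True puts a CRF layer on top of the tagger's recurrent layer,
    # which in Flair 0.9.0 is a bidirectional LSTM by default.
    return SequenceTagger(
        hidden_size=hidden_size,      # assumed value, not from the paper
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="pos",               # assumed tag type name
        use_crf=True,
    )
```

Calling the function downloads the MuRIL weights on first use, so it needs network access and a working Flair installation.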

Annotated Assamese POS tagged dataset

The dataset was first annotated by an automatic POS tagger with an accuracy of 74.62% and then manually corrected. It is split into three files for model training: train.txt, dev.txt, and test.txt.
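The README does not describe the file layout of the three splits. The sketch below assumes a common column format (one token and its tag per line, separated by whitespace, with a blank line between sentences); if the actual files differ, the parsing needs to be adapted. The function name `read_column_file` is hypothetical.

```python
from collections import Counter


def read_column_file(lines):
    """Parse 'token tag' lines into a list of sentences.

    Assumed format: one token and its POS tag per line, blank line
    between sentences. Each sentence is a list of (token, tag) tuples.
    """
    sentences, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError(f"expected 'token tag', got: {line!r}")
        # First column is the token, last column is the tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


# Small in-memory sample using tags that appear in this README.
sample = ["তেওঁ PR_PRP", "থাকে V_VM", "। RD_PUNC", "", "ফুকন N_NNP"]
sentences = read_column_file(sample)
tag_counts = Counter(tag for sent in sentences for _, tag in sent)
print(len(sentences), dict(tag_counts))
```

For a real file, pass an open file object instead of the sample list, for example `read_column_file(open("train.txt", encoding="utf-8"))`.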

Requirements

Python 3.6 or later.
Flair (version 0.9.0), preferably installed in a virtual environment.
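A quick way to confirm these requirements from inside the environment is sketched below. The helper name `check_environment` is hypothetical, and the check relies on Flair exposing its version as `flair.__version__`.

```python
import sys


def check_environment(min_python=(3, 6), expected_flair="0.9.0"):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # Compare the running interpreter against the minimum version.
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required")

    # Try to import Flair and compare its reported version.
    try:
        import flair
    except Exception as exc:  # not installed, or a broken dependency
        problems.append(f"Flair could not be imported: {exc}")
    else:
        found = getattr(flair, "__version__", "unknown")
        if found != expected_flair:
            problems.append(f"Flair {expected_flair} expected, found {found}")

    return problems


for problem in check_environment():
    print("WARNING:", problem)
```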

How to run

Download the pre-trained model from the link: AsPOS. The code below assumes it is saved as AsPOS.pt in the working directory.

from flair.models import SequenceTagger
from flair.data import Sentence

# Load the tagger

model = SequenceTagger.load('AsPOS.pt')

# Create an example sentence
sen = 'ফুকন বসুমতাৰী এজন অধ্য়াপক । তেওঁ বৰ্তমান কোকৰাঝাৰত থাকে ।'
sentence = Sentence(sen)
# Predict the tags and print the tagged sentence
model.predict(sentence)
print(sentence.to_tagged_string())
ফুকন <N_NNP> বসুমতাৰী <N_NN> এজন <QT_QTF> অধ্য়াপক <N_NN> । <RD_PUNC> তেওঁ <PR_PRP> বৰ্তমান <RB> 
কোকৰাঝাৰত <N_NNP> থাকে <V_VM> । <RD_PUNC>

# Create a second example sentence
sen = 'মাতৃভাষাৰ সমান্তৰালকৈ সংস্কৃত, ইংৰাজী ভাষাৰ চৰ্চা অত্যন্ত জৰুৰী ৷'
sentence = Sentence(sen)
# Predict the tags and print the tagged sentence
model.predict(sentence)
print(sentence.to_tagged_string())
মাতৃভাষাৰ <N_NN> সমান্তৰালকৈ <N_NN> সংস্কৃত <N_NNP> , <RD_PUNC> ইংৰাজী <N_NNP> ভাষাৰ <N_ANN> চৰ্চা <N_NN> অত্যন্ত <RP_INTF> 
জৰুৰী <N_NN> ৷ <RD_PUNC>
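For downstream processing, the tagged string printed above can be split into (token, tag) pairs. The helper below is a minimal sketch that assumes the `token <TAG>` layout shown in the example outputs; the name `parse_tagged_string` is hypothetical and not part of Flair.

```python
import re

# A token is a run of non-space characters followed by its tag in angle brackets.
_PAIR = re.compile(r"(\S+)\s+<([^<>\s]+)>")


def parse_tagged_string(tagged):
    """Split 'token <TAG> token <TAG> ...' into a list of (token, tag) tuples."""
    return _PAIR.findall(tagged)


tagged = "তেওঁ <PR_PRP> বৰ্তমান <RB> কোকৰাঝাৰত <N_NNP> থাকে <V_VM> । <RD_PUNC>"
for token, tag in parse_tagged_string(tagged):
    print(f"{token}\t{tag}")
```

Tokens that carry no tag in the string are skipped by this pattern, so the result may be shorter than the sentence if the tagger leaves some tokens unlabeled.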

If you use our model, please cite this paper:

@INPROCEEDINGS{10017934,
  author={Pathak, Dhrubajyoti and Nandi, Sukumar and Sarmah, Priyankoo},
  booktitle={2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA)}, 
  title={AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach}, 
  year={2022},
  pages={1-8},
  doi={10.1109/AICCSA56895.2022.10017934}}