AsPOS: Pre-trained model for Assamese POS tagging
AsPOS is a pre-trained part-of-speech (POS) tagging model for the Assamese language. It is trained with a BiLSTM-CRF architecture on top of stacked embeddings (MuRIL + FlairEmbedding) and achieves an F1-score of 74.62% on a tagset of 41 POS tags.
Annotated Assamese POS tagged dataset
The dataset was first annotated by an automatic POS tagger with an accuracy of 74.62% and then manually corrected. It is split into three files for model training: train.txt, dev.txt, and test.txt.
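The layout of the three files is not described here. The sketch below assumes a CoNLL-style column format: one token and its POS tag per line, separated by whitespace, with a blank line between sentences. The function name `read_tagged_sentences` and the sample rows are hypothetical and only illustrate how such a split could be read.

```python
from typing import Iterable, List, Tuple

TaggedSentence = List[Tuple[str, str]]


def read_tagged_sentences(lines: Iterable[str]) -> List[TaggedSentence]:
    """Group 'token tag' lines into sentences; a blank line ends a sentence.

    ASSUMPTION: whitespace-separated columns, token first and POS tag last.
    """
    sentences: List[TaggedSentence] = []
    current: TaggedSentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank line: close the sentence collected so far, if any
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    # Keep the last sentence when the input does not end with a blank line
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    # Hypothetical sample rows in the assumed format
    sample = ["তেওঁ PR_PRP", "থাকে V_VM", "। RD_PUNC", ""]
    for sent in read_tagged_sentences(sample):
        print(sent)
```

If the files do follow this layout, an open file handle can be passed directly, for example `read_tagged_sentences(open("train.txt", encoding="utf-8"))`.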
Requirements
- Python 3.6+
- Flair (version 0.9.0), preferably installed in a virtual environment
How to run
Download the pre-trained model from the link: AsPOS.
from flair.models import SequenceTagger
from flair.data import Sentence, Token
# Load the tagger
model = SequenceTagger.load('AsPOS.pt')
# create example sentence
sen='ফুকন বসুমতাৰী এজন অধ্য়াপক । তেওঁ বৰ্তমান কোকৰাঝাৰত থাকে ।'
sentence = Sentence(sen)
# predict tags and print
model.predict(sentence)
print(sentence.to_tagged_string())
ফুকন <N_NNP> বসুমতাৰী <N_NN> এজন <QT_QTF> অধ্য়াপক <N_NN> । <RD_PUNC> তেওঁ <PR_PRP> বৰ্তমান <RB>
কোকৰাঝাৰত <N_NNP> থাকে <V_VM> । <RD_PUNC>
# create example sentence
sen='মাতৃভাষাৰ সমান্তৰালকৈ সংস্কৃত, ইংৰাজী ভাষাৰ চৰ্চা অত্যন্ত জৰুৰী ৷'
sentence = Sentence(sen)
# predict tags and print
model.predict(sentence)
print(sentence.to_tagged_string())
মাতৃভাষাৰ <N_NN> সমান্তৰালকৈ <N_NN> সংস্কৃত <N_NNP> , <RD_PUNC> ইংৰাজী <N_NNP> ভাষাৰ <N_ANN> চৰ্চা <N_NN> অত্যন্ত <RP_INTF>
জৰুৰী <N_NN> ৷ <RD_PUNC>
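For downstream processing it is often easier to work with (token, tag) pairs than with the printed string. The following is a minimal sketch that assumes the output has the `token <TAG>` form shown in the examples above; other Flair versions may format the string differently. The helper name `parse_tagged_string` is hypothetical and not part of Flair.

```python
import re
from collections import Counter
from typing import List, Tuple

# One token followed by its tag in angle brackets, e.g. "থাকে <V_VM>"
_PAIR = re.compile(r"(\S+)\s+<([^<>\s]+)>")


def parse_tagged_string(tagged: str) -> List[Tuple[str, str]]:
    """Split a 'token <TAG> token <TAG> ...' string into (token, tag) pairs.

    ASSUMPTION: every token is immediately followed by exactly one <TAG>.
    """
    return _PAIR.findall(tagged)


if __name__ == "__main__":
    tagged = "তেওঁ <PR_PRP> বৰ্তমান <RB> কোকৰাঝাৰত <N_NNP> থাকে <V_VM> । <RD_PUNC>"
    pairs = parse_tagged_string(tagged)
    for token, tag in pairs:
        print(token, tag)
    # Count how often each tag occurs in the sentence
    print(Counter(tag for _, tag in pairs))
```

The string returned by `sentence.to_tagged_string()` can be passed to this helper in place of the hard-coded example.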
If you use our model, please cite this paper:
@INPROCEEDINGS{10017934,
author={Pathak, Dhrubajyoti and Nandi, Sukumar and Sarmah, Priyankoo},
booktitle={2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA)},
title={AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach},
year={2022},
volume={},
number={},
pages={1-8},
doi={10.1109/AICCSA56895.2022.10017934}}