IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages
Paper link: https://arxiv.org/abs/2312.09508
Dataset link: https://huggingface.co/datasets/saifulhaq9/indicmarco
Model link: https://huggingface.co/saifulhaq9/indiccolbert
Contributors & Acknowledgements
Key Contributors and Team Members: Saiful Haq, Ashutosh Sharma, Pushpak Bhattacharyya
Kindly cite our paper if you are using our datasets or models:
@article{haq2023indicirsuite,
  title={IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages},
  author={Haq, Saiful and Sharma, Ashutosh and Bhattacharyya, Pushpak},
  journal={arXiv preprint arXiv:2312.09508},
  year={2023}
}
About
This repository contains multilingual ColBERT models for 11 Indian languages.
Language Code to Language Mapping
asm_Beng: Assamese Language
ben_Beng: Bengali Language
guj_Gujr: Gujarati Language
hin_Deva: Hindi Language
kan_Knda: Kannada Language
mal_Mlym: Malayalam Language
mar_Deva: Marathi Language
ory_Orya: Oriya Language
pan_Guru: Punjabi Language
tam_Taml: Tamil Language
tel_Telu: Telugu Language
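For scripts that need to turn these codes into readable labels, the mapping above can be sketched as a small Python lookup. Each code appears to follow the pattern of a three-letter language code plus a four-letter ISO 15924 script code, so the suffix can be resolved separately. The names `LANGUAGE_CODES`, `SCRIPT_NAMES`, and `describe` are hypothetical and are not part of the released dataset or models.

```python
# Language codes copied from the mapping above.
# The dictionary and function names are illustrative assumptions.
LANGUAGE_CODES = {
    "asm_Beng": "Assamese",
    "ben_Beng": "Bengali",
    "guj_Gujr": "Gujarati",
    "hin_Deva": "Hindi",
    "kan_Knda": "Kannada",
    "mal_Mlym": "Malayalam",
    "mar_Deva": "Marathi",
    "ory_Orya": "Oriya",
    "pan_Guru": "Punjabi",
    "tam_Taml": "Tamil",
    "tel_Telu": "Telugu",
}

# ISO 15924 script codes that appear as suffixes in the codes above.
SCRIPT_NAMES = {
    "Beng": "Bengali",
    "Deva": "Devanagari",
    "Gujr": "Gujarati",
    "Guru": "Gurmukhi",
    "Knda": "Kannada",
    "Mlym": "Malayalam",
    "Orya": "Oriya",
    "Taml": "Tamil",
    "Telu": "Telugu",
}


def describe(code: str) -> str:
    """Return a readable label such as 'Hindi (Devanagari script)'."""
    language = LANGUAGE_CODES[code]          # raises KeyError for unknown codes
    script_code = code.split("_", 1)[1]      # part after the underscore
    return f"{language} ({SCRIPT_NAMES[script_code]} script)"


if __name__ == "__main__":
    for code in LANGUAGE_CODES:
        print(code, "->", describe(code))
```

Note that Assamese and Bengali share the Bengali script, and Hindi and Marathi share Devanagari, so the script suffix alone does not identify the language.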