arxiv:2408.17024

InkubaLM: A small language model for low-resource African languages

Published on Aug 30

Submitted by Atnafu on Sep 2


Authors: Atnafu Lambebo Tonja, Bonaventure F. P. Dossou, Jessica Ojo, Jenalea Rajab, Eric Peter Wairagala, Aremu Anuoluwapo, Pelonomi Moiloa, Jade Abbott, Vukosi Marivate

Abstract

High-resource language models often fall short in the African context, where there is a critical need for models that are efficient, accessible, and locally relevant, even amidst significant computing and data constraints. This paper introduces InkubaLM, a small language model with 0.4 billion parameters, which achieves performance comparable to models with significantly larger parameter counts and more extensive training data on tasks such as machine translation, question-answering, AfriMMLU, and the AfriXnli task. Notably, InkubaLM outperforms many larger models in sentiment analysis and demonstrates remarkable consistency across multiple languages. This work represents a pivotal advancement in challenging the conventional paradigm that effective language models must rely on substantial resources. Our model and datasets are publicly available at https://huggingface.co/lelapa to encourage research and development on low-resource languages.
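As a rough illustration of what a 0.4-billion-parameter model means under computing constraints, the sketch below estimates the storage needed for the weights alone. Only the parameter count comes from the abstract; the numeric formats listed are assumptions chosen for illustration, not configurations reported by the authors, and the estimate ignores activations, caches, and runtime overhead.

```python
# Rough weight-memory estimate for a 0.4B-parameter model (weights only).
PARAMS = 0.4e9  # parameter count stated in the abstract

# Bytes per parameter for common numeric formats (illustrative assumption:
# the abstract does not say which format the released weights use).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.1f} GB")
```

Under these assumptions the weights occupy roughly 1.6 GB in 32-bit floats and about half that in 16-bit floats.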




Models citing this paper 3

Datasets citing this paper 2

Spaces citing this paper 1

Collections including this paper 3