Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/MoseliMotsoehli/zuBERTa/README.md

Files changed (1)

README.md +56 -0

README.md ADDED

	@@ -0,0 +1,56 @@

+---
+language: zu
+---
+# zuBERTa
+zuBERTa is a RoBERTa-style transformer language model trained on Zulu text.
+## Intended uses & limitations
+The model can be used to obtain embeddings for a downstream task such as question answering.
+#### How to use
+```python
+>>> from transformers import pipeline
+>>> from transformers import AutoTokenizer, AutoModelForMaskedLM
+>>> tokenizer = AutoTokenizer.from_pretrained("MoseliMotsoehli/zuBERTa")
+>>> model = AutoModelForMaskedLM.from_pretrained("MoseliMotsoehli/zuBERTa")
+>>> unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
+>>> unmasker("Abafika eNkandla bafika sebeholwa <mask> uMpongo kaZingelwayo.")
+[
+  {
+    "sequence": "<s>Abafika eNkandla bafika sebeholwa khona uMpongo kaZingelwayo.</s>",
+    "score": 0.050459690392017365,
+    "token": 555,
+    "token_str": "Ġkhona"
+  },
+  {
+    "sequence": "<s>Abafika eNkandla bafika sebeholwa inkosi uMpongo kaZingelwayo.</s>",
+    "score": 0.03668094798922539,
+    "token": 2321,
+    "token_str": "Ġinkosi"
+  },
+  {
+    "sequence": "<s>Abafika eNkandla bafika sebeholwa ubukhosi uMpongo kaZingelwayo.</s>",
+    "score": 0.028774697333574295,
+    "token": 5101,
+    "token_str": "Ġubukhosi"
+  }
+]
+```
+## Training data
+1. 30k sentences of text from the Zulu 2018 corpus of the [Leipzig Corpora Collection](https://wortschatz.uni-leipzig.de/en/download). These were collected from news articles and creative writings.
+2. ~7500 articles of human-generated translations scraped from the Zulu [Wikipedia](https://zu.wikipedia.org/wiki/Special:AllPages).
+### BibTeX entry and citation info
+```bibtex
+@inproceedings{motsoehli2020, author = {Moseli Motsoehli},
+  title = {Towards transformation of Southern African language models through transformers.},
+  year = {2020}
+}
+```
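
The card says the model can supply embeddings for a downstream task but only demonstrates `fill-mask`. The sketch below shows one possible way to get sentence embeddings. It is not part of the migrated README and not necessarily the author's method: mean pooling over the last hidden state is an assumption, as are the helper names `mean_pool` and `embed`, and it assumes a transformers 4.x install with PyTorch.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed(sentences: List[str],
          model_name: str = "MoseliMotsoehli/zuBERTa") -> List[List[float]]:
    """Return one mean-pooled embedding per sentence (downloads the model on first call)."""
    # Imported lazily so mean_pool stays usable without torch/transformers installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # AutoModel loads the encoder without the masked-LM head.
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_size)

    # Hypothetical pooling choice: mean over non-padding tokens.
    return [
        mean_pool(h.tolist(), m.tolist())
        for h, m in zip(hidden, batch["attention_mask"])
    ]
```

Calling `embed(["Abafika eNkandla bafika sebeholwa khona uMpongo kaZingelwayo."])` should return a single vector whose length equals the model's hidden size. Other pooling strategies, such as taking the `<s>` token's vector, may suit a given downstream task better.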