ToluClassics committed
Commit 6c82d3a
1 Parent(s): e7b0b5d

update readme

Files changed (1): README.md +18 -0
README.md CHANGED
@@ -39,6 +39,24 @@ Afaan Oromoo(orm), Amharic(amh), Gahuza(gah), Hausa(hau), Igbo(igb), Nigerian Pi
  - 143 Million Tokens (1GB of text data)
  - Tokenizer Vocabulary Size: 70,000 tokens

+ ## Intended uses & limitations
+
+ ```python
+ >>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+ >>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_large")
+ >>> model = AutoModelForSeq2SeqLM.from_pretrained("castorini/afriteva_large")
+
+ >>> src_text = "Ó hùn ọ́ láti di ara wa bí?"
+ >>> tgt_text = "Would you like to be?"
+
+ >>> model_inputs = tokenizer(src_text, return_tensors="pt")
+ >>> with tokenizer.as_target_tokenizer():
+ ...     labels = tokenizer(tgt_text, return_tensors="pt").input_ids
+
+ >>> model(**model_inputs, labels=labels)  # forward pass
+ ```
+
  ## Training Procedure

  For information on training procedures, please refer to the AfriTeVa [paper](#) or [repository](https://github.com/castorini/afriteva).
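
The snippet added in this commit demonstrates a training-style forward pass (computing a loss against `labels`). For inference, a minimal generation sketch may be clearer; note that the `translate` helper below and its beam-search defaults are illustrative assumptions, not part of the model card:

```python
def translate(src_text: str,
              model_name: str = "castorini/afriteva_large",
              num_beams: int = 4,
              max_length: int = 64) -> str:
    """Generate output text from an AfriTeVa seq2seq checkpoint.

    Downloads the checkpoint on first call, so network access is required.
    num_beams and max_length are illustrative defaults, not values
    documented for this model.
    """
    # Imported inside the function so the helper can be defined even
    # when transformers is not installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    inputs = tokenizer(src_text, return_tensors="pt")
    # generate() performs autoregressive decoding (here, beam search).
    output_ids = model.generate(**inputs,
                                num_beams=num_beams,
                                max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example (requires network access to download the checkpoint):
# translate("Ó hùn ọ́ láti di ara wa bí?")
```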