sbmaruf committed on
Commit
899ac4b
1 Parent(s): ce1ba25

Update README.md

Files changed (1)
  1. README.md +22 -0
README.md CHANGED
@@ -6,6 +6,28 @@
 
  The model is trained on ~11B tokens (batch size 64, sequence length 512, 350k steps; 64 × 512 × 350,000 ≈ 11.5B tokens).
 
+ ## Load tokenizer
+
+ ```
+ >>> import transformers
+ >>> tokenizer = transformers.AutoTokenizer.from_pretrained("flax-community/bengali-t5-large")
+ >>> tokenizer.encode("আমি বাংলার গান গাই")
+ >>> tokenizer.decode([93, 1912, 814, 5995, 3, 1])
+ ```
+
+ ```
+ [93, 1912, 814, 5995, 3, 1]
+ 'আমি বাংলার গান গাই </s>'
+ ```
+
+ ## Load model
+
+ ```
+ from transformers import T5Config, FlaxT5ForConditionalGeneration
+
+ config = T5Config.from_pretrained("flax-community/bengali-t5-base")
+ model = FlaxT5ForConditionalGeneration.from_pretrained("flax-community/bengali-t5-base", config=config)
+ ```
+
+ Please note that we haven't fine-tuned the model on any downstream task. If you fine-tune the model on a downstream task, please let us know about it. Shoot us an email (sbmaruf at gmail dot com).
+
  ## Proposal
  - [Project Proposal](https://discuss.huggingface.co/t/pretrain-t5-from-scratch-in-bengali/7121)
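
A minimal end-to-end inference sketch tying the two snippets together. It assumes the `flax-community/bengali-t5-base` checkpoint for both tokenizer and model (the README snippets above mix `bengali-t5-large` and `bengali-t5-base`), and since the checkpoint is only pretrained with span corruption, the generated text is illustrative rather than a downstream-task output:

```
# Sketch only: load the pretrained checkpoint, tokenize one sentence to NumPy
# arrays (which Flax models accept), run greedy generation, and decode the result.
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("flax-community/bengali-t5-base")
model = FlaxT5ForConditionalGeneration.from_pretrained("flax-community/bengali-t5-base")

inputs = tokenizer("আমি বাংলার গান গাই", return_tensors="np")

# `generate` returns an output object whose `.sequences` field holds the token ids.
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=32,
)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```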