Commit 63699ef by ClassCat
1 parent: ebeedd7

Create README.md

  1. README.md +41 -0
README.md ADDED
@@ -0,0 +1,41 @@
+ ---
+ language: ca
+ license: cc-by-sa-4.0
+ datasets:
+ - wikipedia
+ - cc100
+ widget:
+ - text: "Jo <mask> japonès."
+ - text: "Jo sóc <mask>."
+ - text: "Ell està una mica <mask>."
+ - text: "És un bon <mask>."
+ - text: "M'agradaria menjar una <mask>."
+ ---
+
+ ## RoBERTa Catalan base model (Uncased)
+
+ ### Prerequisites
+
+ transformers==4.19.2
+
+ ### Model architecture
+
+ This model uses the RoBERTa base settings, except for the vocabulary size.
+
+ ### Tokenizer
+
+ The tokenizer is a BPE tokenizer with a vocabulary size of 50,000.
+
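To make the two statements above concrete, the sketch below writes out the standard RoBERTa base hyperparameters and overrides only the vocabulary size. It assumes the model's vocabulary size equals the tokenizer's 50,000 entries, which the card implies but does not state outright. The names `ROBERTA_BASE`, `CATALAN_SETTINGS` and `token_embedding_params` are illustrative and not part of the released code.

```python
# Standard RoBERTa base hyperparameters (the public roberta-base defaults).
ROBERTA_BASE = {
    "num_hidden_layers": 12,
    "hidden_size": 768,
    "num_attention_heads": 12,
    "intermediate_size": 3072,
    "vocab_size": 50265,
}

# Assumption: the Catalan model's vocabulary size matches the tokenizer's 50,000.
CATALAN_SETTINGS = {**ROBERTA_BASE, "vocab_size": 50_000}


def token_embedding_params(settings):
    """Number of weights in the token embedding matrix (vocab_size x hidden_size)."""
    return settings["vocab_size"] * settings["hidden_size"]


print(token_embedding_params(ROBERTA_BASE))      # 38603520
print(token_embedding_params(CATALAN_SETTINGS))  # 38400000
```

The keys follow the argument names of `transformers.RobertaConfig`, so under the same assumption the dictionary could be passed as `RobertaConfig(**CATALAN_SETTINGS)`.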
+ ### Training Data
+
+ * [wiki40b/ca](https://www.tensorflow.org/datasets/catalog/wiki40b#wiki40bca) (Catalan Wikipedia)
+ * Subset of [CC-100/ca](https://data.statmt.org/cc-100/): monolingual datasets from web crawl data
+
+ ### Usage
+
+ ```python
+ from transformers import pipeline
+
+ # Load a fill-mask pipeline backed by this model.
+ unmasker = pipeline('fill-mask', model='ClassCat/roberta-base-catalan')
+ # Predict candidate tokens for the <mask> position.
+ unmasker("Jo <mask> japonès.")
+ ```
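For an input with a single `<mask>`, the fill-mask pipeline returns a list of candidate fills, each a dictionary with `score`, `token`, `token_str` and `sequence` keys. A minimal sketch of reading that output follows. The helpers `top_predictions` and `unmask` are hypothetical, and the sample tokens and scores are invented for illustration rather than produced by the model.

```python
def top_predictions(results, k=3):
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"].strip(), round(r["score"], 4)) for r in ranked[:k]]


def unmask(text, k=3):
    """Run the model on `text`; the model is downloaded on first use."""
    # Imported lazily so that top_predictions has no third-party dependency.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="ClassCat/roberta-base-catalan")
    return top_predictions(unmasker(text), k)


# Hypothetical output shaped like a fill-mask result (tokens and scores are invented).
sample = [
    {"score": 0.12, "token": 11, "token_str": " parlo", "sequence": "Jo parlo japonès."},
    {"score": 0.61, "token": 12, "token_str": " sóc", "sequence": "Jo sóc japonès."},
]
print(top_predictions(sample, k=1))  # [('sóc', 0.61)]
```

Calling `unmask("Jo <mask> japonès.")` would apply the same ranking to real model output.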