---
language:
- ca
license: apache-2.0
tags:
- "catalan"
- "text classification"
- "WikiCAT_ca"
- "CaText"
- "Catalan Textual Corpus"
datasets:
- "projecte-aina/WikiCAT_ca"
metrics:
- f1
model-index:
- name: roberta-base-ca-v2-cased-wikicat-ca
  results:
  - task:
      type: text-classification
    dataset:
      type: projecte-aina/WikiCAT_ca
      name: WikiCAT_ca
    metrics:
    - name: F1
      type: f1
      value: 77.82
widget:
- text: "La ressonància magnètica és una prova diagnòstica clau per a moltes malalties."
---

# Catalan BERTa-v2 (roberta-base-ca-v2) fine-tuned for Text Classification

## Table of Contents
- [Model Description](#model-description)
- [Intended Uses and Limitations](#intended-uses-and-limitations)
- [How to Use](#how-to-use)
- [Training](#training)
  - [Training Data](#training-data)
  - [Training Procedure](#training-procedure)
- [Evaluation](#evaluation)
  - [Variable and Metrics](#variable-and-metrics)
  - [Evaluation Results](#evaluation-results)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Funding](#funding)
- [Contributions](#contributions)

## Model Description

**roberta-base-ca-v2-cased-wikicat-ca** is a text classification model for the Catalan language, fine-tuned from the [roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2) model, a [RoBERTa](https://arxiv.org/abs/1907.11692) base model pre-trained on a medium-sized corpus collected from publicly available corpora and crawlers (see the roberta-base-ca-v2 model card for more details).

## Intended Uses and Limitations

The **roberta-base-ca-v2-cased-wikicat-ca** model can be used to classify texts. It is limited by its training dataset and may not generalize well to all use cases.

## How to Use

Here is how to use this model:

```python
from pprint import pprint

from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub.
classifier = pipeline("text-classification", model="projecte-aina/roberta-base-ca-v2-cased-wikicat-ca")
example = "La ressonància magnètica és una prova diagnòstica clau per a moltes malalties."

tc_results = classifier(example)
pprint(tc_results)
```

## Training

### Training Data

We used the Catalan text classification dataset [WikiCAT_ca](https://huggingface.co/datasets/projecte-aina/WikiCAT_ca) for training and evaluation.

### Training Procedure

The model was trained with a batch size of 4 and three learning rates (1e-5, 3e-5, 5e-5) for 10 epochs. We then selected the best learning rate (3e-5) and checkpoint (epoch 3, step 1857) using the downstream task metric on the corresponding development set.
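The checkpoint selection described above amounts to picking the learning rate whose checkpoint maximizes the dev-set metric. A minimal sketch of that selection step, with hypothetical dev F1 values for 1e-5 and 5e-5 (only the 77.82 score for 3e-5 is reported in this card):

```python
# Dev-set weighted-F1 per learning rate from the grid search.
# Only the 3e-5 value (77.82) is reported in this card; the
# other two scores are hypothetical placeholders.
dev_f1 = {
    1e-5: 76.3,   # hypothetical
    3e-5: 77.82,  # reported in this card
    5e-5: 75.9,   # hypothetical
}

# Select the learning rate that maximizes dev-set F1.
best_lr = max(dev_f1, key=dev_f1.get)
print(best_lr)  # 3e-05
```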

## Evaluation

### Variable and Metrics

This model was fine-tuned maximizing the weighted F1 score.
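The weighted F1 score averages the per-class F1 scores, weighting each class by its support (number of true examples). A self-contained sketch of the computation on toy labels (not WikiCAT_ca data):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n / total) * f1
    return score

# Toy three-class example (illustrative labels only):
print(round(weighted_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]), 4))  # 0.7867
```

This matches `sklearn.metrics.f1_score(..., average="weighted")` on the same inputs.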

### Evaluation Results

We evaluated the _roberta-base-ca-v2-cased-wikicat-ca_ on the WikiCAT_ca dev set:

| Model | WikiCAT_ca (F1) |
| ----- | :-------------: |
| roberta-base-ca-v2-cased-wikicat-ca | 77.82 |

For more details, see the fine-tuning and evaluation scripts in the official [GitHub repository](https://github.com/projecte-aina/club).

## Licensing Information

[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Citation Information

## Funding

This work was funded by the [Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en)) within the framework of [Projecte AINA](https://politiquesdigitals.gencat.cat/ca/economia/catalonia-ai/aina).

## Contributions

[N/A]