projecte-aina/roberta-base-ca-v2-cased-wikicat-ca

 ---
+language:
+- ca
 license: apache-2.0
+tags:
+- "catalan"
+- "text classification"
+- "WikiCAT_ca"
+- "CaText"
+- "Catalan Textual Corpus"
+datasets:
+- "projecte-aina/WikiCAT_ca"
+metrics:
+- f1
+model-index:
+- name: roberta-base-ca-v2-cased-wikicat-ca
+  results:
+  - task:
+      type: text-classification
+    dataset:
+      type: projecte-aina/WikiCAT_ca
+      name: WikiCAT_ca
+    metrics:
+      - name: F1
+        type: f1
+        value: 77.82
+widget:
+- text: "La ressonància magnètica és una prova diagnòstica clau per a moltes malalties."
 ---
+# Catalan BERTa-v2 (roberta-base-ca-v2) fine-tuned for Text Classification
+## Table of Contents
+- [Model Description](#model-description)
+- [Intended Uses and Limitations](#intended-uses-and-limitations)
+- [How to Use](#how-to-use)
+- [Training](#training)
+  - [Training Data](#training-data)
+  - [Training Procedure](#training-procedure)
+- [Evaluation](#evaluation)
+  - [Variable and Metrics](#variable-and-metrics)
+  - [Evaluation Results](#evaluation-results)
+- [Licensing Information](#licensing-information)
+- [Citation Information](#citation-information)
+- [Funding](#funding)
+- [Contributions](#contributions)
+## Model Description
+**roberta-base-ca-v2-cased-wikicat-ca** is a text classification model for Catalan, fine-tuned from the [roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2) model. The base model is a [RoBERTa](https://arxiv.org/abs/1907.11692) base model pre-trained on a medium-size corpus collected from publicly available corpora and crawlers (see the roberta-base-ca-v2 model card for details).
+## Intended Uses and Limitations
+The **roberta-base-ca-v2-cased-wikicat-ca** model can be used to classify Catalan texts. The model is limited by its training dataset and may not generalize well to all use cases.
+## How to Use
+Here is how to use this model:
+```python
+from transformers import pipeline
+from pprint import pprint
+# Load the model by its full Hub ID (organization/model name)
+nlp = pipeline("text-classification", model="projecte-aina/roberta-base-ca-v2-cased-wikicat-ca")
+example = "La ressonància magnètica és una prova diagnòstica clau per a moltes malalties."
+tc_results = nlp(example)
+pprint(tc_results)
+```
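The pipeline call above returns a list of dictionaries with `label` and `score` keys. As a minimal sketch of post-processing that output, the helper below keeps only the labels above a confidence threshold. The function name `top_labels`, the `min_score` threshold, and the mock labels and scores are illustrative assumptions, not part of the model or of `transformers`.

```python
from typing import Dict, List


def top_labels(predictions: List[Dict], min_score: float = 0.5) -> List[str]:
    """Return the labels whose score is at least min_score, best first."""
    kept = [p for p in predictions if p["score"] >= min_score]
    kept.sort(key=lambda p: p["score"], reverse=True)
    return [p["label"] for p in kept]


# Hypothetical pipeline output: label names and scores are made up
# for illustration and are not real WikiCAT_ca categories.
mock_results = [
    {"label": "LABEL_A", "score": 0.08},
    {"label": "LABEL_B", "score": 0.91},
]
print(top_labels(mock_results))
```

In practice, `mock_results` would be replaced by the `tc_results` value returned by the pipeline.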
+## Training
+### Training Data
+We used the Catalan text classification (TC) dataset [WikiCAT_ca](https://huggingface.co/datasets/projecte-aina/WikiCAT_ca) for training and evaluation.
+### Training Procedure
+The model was trained with a batch size of 4 for 10 epochs, using three learning rates (1e-5, 3e-5, 5e-5). We then selected the best learning rate (3e-5) and checkpoint (epoch 3, step 1857) according to the downstream task metric on the corresponding development set.
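The selection step can be sketched as picking the run with the highest development-set score. This is only an illustration under stated assumptions: the helper `select_best` is hypothetical, and the scores in `dev_scores` are placeholders, not results from this model. The actual fine-tuning scripts may organize this differently.

```python
from typing import Dict, Tuple


def select_best(dev_scores: Dict[Tuple[float, int], float]) -> Tuple[float, int]:
    """Return the (learning_rate, epoch) pair with the highest dev score."""
    return max(dev_scores, key=dev_scores.get)


# Placeholder dev scores keyed by (learning rate, epoch).
# These numbers are invented for illustration only.
dev_scores = {
    (1e-5, 1): 0.1,
    (3e-5, 1): 0.3,
    (5e-5, 1): 0.2,
}
best_lr, best_epoch = select_best(dev_scores)
print(best_lr, best_epoch)
```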
+## Evaluation
+### Variable and Metrics
+This model was fine-tuned to maximize the weighted F1 score.
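For readers unfamiliar with the metric, weighted F1 is the per-class F1 averaged with weights proportional to each class's number of true examples. The following is a minimal pure-Python sketch of that definition; the function name `weighted_f1` and the toy labels are assumptions for illustration, and the evaluation scripts may rely on a library implementation instead.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy labels, made up for illustration only
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
print(weighted_f1(y_true, y_pred))
```

This function returns a value between 0 and 1, whereas the score reported in this card appears to be on a 0 to 100 scale.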
+### Evaluation Results
+We evaluated _roberta-base-ca-v2-cased-wikicat-ca_ on the WikiCAT_ca dev set:
+| Model | WikiCAT_ca (F1) |
+| :---- | :-------------- |
+| roberta-base-ca-v2-cased-wikicat-ca | 77.82 |
+For more details, check the fine-tuning and evaluation scripts in the official [GitHub repository](https://github.com/projecte-aina/club).
+## Licensing Information
+[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+## Citation Information
+## Funding
+This work was funded by the [Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en)) within the framework of [Projecte AINA](https://politiquesdigitals.gencat.cat/ca/economia/catalonia-ai/aina).
+## Contributions
+[N/A]