---
license: apache-2.0
datasets:
- thegoodfellas/mc4-pt-cleaned
language:
- pt
inference: false
metrics:
- bleu
library_name: transformers
pipeline_tag: text2text-generation
---

# Model Card for tgf-flan-t5-base-ptbr
This is the PT-BR Flan-T5-base model. Forked from: https://huggingface.co/thegoodfellas/tgf-flan-t5-base-ptbr
## Model Details

### Model Description

This model was created to serve as a base for researchers who want to learn how Flan-T5 works. This is the Portuguese version.
- Developed by: The Good Fellas team
- Model type: Flan-T5
- Language(s) (NLP): Portuguese (BR)
- License: apache-2.0
- Finetuned from model: Flan-T5-base

We would like to thank the TPU Research Cloud team for the amazing opportunity given to us. To learn about TRC: https://sites.research.google/trc/about/
## Uses

This model can be used as a base for downstream tasks, as described in the Flan-T5 paper.

## Bias, Risks, and Limitations
Due to the nature of the web-scraped corpus on which Flan-T5 models were trained, it is likely that their usage could reproduce and amplify pre-existing biases in the data, resulting in potentially harmful content such as racial or gender stereotypes and conspiracist views. For this reason, the study of such biases is explicitly encouraged, and model usage should ideally be restricted to research-oriented and non-user-facing endeavors.
## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import FlaxT5ForConditionalGeneration

model_flax = FlaxT5ForConditionalGeneration.from_pretrained("thegoodfellas/tgf-flan-t5-base-ptbr")
```
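As a sketch of how the loaded checkpoint might be used for generation (the prompt below and the assumption that the repository ships a tokenizer are illustrative, not stated in this card):

```python
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

repo = "thegoodfellas/tgf-flan-t5-base-ptbr"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = FlaxT5ForConditionalGeneration.from_pretrained(repo)

# Encode a Portuguese prompt and generate with the Flax model.
# ("Traduza para o inglês: Bom dia!" = "Translate to English: Good morning!")
inputs = tokenizer("Traduza para o inglês: Bom dia!", return_tensors="np")
outputs = model.generate(inputs.input_ids, max_length=32)
print(tokenizer.batch_decode(outputs.sequences, skip_special_tokens=True))
```

The quality of the output depends on any downstream fine-tuning; as noted above, the checkpoint is intended as a base model rather than an instruction-following assistant.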
## Training Details

### Training Data

Training was performed on two datasets: BrWaC and OSCAR (Portuguese section).
### Training Procedure

We trained this model for one epoch on each dataset.

#### Training Hyperparameters

Thanks to TPU Research Cloud, we were able to train this model on a single TPUv2-8.
- Training regime:
  - Precision: bf16
  - Batch size: 32
  - LR: 0.005
  - Warmup steps: 10_000
  - Epochs: 1 (each dataset)
  - Optimizer: Adafactor
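The peak LR and warmup steps above can be sketched as a simple schedule. The linear warmup follows the card's hyperparameters; the inverse-sqrt decay after warmup is an assumption (a common T5-style choice), since the card does not state the post-warmup behaviour:

```python
def adafactor_lr(step: int, peak_lr: float = 0.005, warmup_steps: int = 10_000) -> float:
    """Linear warmup to peak_lr over warmup_steps, then inverse-sqrt decay.

    The decay branch is an assumed T5-style schedule, not confirmed by the card.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Inverse square-root decay after warmup.
    return peak_lr * (warmup_steps / step) ** 0.5

# Example: halfway through warmup the LR is half the peak.
print(adafactor_lr(5_000))   # 0.0025
```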
## Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Experiments were conducted using Google Cloud Platform in region us-central1, which has a carbon efficiency of 0.57 kgCO$_2$eq/kWh. A cumulative 50 hours of computation was performed on hardware of type TPUv2 chip (TDP of 221 W).

Total emissions are estimated to be 6.3 kgCO$_2$eq, of which 100 percent was directly offset by the cloud provider.
- Hardware Type: TPUv2
- Hours used: 50
- Cloud Provider: GCP
- Compute Region: us-central1
- Carbon Emitted: 6.3 kgCO$_2$eq
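The emission figure above can be reproduced from the listed numbers with the ML Impact calculator's simple power-times-time estimate:

```python
# Reproduce the card's carbon estimate: energy (kWh) x regional carbon intensity.
tdp_kw = 0.221            # TPUv2 chip TDP: 221 W
hours = 50                # cumulative compute time
carbon_intensity = 0.57   # kgCO2eq/kWh for GCP us-central1

energy_kwh = tdp_kw * hours                 # 11.05 kWh
emissions = energy_kwh * carbon_intensity   # ~6.3 kgCO2eq
print(round(emissions, 1))  # 6.3
```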
## Technical Specifications

### Model Architecture and Objective

Flan-T5