FinBERT-PT-BR : Financial BERT PT BR
FinBERT-PT-BR is a pre-trained NLP model for sentiment analysis of financial texts in Brazilian Portuguese.
The model was trained in two stages: language modeling and sentiment modeling. In the first stage, a language model was trained on more than 1.4 million financial news texts in Portuguese. Starting from this language model, a sentiment classifier was then trained on a small labeled set (500 texts) and converged satisfactorily.
The accompanying work also presents a comparison with other models and possible applications of the developed model. In the comparison, the model achieved better results than the state-of-the-art models it was evaluated against. The demonstrated applications include building sentiment indices, designing investment strategies, and analyzing macroeconomic data such as inflation.
Applications
Sentiment Index
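One simple way to turn per-text predictions into a sentiment index is to take, for each day, the share of positive texts minus the share of negative texts. The sketch below assumes this net-sentiment definition, which is a common convention and not necessarily the formula used in the paper. The function name `daily_sentiment_index` and the sample data are hypothetical.

```python
from collections import defaultdict


def daily_sentiment_index(labeled_texts):
    """Return {date: (n_positive - n_negative) / n_total} for (date, label) pairs.

    Assumed definition (not taken from the paper): net share of positive texts.
    """
    counts = defaultdict(lambda: {"POSITIVE": 0, "NEGATIVE": 0, "NEUTRAL": 0})
    for date, label in labeled_texts:
        counts[date][label] += 1

    index = {}
    for date, c in counts.items():
        total = sum(c.values())
        index[date] = (c["POSITIVE"] - c["NEGATIVE"]) / total
    return index


# Hypothetical (date, predicted label) pairs, e.g. produced by the model below.
sample = [
    ("2023-01-02", "POSITIVE"),
    ("2023-01-02", "POSITIVE"),
    ("2023-01-02", "NEGATIVE"),
    ("2023-01-02", "NEUTRAL"),
    ("2023-01-03", "NEGATIVE"),
    ("2023-01-03", "NEUTRAL"),
]
index = daily_sentiment_index(sample)
print(index)
```

The index ranges from -1 (all texts negative) to 1 (all texts positive); neutral texts pull it toward 0.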
Usage
BertForSequenceClassification
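from transformers import AutoTokenizer, BertForSequenceClassification
import numpy as np

# Map the classifier's output indices to sentiment labels.
pred_mapper = {
    0: "POSITIVE",
    1: "NEGATIVE",
    2: "NEUTRAL",
}

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("lucas-leme/FinBERT-PT-BR")
finbertptbr = BertForSequenceClassification.from_pretrained("lucas-leme/FinBERT-PT-BR")

# Tokenize a batch of sentences, padding and truncating to at most 512 tokens.
tokens = tokenizer(
    ["Hoje a bolsa caiu", "Hoje a bolsa subiu"],
    return_tensors="pt", padding=True, truncation=True, max_length=512,
)
finbertptbr_outputs = finbertptbr(**tokens)

# Pick the highest-scoring class for each sentence.
logits = finbertptbr_outputs.logits.cpu().detach().numpy()
preds = [pred_mapper[int(np.argmax(pred))] for pred in logits]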
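The argmax above discards how confident the model is. As a minimal sketch, the logits can be converted to probabilities with a softmax so that each label comes with a score. The helper `logits_to_predictions` is hypothetical, and the logits in the example are made-up values used only to show the shapes involved, not real model outputs.

```python
import numpy as np

pred_mapper = {0: "POSITIVE", 1: "NEGATIVE", 2: "NEUTRAL"}


def logits_to_predictions(logits):
    """Return a (label, probability) pair for each row of logits."""
    logits = np.asarray(logits, dtype=float)
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    best = probs.argmax(axis=1)
    return [(pred_mapper[int(i)], float(row[i])) for i, row in zip(best, probs)]


# Made-up logits with shape (batch_size, 3), for illustration only.
example_logits = [[0.1, 2.3, -0.5], [1.9, -0.2, 0.0]]
predictions = logits_to_predictions(example_logits)
print(predictions)
```

With the model loaded as above, the same helper could be called on `finbertptbr_outputs.logits.cpu().detach().numpy()`.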
Pipeline
from transformers import (
    AutoTokenizer,
    BertForSequenceClassification,
    pipeline,
)

finbert_pt_br_tokenizer = AutoTokenizer.from_pretrained("lucas-leme/FinBERT-PT-BR")
finbert_pt_br_model = BertForSequenceClassification.from_pretrained("lucas-leme/FinBERT-PT-BR")

# Wrap the model and tokenizer in a text-classification pipeline.
finbert_pt_br_pipeline = pipeline(
    task="text-classification",
    model=finbert_pt_br_model,
    tokenizer=finbert_pt_br_tokenizer,
)
finbert_pt_br_pipeline(["Hoje a bolsa caiu", "Hoje a bolsa subiu"])
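A text-classification pipeline returns one dictionary per input, with a `label` and a `score` key. The sketch below shows one possible way to keep only confident predictions. The helper `confident_predictions`, the 0.8 threshold, and the `mock_results` list are all assumptions for illustration; `mock_results` is a hand-written stand-in, not actual output from the model.

```python
def confident_predictions(texts, results, min_score=0.8):
    """Pair each text with its prediction and drop low-confidence ones."""
    return [
        (text, result["label"], result["score"])
        for text, result in zip(texts, results)
        if result["score"] >= min_score
    ]


texts = ["Hoje a bolsa caiu", "Hoje a bolsa subiu"]
# Hand-written stand-in with the same structure as pipeline output.
mock_results = [
    {"label": "NEGATIVE", "score": 0.95},
    {"label": "POSITIVE", "score": 0.60},
]
kept = confident_predictions(texts, mock_results)
print(kept)
```

In practice, `mock_results` would be replaced by `finbert_pt_br_pipeline(texts)`, and the threshold would be chosen for the application at hand.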
Author
Citation
@inproceedings{santos2023finbert,
  title={FinBERT-PT-BR: An{\'a}lise de Sentimentos de Textos em Portugu{\^e}s do Mercado Financeiro},
  author={Santos, Lucas L and Bianchi, Reinaldo AC and Costa, Anna HR},
  booktitle={Anais do II Brazilian Workshop on Artificial Intelligence in Finance},
  pages={144--155},
  year={2023},
  organization={SBC}
}
Paper