metadata

license: mit
language:
  - en
tags:
  - NLP
  - BERT
  - FinBERT
  - sentiment
  - finance
  - financial-analysis
  - sentiment-analysis
  - financial-sentiment-analysis
  - twitter
  - tweets
  - tweet-analysis
  - stocks
  - stock-market
  - crypto
  - cryptocurrency
datasets:
  - StephanAkkerman/stock-market-tweets-data
  - StephanAkkerman/financial-tweets
  - StephanAkkerman/crypto-stock-tweets
metrics:
  - perplexity
widget:
  - text: Paris is the [MASK] of France.
    example_title: Generic 1
  - text: The goal of life is [MASK].
    example_title: Generic 2
  - text: AAPL is a [MASK] sector stock.
    example_title: AAPL
  - text: I predict that this stock will go [MASK].
    example_title: Stock Direction
  - text: $AAPL is the ticker for the company named [MASK].
    example_title: Ticker
base_model: yiyanghkust/finbert-pretrain
model-index:
  - name: FinTwitBERT
    results:
      - task:
          type: financial-tweet-prediction
          name: Financial Tweet Prediction
        dataset:
          name: Stock Market Tweets Data
          type: finance
        metrics:
          - type: Perplexity
            value: 5.022

FinTwitBERT

FinTwitBERT is a BERT language model pre-trained on a large corpus of financial tweets. It aims to capture the jargon and communication style of financial Twitter, which makes it a suitable base model for sentiment analysis, trend prediction, and other financial NLP tasks.

Sentiment Analysis

The FinTwitBERT-sentiment model builds on FinTwitBERT to perform sentiment analysis of financial tweets.

Dataset

FinTwitBERT is pre-trained on three datasets of financial tweets that mention stocks and cryptocurrencies:

StephanAkkerman/crypto-stock-tweets: 8,024,269 tweets
StephanAkkerman/stock-market-tweets-data: 923,673 tweets
StephanAkkerman/financial-tweets: 263,119 tweets
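
To put the three corpus sizes in proportion, the counts listed above can be summed and compared. This is only arithmetic on the figures given here; the variable names are illustrative.

```python
# Tweet counts as listed above.
tweet_counts = {
    "StephanAkkerman/crypto-stock-tweets": 8_024_269,
    "StephanAkkerman/stock-market-tweets-data": 923_673,
    "StephanAkkerman/financial-tweets": 263_119,
}

total = sum(tweet_counts.values())
print(f"Total tweets: {total:,}")  # Total tweets: 9,211,061

# Share of the combined corpus contributed by each dataset.
shares = {name: count / total for name, count in tweet_counts.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The sum assumes the datasets do not overlap; the text here does not say whether duplicates were removed before pre-training.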

Model Details

FinTwitBERT is based on the FinBERT model and tokenizer, with additional special tokens (@USER and [URL]) that stand in for user mentions and links, two common elements of tweets. The model was pre-trained for 10 epochs, with early stopping to prevent overfitting.
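
Because the tokenizer knows the @USER and [URL] tokens, it may help to normalize raw tweets into that form before passing them to the model. The following is a minimal sketch of one way to do so. The regular expressions and the `normalize_tweet` helper are assumptions made for illustration and may differ from the preprocessing actually used during pre-training.

```python
import re

# Assumed patterns; the original preprocessing may differ.
URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"(?<!\w)@\w+")  # lookbehind skips e-mail addresses


def normalize_tweet(text: str) -> str:
    """Replace links with [URL] and user mentions with @USER."""
    # Replace links first so an '@' inside a URL is not treated as a mention.
    text = URL_PATTERN.sub("[URL]", text)
    text = MENTION_PATTERN.sub("@USER", text)
    return text


print(normalize_tweet("@elonmusk $TSLA looks strong https://t.co/abc123"))
# @USER $TSLA looks strong [URL]
```

Cashtags such as $TSLA are left untouched, since only mentions and links have dedicated tokens.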

More Information

For the complete training setup and further details, see the FinTwitBERT GitHub repository.

Usage

With Hugging Face's transformers library, the model and tokenizer can be loaded into a pipeline for masked language modeling.

from transformers import pipeline

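# Load the model and its tokenizer from the Hugging Face Hub.
pipe = pipeline(
    "fill-mask",
    model="StephanAkkerman/FinTwitBERT",
    tokenizer="StephanAkkerman/FinTwitBERT",
)
# Print the candidate tokens predicted for the [MASK] position.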
print(pipe("Bitcoin is a [MASK] coin."))
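
For an input with a single [MASK], the fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. A small helper can reduce that output to the most likely tokens. The sketch below is illustrative: `top_tokens` is a hypothetical helper, and the sample predictions are placeholders with made-up scores, not real FinTwitBERT output.

```python
from typing import Dict, List, Tuple


def top_tokens(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 3)) for p in ranked[:k]]


# Placeholder predictions shaped like fill-mask output.
# The scores are made up and are NOT real FinTwitBERT results.
sample = [
    {"score": 0.10, "token_str": "new", "sequence": "bitcoin is a new coin."},
    {"score": 0.25, "token_str": "good", "sequence": "bitcoin is a good coin."},
    {"score": 0.05, "token_str": "bad", "sequence": "bitcoin is a bad coin."},
]
print(top_tokens(sample, k=2))  # [('good', 0.25), ('new', 0.1)]
```

With the pipeline from the snippet above, the same helper could be called as `top_tokens(pipe("Bitcoin is a [MASK] coin."))`.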

License

This project is licensed under the MIT License. See the LICENSE file for details.