StephanAkkerman committed
Commit cf7a025
1 Parent(s): 88e3087

Update README.md

Files changed (1): README.md +50 -1
---
language:
- en
tags:
- finance
---
# FinTwitBERT

FinTwitBERT is a language model pre-trained on a large dataset of financial tweets. This specialized BERT model captures the jargon and communication style of the financial Twitter sphere, making it well suited for sentiment analysis, trend prediction, and other financial NLP tasks.

## Table of Contents
- [Dataset](#dataset)
- [Model Details](#model-details)
- [Installation](#installation)
- [Usage](#usage)
- [Contributing](#contributing)
- [License](#license)

## Dataset
FinTwitBERT is pre-trained on Taborda et al.'s [Stock Market Tweets Data](https://ieee-dataport.org/open-access/stock-market-tweets-data), which consists of 943,672 tweets, 1,300 of them labeled. All labeled tweets are reserved for evaluating the pre-trained model, using perplexity as the metric. The remaining tweets are used for pre-training, with 10% held out for model validation.
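
Perplexity is the exponential of the average cross-entropy (negative log-likelihood) loss on the held-out tweets; a minimal sketch of the computation, assuming per-token NLL values are already available:

```python
import math

def perplexity(nll_values):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nll_values) / len(nll_values))

# A model that predicts every token with probability 1 has NLL 0 everywhere,
# giving the minimum possible perplexity of 1.
print(perplexity([0.0, 0.0, 0.0]))  # → 1.0
```

Lower perplexity means the model is less "surprised" by unseen tweets.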

## Model Details
We use ProsusAI's [FinBERT](https://huggingface.co/ProsusAI/finbert) model and tokenizer as our base. We added two mask tokens to the tokenizer: `@USER` for user mentions and `[URL]` for URLs in tweets. The model is then pre-trained for 10 epochs, using validation loss as the metric for selecting the best model, with early stopping to prevent overfitting.
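
As an illustration of how those tokens are used, mentions and URLs in raw tweets can be replaced before tokenization; a minimal sketch, where the regex patterns are our own illustration rather than the project's exact preprocessing rules:

```python
import re

def preprocess_tweet(text):
    """Replace user mentions and URLs with the special tokens added to
    the FinTwitBERT tokenizer. The regexes are illustrative only."""
    text = re.sub(r"https?://\S+", "[URL]", text)  # mask URLs first
    text = re.sub(r"@\w+", "@USER", text)          # then mask mentions
    return text

print(preprocess_tweet("@trader $AAPL looks bullish, see https://example.com/chart"))
# → "@USER $AAPL looks bullish, see [URL]"
```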

The latest pre-trained model and tokenizer can be found on Hugging Face: https://huggingface.co/StephanAkkerman/FinTwitBERT.

## Installation
```bash
# Clone this repository
git clone https://github.com/TimKoornstra/FinTwitBERT
cd FinTwitBERT
# Install required packages
pip install -r requirements.txt
```
## Usage
The model can be fine-tuned for specific tasks such as sentiment classification. For more information, see our other repository: https://github.com/TimKoornstra/stock-sentiment-classifier.
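
Fine-tuning for sentiment classification requires (text, label) pairs; a hypothetical sketch of preparing such data, where the label names and split logic are our own assumptions rather than the classifier repository's actual setup:

```python
import random

# Hypothetical sentiment labels; the real fine-tuning setup lives in the
# stock-sentiment-classifier repository linked above.
LABELS = {"bearish": 0, "neutral": 1, "bullish": 2}

def make_splits(examples, val_fraction=0.1, seed=42):
    """Map label names to ids, shuffle, and hold out a validation fraction."""
    rng = random.Random(seed)
    data = [(text, LABELS[label]) for text, label in examples]
    rng.shuffle(data)
    n_val = int(len(data) * val_fraction)
    return data[n_val:], data[:n_val]

examples = [("$AAPL to the moon", "bullish"), ("markets look shaky", "bearish")] * 10
train, val = make_splits(examples)
print(len(train), len(val))  # → 18 2
```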

## Contributing
Contributions are welcome! If you have a feature request, bug report, or proposal for code refactoring, please feel free to open an issue on GitHub. We appreciate your help in improving this project.

## License
This project is licensed under the GPL-3.0 License. See the [LICENSE](LICENSE) file for details.