Improving the model predictions
I have tried a couple of sentences and the model doesn't seem to give accurate predictions most of the time.
What dataset was it trained on? I mean, how was the data labelled? Did you train this model on VADER-generated labels?
I can help you improve your model. Please respond to this discussion.
Hello there,
Thanks for taking an interest in this model. As you can read in the model card, the classification head was trained on a balanced sample of around 2M StockTwits posts.
The StockTwits posts are labelled by their authors as either Bullish or Bearish (or, if there is no label, we assume the post is Neutral). The labelling is thus exogenous: it comes from the post authors themselves, not from another model such as VADER.
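To make that concrete, here is a minimal sketch of the labelling logic in Python. The post structure (a dict with an optional `sentiment` field) is a hypothetical stand-in, not the actual StockTwits schema:

```python
# Hypothetical post structure; the real StockTwits payload differs.
def assign_label(post: dict) -> str:
    """Map an author-provided sentiment tag onto the 3-class scheme."""
    sentiment = post.get("sentiment")  # "Bullish", "Bearish", or missing
    if sentiment in ("Bullish", "Bearish"):
        return sentiment
    return "Neutral"  # no author label -> assumed Neutral

print(assign_label({"body": "BTC to the moon!", "sentiment": "Bullish"}))  # Bullish
print(assign_label({"body": "Just watching the charts."}))                 # Neutral
```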
The model has been tested, out-of-sample, on a set of around 200K StockTwits posts, and delivered superior classification performance compared to VADER and other BERT-based classifiers. The accuracy and F1-score for the 3-class problem (Bearish, Neutral, Bullish) were both around 70%, if I remember correctly. This greatly outperformed VADER, whose accuracy was closer to 50%.
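For reference, an out-of-sample evaluation like that boils down to computing accuracy and macro F1 over held-out posts. A sketch with scikit-learn, using placeholder labels rather than the actual test set:

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder labels for illustration; the real evaluation used ~200K
# held-out StockTwits posts and the model's predictions for them.
y_true = ["Bullish", "Neutral", "Bearish", "Bullish", "Neutral"]
y_pred = ["Bullish", "Neutral", "Neutral", "Bullish", "Neutral"]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
```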
So CryptoBERT is the most accurate model for the task at hand (sentiment analysis of cryptocurrency-related social media posts), at least it was in July 2022 when I tested it. However, this task is inherently noisy, as people can label their posts however they see fit.
Remember that cryptocurrency investors' language is not standardised and is often imprecise. This means that in some cases the predictions can be slightly off; even so, the model remains the most accurate classifier for this task.
Using VADER-generated labels, as you suggest, would likely lead to inferior performance, since CryptoBERT substantially outperforms VADER on this particular task. I've also spent many months trying to improve the model in an affordable way.
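If you want to compare the two yourself, a quick side-by-side sanity check could look like the sketch below. It assumes the Hub model id `ElKulako/cryptobert` and the `vaderSentiment` package; note that VADER only returns a continuous polarity score, so mapping it onto Bullish/Neutral/Bearish would require picking thresholds:

```python
# Side-by-side sanity check. Assumptions: the Hub model id
# "ElKulako/cryptobert" and the vaderSentiment package are available.
from transformers import pipeline
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

text = "BTC looking strong, loading up before the next leg up!"

# CryptoBERT: a fine-tuned transformer returning a class label + score.
clf = pipeline("text-classification", model="ElKulako/cryptobert")
print("CryptoBERT:", clf(text))

# VADER: a lexicon-based compound polarity in [-1, 1]; turning it into
# Bullish/Neutral/Bearish requires choosing cut-off thresholds.
vader = SentimentIntensityAnalyzer()
print("VADER compound:", vader.polarity_scores(text)["compound"])
```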
So if you have any suggestions on how to improve its performance, I'll be happy to read them. However, bear in mind that this task is very difficult by default, and reaching accuracies above 75% is simply not possible with the available data: two people could write nearly identical posts but give them different labels, in which case no classifier can predict both instances correctly.
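A toy illustration of that ceiling: when identical texts carry conflicting labels, even a perfect classifier that always predicts the majority label for each text cannot reach 100% accuracy. (The posts below are made up.)

```python
from collections import Counter

# Made-up posts: identical text, conflicting author labels.
posts = [
    ("bitcoin is pumping hard", "Bullish"),
    ("bitcoin is pumping hard", "Bearish"),  # same text, opposite label
    ("eth looks dead", "Bearish"),
]

by_text = {}
for text, label in posts:
    by_text.setdefault(text, []).append(label)

# Best case: always predict the majority label for each unique text.
best_correct = sum(max(Counter(labels).values()) for labels in by_text.values())
print("Accuracy ceiling:", best_correct / len(posts))  # 0.666..., not 1.0
```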
Oh, and please don't try to use it for other tasks, such as classifying restaurant reviews, as this model is only suitable for cryptocurrency-related language from social media posts.