Fine-tuned TinyBERT_General_4L_312D model for claim detection binary classification

The task is formulated as binary classification: determining whether the input text is a claim or not.

This model is a fine-tuned version of TinyBERT_General_4L_312D. It was fine-tuned on the ClaimBuster dataset (http://doi.org/10.5281/zenodo.3609356). For training we used two labels: -1, and the merged 0 and 1 labels, corresponding to the No and Yes decisions on whether the text can be considered a claim.

After the evaluation, the model was trained on the full dataset.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("yevhenkost/claim-detection-claimbuster-binary-TinyBERT_General_4L_312D")
tokenizer = AutoTokenizer.from_pretrained("yevhenkost/claim-detection-claimbuster-binary-TinyBERT_General_4L_312D")

text_inputs = ["The water is wet"]

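# padding and truncation let the same call handle batches of texts with different lengths
model_inputs = tokenizer(text_inputs, padding=True, truncation=True, return_tensors="pt")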

# the model returns a regular SequenceClassifierOutput
model_output = model(**model_inputs)

# maps the logit index to the decision
decoding_dict = {0: "No", 1: "Yes"}

# model_output.logits is a tensor of shape (batch_size, 2)
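
The logits can be turned into decisions by taking the highest-scoring index in each row and looking it up in decoding_dict. Below is a minimal sketch, assuming the logits have first been converted to nested Python lists with model_output.logits.detach().tolist(). The helper name decode_logits and the example values are hypothetical and not part of the model's API.

```python
import math

# Index-to-decision mapping from the usage snippet above
decoding_dict = {0: "No", 1: "Yes"}


def decode_logits(logits):
    """Map each row of (batch_size, 2) logits to a decision and its softmax probability."""
    decisions = []
    for row in logits:
        # Softmax with max-subtraction for numerical stability
        max_logit = max(row)
        exps = [math.exp(value - max_logit) for value in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Argmax picks the predicted class index
        best_index = max(range(len(probs)), key=probs.__getitem__)
        decisions.append((decoding_dict[best_index], probs[best_index]))
    return decisions


# Made-up logits for two inputs; in practice use model_output.logits.detach().tolist()
example_logits = [[2.0, -1.0], [-0.5, 1.5]]
decisions = decode_logits(example_logits)
```

Each entry of decisions pairs the decoded label with the probability of the predicted class, which can be useful when a confidence threshold is applied on top of the raw decision.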

Training Process

Data Preparation

The files were downloaded from the ClaimBuster URL. The dataset was prepared as follows:

import pandas as pd
from sklearn.model_selection import train_test_split

# read data
gt_df = pd.read_csv("groundtruth.csv")
cs_df = pd.read_csv("crowdsourced.csv")

# concatenate the crowdsourced and ground-truth files
total_df = pd.concat([cs_df, gt_df])

# merge labels: -1 becomes 0 (No); 0 and 1 become 1 (Yes)
total_df['labels'] = total_df["Verdict"].apply(lambda x: 0 if x == -1 else 1)

# split into train and test sets
train_df, test_df = train_test_split(total_df, test_size=0.2, random_state=2)
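
The label merge in the snippet above collapses the three ClaimBuster verdict values into two classes. The following stdlib-only sketch applies the same rule to a few made-up verdict values; the helper name to_binary_label and the sample list are illustrative assumptions, not part of the original preparation script.

```python
from collections import Counter


def to_binary_label(verdict):
    """Return 0 ("No") for verdict -1, and 1 ("Yes") for verdicts 0 and 1."""
    return 0 if verdict == -1 else 1


# Hypothetical verdict values, not taken from the dataset
sample_verdicts = [-1, 0, 1, -1, -1, 1]
binary_labels = [to_binary_label(v) for v in sample_verdicts]

# Count how many examples fall into each merged class
label_counts = Counter(binary_labels)
```

Checking label_counts on the real data frame in the same way is a quick sanity check that the merged classes have the expected balance before splitting.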

Test Result

              precision    recall  f1-score   support

          No       0.90      0.85      0.88      3126
         Yes       0.74      0.82      0.78      1581

    accuracy                           0.84      4707
   macro avg       0.82      0.84      0.83      4707
weighted avg       0.85      0.84      0.84      4707
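
The averaged rows of the report follow from the per-class rows: the macro average is the plain mean across classes, and the weighted average weights each class by its support. The sketch below recomputes them from the rounded values in the table, so it should only be expected to agree with the reported rows to within about 0.01. The function names are illustrative.

```python
# Per-class values copied from the report above: (precision, recall, f1, support)
report = {
    "No": (0.90, 0.85, 0.88, 3126),
    "Yes": (0.74, 0.82, 0.78, 1581),
}

total_support = sum(row[3] for row in report.values())


def macro_average(column):
    """Unweighted mean of one metric column across classes."""
    return sum(row[column] for row in report.values()) / len(report)


def weighted_average(column):
    """Support-weighted mean of one metric column across classes."""
    return sum(row[column] * row[3] for row in report.values()) / total_support


# Columns 0, 1, 2 are precision, recall, f1
macro = [macro_average(i) for i in range(3)]
weighted = [weighted_average(i) for i in range(3)]

# Accuracy equals the support-weighted recall
accuracy = weighted[1]
```

Because the test set contains roughly twice as many No examples as Yes examples, the weighted averages sit closer to the No row than the macro averages do.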
