---
language: en
license: apache-2.0
datasets:
- claimbuster
metrics:
- f1
- accuracy
- precision
- recall
pipeline_tag: text-classification
---

# Fine-tuned bert-base-cased model for binary claim check-worthiness classification

The task is formulated as binary classification: determining whether a claim (text) is worth fact-checking.
This model is a fine-tuned version of the [BERT base cased model](https://huggingface.co/bert-base-cased). It was fine-tuned on the [ClaimBuster dataset](https://zenodo.org/records/3609356) (http://doi.org/10.5281/zenodo.3609356). For training, only the labels 0 and 1 were used, corresponding to the No and Yes decisions on whether a claim is considered check-worthy. After evaluation, the model was retrained on the full dataset.

# Usage

## BertForSequenceClassification

```python
from transformers import BertTokenizer, BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("yevhenkost/claimbuster-yesno-binary-bert-base-cased")
tokenizer = BertTokenizer.from_pretrained("yevhenkost/claimbuster-yesno-binary-bert-base-cased")

text_inputs = ["The water is wet"]
model_inputs = tokenizer(text_inputs, return_tensors="pt")

# regular SequenceClassifierOutput
model_output = model(**model_inputs)

# model_output.logits: tensor([[-0.2657,  0.0749]])
```

## Pipeline

The model can also be loaded through the `transformers` `pipeline` API (a minimal sketch; the exact label names in the output depend on the model's `id2label` configuration):

```python
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="yevhenkost/claimbuster-yesno-binary-bert-base-cased",
)

# returns a list of dicts with "label" and "score" keys
predictions = pipe(["The water is wet"])
```

# Training Process

## Data Preparation

The files were downloaded from the ClaimBuster URL above. The dataset was prepared in the following way:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# read data
gt_df = pd.read_csv("groundtruth.csv")
cs_df = pd.read_csv("crowdsourced.csv")

# concatenate and keep only the binary labels (0 and 1)
total_df = pd.concat([cs_df, gt_df])
total_df = total_df[total_df["Verdict"].isin([0, 1])]

# split into train and test sets
train_df, test_df = train_test_split(total_df, test_size=0.2, random_state=2)
```

## Test Results

```
              precision    recall  f1-score   support

          No       0.74      0.57      0.65       485
         Yes       0.83      0.91      0.87      1139

    accuracy                           0.81      1624
   macro avg       0.79      0.74      0.76      1624
weighted avg       0.81      0.81      0.81      1624
```
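The raw logits shown in the usage section can be turned into a check-worthiness decision with a softmax. A minimal sketch, assuming index 0 maps to No and index 1 to Yes (matching the label order in the test report; this mapping is an assumption, not stated in the model config here):

```python
import torch

# Example logits copied from the usage section above.
logits = torch.tensor([[-0.2657, 0.0749]])

# Softmax over the class dimension converts logits to probabilities.
probs = torch.softmax(logits, dim=-1)

# Assumed label mapping: index 0 -> "No", index 1 -> "Yes".
labels = ["No", "Yes"]
prediction = labels[int(probs.argmax(dim=-1))]
print(prediction)
```

For these example logits the model leans toward the claim being check-worthy, with a probability of roughly 0.58 for the Yes class.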