
license: mit
language:
  - en
metrics:
  - accuracy
  - mse
  - f1
base_model:
  - dmis-lab/biobert-base-cased-v1.2
  - google-bert/bert-base-cased
pipeline_tag: text-classification
model-index:
  - name: bert-causation-rating-dr2
    results:
      - task:
          type: text-classification
        dataset:
          name: rating_dr2
          type: dataset
        metrics:
          - name: off by 1 accuracy
            type: accuracy
            value: 74.78991596638656
          - name: mean squared error for ordinal data
            type: mse
            value: 0.773109243697479
          - name: weighted F1 score
            type: f1
            value: 0.76386248572931
          - name: Kendall's tau coefficient
            type: Kendall's tau
            value: 0.8081294201575603
        source:
          name: Keling Wang
          url: https://github.com/Keling-Wang
datasets:
  - kelingwang/causation_strength_rating

Model description

This bert-causation-rating-dr2 model is a biobert-base-cased-v1.2 model fine-tuned on a small set of manually annotated sentences with causation labels. It classifies a sentence by the strength of causation that the sentence expresses. Before tuning on this dataset, the biobert-base-cased-v1.2 model was first fine-tuned on a dataset containing causation labels from a published paper, so this model starts from the pre-trained kelingwang/bert-causation-rating-pubmed model. For more information, see that model and my GitHub page (https://github.com/Keling-Wang). The sentences in the dataset were rated independently by two researchers. This dr2 version is tuned on the set of sentences with labels rated by Raters 2 and 3.

Intended use and limitations

This model is primarily intended to rate the strength of causation expressed in a sentence extracted from a clinical guideline on diabetes mellitus management. It predicts strength of causation (SoC) labels from the text input as follows:

-1: No correlation or relationship between variables is mentioned in the sentence.
0: The sentence expresses a correlational relationship, but not causation.
1: The sentence expresses weak causation.
2: The sentence expresses moderate causation.
3: The sentence expresses strong causation.

NOTE: The model outputs five logits, one per class, indexed from 0, so its labels run from 0 to 4 rather than from -1 to 3 (index 0 corresponds to SoC label -1 and index 4 to SoC label 3). Using this Python module is recommended for making predictions.
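The index shift described in the note can be made explicit with a small helper. This is a sketch assuming the five logits for one sentence are already available as a list of floats (for example, one row of the logits returned by a transformers sequence-classification model) and that the output indices follow the label order listed above; `SOC_LABELS`, `logits_to_soc`, and `soc_to_index` are hypothetical names, not part of the Python module mentioned in the note.

```python
from typing import Sequence

# Hypothetical mapping: SoC labels in the order of the model's five output indices (0..4).
SOC_LABELS = (-1, 0, 1, 2, 3)


def logits_to_soc(logits: Sequence[float]) -> int:
    """Return the SoC label (-1..3) for one sentence's five logits."""
    if len(logits) != len(SOC_LABELS):
        raise ValueError(f"expected {len(SOC_LABELS)} logits, got {len(logits)}")
    # argmax over the logits gives the 0-based class index (first one wins on ties)
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    # shift the 0-based index back onto the -1..3 SoC scale
    return SOC_LABELS[best_index]


def soc_to_index(label: int) -> int:
    """Map an SoC label (-1..3) to the 0-based class index used by the model."""
    return SOC_LABELS.index(label)
```

For example, `logits_to_soc([0.1, -1.2, 0.3, 2.5, 0.0])` picks index 3, which is SoC label 2 (moderate causation).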

Performance and hyperparameters

Test metrics

This model achieves the following results on the test dataset, which is a 25% held-out stratified split of the entire dataset (SEED=114514):

Loss: 18.2347
Off-by-1 accuracy (%): 74.7899
Off-by-2 accuracy (%): 91.5966
MSE for ordinal data: 0.7731
Weighted F1: 0.7639
Kendall's Tau: 0.8081
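The held-out split and the metrics above can be sketched with scikit-learn and SciPy. This is not the author's evaluation code, and several details are assumptions: `holdout_split`, `off_by_k_accuracy`, and `evaluate` are hypothetical helpers; "MSE for ordinal data" is taken to be plain MSE on integer class indices; and since this card does not define off-by-k accuracy, the sketch assumes a prediction counts as correct when its absolute error is strictly below k (under this reading off-by-1 accuracy equals exact-match accuracy; switch `<` to `<=` for the looser reading).

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.metrics import f1_score, mean_squared_error
from sklearn.model_selection import train_test_split

SEED = 114514


def holdout_split(texts, labels, seed=SEED):
    """25% stratified held-out split; returns X_train, X_test, y_train, y_test."""
    return train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed
    )


def off_by_k_accuracy(y_true, y_pred, k):
    """Percentage of predictions with |pred - true| < k (assumed definition)."""
    err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return 100.0 * float(np.mean(err < k))


def evaluate(y_true, y_pred):
    """Compute the card's test metrics for integer class predictions."""
    tau, _ = kendalltau(y_true, y_pred)  # Kendall's tau-b (SciPy default)
    return {
        "off_by_1_accuracy": off_by_k_accuracy(y_true, y_pred, 1),
        "off_by_2_accuracy": off_by_k_accuracy(y_true, y_pred, 2),
        "mse": float(mean_squared_error(y_true, y_pred)),
        "weighted_f1": float(f1_score(y_true, y_pred, average="weighted")),
        "kendall_tau": float(tau),
    }
```

MSE, weighted F1, and Kendall's tau give the same values whether labels are encoded as 0..4 or -1..3, so either encoding can be passed to `evaluate`.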

Hyperparameter tuning metrics

This model achieves the following results, averaged over 4-fold cross-validation, with the best hyperparameters found during tuning:

Loss: 0.519251
Off-by-1 accuracy (%): 98.3803
Off-by-2 accuracy (%): 99.8944
MSE for ordinal data: 0.02359
Weighted F1: 0.9837
Kendall's Tau: 0.9901

This performance is achieved with the following hyperparameters:

Learning rate: 7.96862e-05
Weight decay: 0.148775
Warmup ratio: 0.460611
Power of the polynomial learning-rate scheduler: 1.129829
Power α applied to the distance measure in the loss function: 3.0
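To see what the warmup ratio and scheduler power do, the learning-rate curve can be reproduced in plain Python. This is a sketch of linear warmup followed by polynomial decay, written to mirror what the transformers `polynomial` scheduler (`get_polynomial_decay_schedule_with_warmup`) is expected to produce with the learning rate above and `lr_end` from the training settings below. It assumes the warmup step count is `ceil(warmup_ratio * total_steps)`, as the transformers Trainer computes it; `polynomial_lr` is a hypothetical helper, and the total number of optimizer steps depends on dataset size, which this card does not state.

```python
import math


def polynomial_lr(step, total_steps, lr_init=7.96862e-05, lr_end=1e-8,
                  warmup_ratio=0.460611, power=1.129829):
    """Learning rate at `step` under linear warmup + polynomial decay."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        # linear warmup from 0 up to lr_init
        return lr_init * step / max(1, warmup_steps)
    if step >= total_steps:
        # after the schedule ends, stay at the floor value
        return lr_end
    # polynomial decay from lr_init down to lr_end over the remaining steps
    decay_steps = total_steps - warmup_steps
    pct_remaining = 1 - (step - warmup_steps) / decay_steps
    return (lr_init - lr_end) * pct_remaining ** power + lr_end
```

With a hypothetical `total_steps=100`, warmup covers the first 47 steps, the peak of 7.96862e-05 is reached at step 47, and the rate then decays slightly faster than linearly (power > 1) to 1e-8 at step 100. The large warmup ratio means nearly half of training is spent ramping up.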

Training settings

The following training configurations apply:

Pre-trained model: kelingwang/bert-causation-rating-pubmed
seed: 114514
batch_size: 128
epoch: 8
max_length in torch.utils.data.Dataset: 128
Loss function: the ordinal log loss (OLL) with a tunable hyperparameter α, the power applied to the distance measure in the loss function.
lr: 7.96862e-05
weight_decay: 0.148775
warmup_ratio: 0.460611
lr_scheduler_type: polynomial
lr_scheduler_kwargs: {"power": 1.129829, "lr_end": 1e-8}
Power α applied to the distance measure in the loss function: 3.0
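The role of α can be illustrated with a minimal NumPy sketch of the OLL loss as it is commonly defined: L = -Σ_i log(1 - p_i) · d(i, y)^α, where p_i is the softmax probability of class i and y is the true class. The true class has distance 0 and contributes nothing, while probability mass placed on distant classes is penalized more heavily as α grows. This is not the author's implementation: it assumes the distance is the absolute difference between class indices and that the loss is averaged over the batch, and training would use an equivalent PyTorch version. `softmax` and `oll_loss` are hypothetical names.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def oll_loss(logits, targets, alpha=3.0, eps=1e-12):
    """Batch-mean ordinal log loss for (batch, num_classes) logits and 0-based targets."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets)
    probs = softmax(logits)
    num_classes = logits.shape[-1]
    # |i - y|^alpha for every class i and target y (zero for the true class)
    distances = np.abs(np.arange(num_classes)[None, :] - targets[:, None]) ** alpha
    # -log(1 - p_i) grows as class i receives more probability; weight it by its distance
    per_sample = -(np.log(np.clip(1.0 - probs, eps, None)) * distances).sum(axis=-1)
    return float(per_sample.mean())
```

With α = 3 and five classes, the distance weights for a target at index 0 are 0, 1, 8, 27, and 64, so a confident prediction four classes away costs far more than one off by a single class. This is also why loss values on this scale can be much larger than typical cross-entropy values.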

Framework versions and devices

This model was trained on an NVIDIA P100 GPU provided by Kaggle. Framework versions are:

python==3.10.14
cuda==12.4
NVIDIA-SMI==550.90.07
torch==2.4.0
transformers==4.45.1
scikit-learn==1.2.2
optuna==4.0.0
nlpaug==1.1.11