
BertTOS v2: Terms of Service Unfairness Classifier

Model Details

  • Model Name: BertTOS v2
  • Model Type: Fine-tuned BERT for sequence classification
  • Version: 2.0
  • Language(s): English
  • License: MIT
  • Developer: Himanshu Mohanty
  • Model Size: ~109M parameters (F32, Safetensors)

Model Description

BertTOS v2 is a fine-tuned BERT model designed to classify clauses in Terms of Service (ToS) documents based on their unfairness level. This model can help users identify potentially problematic clauses in legal documents, particularly in the context of consumer protection.

Task

The model performs multi-class classification on individual sentences or clauses, categorizing them into three levels of unfairness:

  1. Clearly Fair
  2. Potentially Unfair
  3. Clearly Unfair
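
These three categories correspond to class ids 0 to 2 in the usage code below. Assuming the mapping was stored in the model config during fine-tuning, it can be inspected directly; a small sketch:

from transformers import AutoConfig

# Inspect the label mapping shipped with the checkpoint (assumes the
# config's id2label field was populated when the model was saved).
config = AutoConfig.from_pretrained("CodeHima/TOSBertV2")
print(config.id2label)  # expected: {0: 'clearly_fair', 1: 'potentially_unfair', 2: 'clearly_unfair'}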

Training Data

The model was trained on the CodeHima/TOS_Dataset dataset, which contains annotated sentences from Terms of Service documents. Each sentence is labeled with one of the three unfairness levels.
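
The dataset can be inspected with the Hugging Face datasets library. A minimal sketch, assuming the dataset exposes a train split (check the dataset card for the actual column names):

from datasets import load_dataset

# Load the annotated ToS sentences and inspect one example.
dataset = load_dataset("CodeHima/TOS_Dataset")
print(dataset)              # available splits and columns
print(dataset["train"][0])  # one annotated sentence (assumes a train split)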

Model Architecture

  • Base Model: BERT (bert-base-uncased)
  • Fine-tuning: Sequence classification head
  • Input: Tokenized text (max length 512 tokens)
  • Output: Probabilities for each unfairness level
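
The same architecture can be instantiated from the base checkpoint. A minimal sketch, assuming the label order matches the mapping used in the inference example below:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# bert-base-uncased with a 3-way sequence classification head; the
# id2label/label2id values mirror this card's usage example (assumed).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3,
    id2label={0: "clearly_fair", 1: "potentially_unfair", 2: "clearly_unfair"},
    label2id={"clearly_fair": 0, "potentially_unfair": 1, "clearly_unfair": 2},
)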

Performance

The model achieves the following metrics on the test set:

  • Accuracy: 0.879576
  • F1 Score (weighted): 0.885282
  • Precision (weighted): 0.883729
  • Recall (weighted): 0.889157
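
For reference, weighted metrics of this kind can be computed with scikit-learn. In this sketch, y_true and y_pred are hypothetical placeholders for the test labels and model predictions:

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder class ids; substitute the real test labels and predictions.
y_true = [0, 1, 2, 1, 0]
y_pred = [0, 2, 2, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)
print(f"Accuracy: {accuracy:.4f}, F1: {f1:.4f}, "
      f"Precision: {precision:.4f}, Recall: {recall:.4f}")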

Limitations

  • The model is trained on English-language ToS documents and may not perform well on other languages or in other legal contexts.
  • Performance may vary depending on the specific wording and context of clauses.
  • The model should be used as a tool to assist human judgment, not as a definitive legal assessment.

Ethical Considerations

  • This model is intended to help identify potentially unfair clauses, but its output should not be treated as legal advice.
  • Users should be aware of potential biases in the training data and model predictions.
  • The model's output should be reviewed by legal professionals for critical applications.

How to Use

You can use this model directly with the Hugging Face transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "CodeHima/TOSBertV2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Function to predict unfairness level
def predict_unfairness(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
    
    probabilities = torch.softmax(outputs.logits, dim=-1).squeeze()
    predicted_class = torch.argmax(probabilities).item()
    
    label_mapping = {0: 'clearly_fair', 1: 'potentially_unfair', 2: 'clearly_unfair'}
    predicted_label = label_mapping[predicted_class]
    
    return predicted_label, probabilities.tolist()

# Example usage
clause = "The company reserves the right to change these terms at any time without notice."
predicted_label, probabilities = predict_unfairness(clause)

print(f"Predicted unfairness level: {predicted_label}")
print("Probabilities:")
for label, prob in zip(['clearly_fair', 'potentially_unfair', 'clearly_unfair'], probabilities):
    print(f"{label}: {prob:.4f}")
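
Alternatively, the model can be run through the transformers pipeline API. A short sketch; the label names it returns depend on the id2label mapping stored in the model config:

from transformers import pipeline

# One-line inference with the same checkpoint.
classifier = pipeline("text-classification", model="CodeHima/TOSBertV2")
print(classifier("You waive any right to a class action lawsuit."))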

Training

The model was trained using the following hyperparameters:

  • Epochs: 3
  • Batch Size: 16
  • Learning Rate: [ ]
  • Optimizer: AdamW
  • Weight Decay: 0.01
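
A minimal sketch of how these hyperparameters map onto the transformers Trainer API. The learning rate below is a common BERT fine-tuning default, not a value reported by this card, and tokenized_train/tokenized_eval are hypothetical preprocessed datasets:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./tosbert-v2",
    num_train_epochs=3,               # epochs from this card
    per_device_train_batch_size=16,   # batch size from this card
    learning_rate=2e-5,               # assumed; not reported above
    weight_decay=0.01,                # weight decay from this card
)

# AdamW is the Trainer's default optimizer, matching the card.
trainer = Trainer(
    model=model,                      # model from the architecture sketch above
    args=training_args,
    train_dataset=tokenized_train,    # hypothetical tokenized train split
    eval_dataset=tokenized_eval,      # hypothetical tokenized eval split
)
trainer.train()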

Citation

If you use this model in your research, please cite:

@misc{TOSBertV2,
  author = {Himanshu Mohanty},
  title = {TOSBertV2: A Fine-Tuned BERT Model for Classifying Clauses in Terms of Service Documents},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/CodeHima/TOSBertV2}}
}