# TOSRobertaV2: Terms of Service Fairness Classifier
## Model Description
TOSRobertaV2 is a fine-tuned RoBERTa-large model designed to classify clauses in Terms of Service (ToS) documents based on their fairness level. The model categorizes clauses into three classes: clearly fair, potentially unfair, and clearly unfair.
## Intended Use
This model is intended for:
- Analyzing Terms of Service documents for potential unfair clauses
- Assisting legal professionals in reviewing contracts
- Helping consumers understand the fairness of agreements they're entering into
- Supporting researchers studying fairness in legal documents
## Training Data
The model was trained on the CodeHima/TOS_DatasetV3, which contains labeled clauses from various Terms of Service documents.
## Training Procedure
- Base model: RoBERTa-large
- Training type: Fine-tuning
- Number of epochs: 5
- Optimizer: AdamW
- Learning rate: 2e-5
- Batch size: 8
- Weight decay: 0.01
- Training loss: 0.3852
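
The hyperparameters above can be expressed as a `transformers.TrainingArguments` configuration. This is a hedged reconstruction rather than the author's actual training script; the output path, evaluation cadence, and best-model settings are assumptions not stated in this card:

```python
from transformers import TrainingArguments

# Hyperparameters are taken from the list above; everything else
# (output_dir, evaluation cadence, best-model selection) is an assumption.
training_args = TrainingArguments(
    output_dir="./tos-roberta-v2",      # assumed path
    num_train_epochs=5,
    learning_rate=2e-5,                 # AdamW is the Trainer's default optimizer
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,       # assumed to match the training batch size
    weight_decay=0.01,
    eval_strategy="epoch",              # assumed: the table reports per-epoch metrics
    save_strategy="epoch",              # ("evaluation_strategy" on older transformers)
    load_best_model_at_end=True,        # assumed: epoch-4 metrics are the ones reported
    metric_for_best_model="f1",
)
```

These arguments would then be passed to a `Trainer` together with the model, tokenizer, and the tokenized CodeHima/TOS_DatasetV3 splits.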
## Evaluation Results
### Validation Set Performance
- Accuracy: 0.86
- F1 Score: 0.8588
- Precision: 0.8598
- Recall: 0.8600
### Test Set Performance
- Accuracy: 0.8651
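
The reported F1, precision, and recall appear to be support-weighted averages across the three classes (weighted recall equals accuracy by construction, matching the numbers above). A minimal dependency-free sketch of how such weighted metrics are computed, using made-up labels rather than the actual validation set:

```python
from collections import Counter

def weighted_metrics(y_true, y_pred, num_classes):
    """Accuracy plus support-weighted precision, recall, and F1."""
    support = Counter(y_true)
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    w_prec = w_rec = w_f1 = 0.0
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        # Each class contributes in proportion to its share of true labels.
        weight = support[c] / n
        w_prec += weight * prec
        w_rec += weight * rec
        w_f1 += weight * f1
    return accuracy, w_prec, w_rec, w_f1

# Toy labels (0 = clearly fair, 1 = potentially unfair, 2 = clearly unfair)
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred, num_classes=3)
```

In practice the same numbers come from `sklearn.metrics.precision_recall_fscore_support` with `average="weighted"`.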
### Training Progress
| Epoch | Training Loss | Validation Loss | Accuracy | F1     | Precision | Recall |
|-------|---------------|-----------------|----------|--------|-----------|--------|
| 1     | 0.5391        | 0.4940          | 0.7981   | 0.7997 | 0.8056    | 0.7981 |
| 2     | 0.4621        | 0.4900          | 0.8314   | 0.8320 | 0.8330    | 0.8314 |
| 3     | 0.3954        | 0.6748          | 0.8219   | 0.8250 | 0.8349    | 0.8219 |
| 4     | 0.3783        | 0.7175          | 0.8600   | 0.8588 | 0.8598    | 0.8600 |
| 5     | 0.1542        | 0.8811          | 0.8476   | 0.8490 | 0.8514    | 0.8476 |
## Limitations
- The model's performance may vary on ToS documents from domains or industries not well-represented in the training data.
- It may struggle with highly complex or ambiguous clauses.
- The model's understanding of "fairness" is based on the training data and may not capture all nuances of legal fairness.
## Ethical Considerations
- This model should not be used as a substitute for professional legal advice.
- There may be biases present in the training data that could influence the model's judgments.
- Users should be aware that the concept of "fairness" in legal documents can be subjective and context-dependent.
## How to Use
You can use this model directly with the Hugging Face `transformers` library:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("CodeHima/TOSRobertaV2")
model = AutoModelForSequenceClassification.from_pretrained("CodeHima/TOSRobertaV2")

text = "Your clause here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

# Run inference without tracking gradients.
with torch.no_grad():
    logits = model(**inputs).logits

probabilities = torch.softmax(logits, dim=1)
predicted_class = torch.argmax(probabilities, dim=1).item()

classes = ['clearly fair', 'potentially unfair', 'clearly unfair']
print(f"Predicted class: {classes[predicted_class]}")
print(f"Probabilities: {probabilities[0].tolist()}")
```
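
The softmax/argmax post-processing in the snippet above can be checked with a dependency-free equivalent. The logits here are made up for illustration, not real model outputs:

```python
import math

def classify(logits, classes):
    """Mirror the torch.softmax + argmax step from the snippet above."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return classes[probs.index(max(probs))], probs

classes = ['clearly fair', 'potentially unfair', 'clearly unfair']
label, probs = classify([2.0, 0.5, -1.0], classes)  # made-up logits
```

With these logits the largest probability falls on index 0, so `label` is `'clearly fair'` and `probs` sums to 1.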
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{TOSRobertaV2,
  author       = {CodeHima},
  title        = {TOSRobertaV2: Terms of Service Fairness Classifier},
  year         = {2024},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/CodeHima/TOSRobertaV2}}
}
```
## License
This model is released under the MIT license.