# XLM-RoBERTa (base) fine-tuned on HC3 for ChatGPT text detection

XLM-RoBERTa (base) fine-tuned on the Hello-SimpleAI HC3 corpus to detect ChatGPT-generated text.

All credit to Hello-SimpleAI for their great work!
F1 score on the test dataset: 0.9736
## The model

XLM-RoBERTa is a model pre-trained on 2.5 TB of filtered CommonCrawl data covering 100 languages. It was introduced in the paper *Unsupervised Cross-lingual Representation Learning at Scale* by Conneau et al. and first released in this repository.
## The dataset

Human ChatGPT Comparison Corpus (HC3): the first human-ChatGPT comparison corpus, built by Hello-SimpleAI. The dataset is introduced in the paper *How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection* (Guo et al., 2023).
## Metrics

| metric | value  |
|--------|--------|
| F1     | 0.9736 |
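For reference, F1 is the harmonic mean of precision and recall over the positive class. A minimal sketch of the computation (the labels below are toy data, not the actual HC3 test set on which 0.9736 was measured):

```python
# Toy illustration of the F1 score (harmonic mean of precision and recall).
# y_true / y_pred below are made-up labels, not HC3 data.
def f1_score(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
print(f1_score(y_true, y_pred))  # 0.75
```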
## Usage

```python
from transformers import pipeline

ckpt = "mrm8488/xlm-roberta-base-finetuned-HC3-mix"
detector = pipeline('text-classification', model=ckpt)

text = "Here your text..."
result = detector(text)
print(result)
```
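The `text-classification` pipeline returns a list of dicts with `label` and `score` keys. A minimal sketch of turning that output into a decision; note the label string `"ChatGPT"` is an assumption here, so check the model's `id2label` mapping (in its `config.json`) for the real class names:

```python
# Sketch: turn the pipeline's raw output into a yes/no decision.
# The label name "ChatGPT" is an assumption; inspect the model's
# id2label mapping for the actual class names it was trained with.
def is_chatgpt(results, threshold=0.5):
    # pipeline('text-classification') returns e.g. [{'label': ..., 'score': ...}]
    top = results[0]
    return top['label'] == 'ChatGPT' and top['score'] >= threshold

# Mocked pipeline output, for illustration only:
fake_result = [{'label': 'ChatGPT', 'score': 0.98}]
print(is_chatgpt(fake_result))  # True
```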
## Citation

```bibtex
@misc{manuel_romero_2023,
  author    = { {Manuel Romero} },
  title     = { xlm-roberta-base-finetuned-HC3-mix (Revision b18de48) },
  year      = 2023,
  url       = { https://huggingface.co/mrm8488/xlm-roberta-base-finetuned-HC3-mix },
  doi       = { 10.57967/hf/0306 },
  publisher = { Hugging Face }
}
```