Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts
Abstract
The rapid development of autoregressive Large Language Models (LLMs) has significantly improved the quality of generated texts, necessitating reliable machine-generated text detectors. A huge number of detectors and collections with AI fragments have emerged, and several detection methods have even shown recognition quality of up to 99.9% according to the target metrics on such collections. However, the quality of such detectors tends to drop dramatically in the wild, posing a question: Are detectors actually highly trustworthy, or do their high benchmark scores come from the poor quality of evaluation datasets? In this paper, we emphasise the need for robust and qualitative methods for evaluating generated data, so that it is secure against bias and against the low generalising ability of future models. We present a systematic review of datasets from competitions dedicated to AI-generated content detection and propose methods for evaluating the quality of datasets containing AI-generated fragments. In addition, we discuss the possibility of using high-quality generated data to achieve two goals: improving the training of detection models and improving the training datasets themselves. Our contribution aims to facilitate a better understanding of the dynamics between human and machine text, which will ultimately support the integrity of information in an increasingly automated world.
Community
In our new paper, we demonstrate that the datasets used in AI-detection shared tasks and research papers are inadequate for evaluating AI detectors, resulting in systematic errors that inflate the detectors' quality scores.
What detectors are currently available for text and code?
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Zero-Shot Machine-Generated Text Detection Using Mixture of Large Language Models (2024)
- Robust AI-Generated Text Detection by Restricted Embeddings (2024)
- Training-free LLM-generated Text Detection by Mining Token Probability Sequences (2024)
- LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts (2024)
- Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness (2024)
We focus on plain text. I think detecting generated code is a harder task because code lacks the coherence of natural-language text: it is closer to a list of statements that do not form a single sequence, even though together they solve the same problem.
Also, detection of generated code is currently a less popular task, with few datasets and few competitions at leading ML conferences. Even for generated-text detection, where a lot of data is available, we show that this data is not good enough to test detectors.
Plain-text detectors are very popular now, but they are all built on BERT-like models or on analysis of how frequently words occur. We mention some of the state-of-the-art detection models in our paper.