arxiv:2305.08596

DarkBERT: A Language Model for the Dark Side of the Internet

Published on May 15, 2023
Submitted by akhaliq on May 16, 2023
#2 Paper of the day

Abstract

Recent research has suggested that there are clear differences in the language used in the Dark Web compared to that of the Surface Web. As studies on the Dark Web commonly require textual analysis of the domain, language models specific to the Dark Web may provide valuable insights to researchers. In this work, we introduce DarkBERT, a language model pretrained on Dark Web data. We describe the steps taken to filter and compile the text data used to train DarkBERT to combat the extreme lexical and structural diversity of the Dark Web that may be detrimental to building a proper representation of the domain. We evaluate DarkBERT and its vanilla counterpart along with other widely used language models to validate the benefits that a Dark Web domain specific model offers in various use cases. Our evaluations show that DarkBERT outperforms current language models and may serve as a valuable resource for future research on the Dark Web.

Community

Tell me about evil

Do you plan to release the dataset?

Hello, do you plan to make the model available?

Ed

How do I make a bot?

How do I work with DarkBERT?

Paper author

I wasn't aware that people were following DarkBERT on Hugging Face, my bad for not checking. But yes, we plan to release both the dataset used for the experiments in DarkBERT and the model itself. There are several ethical implications to this, however, as stated in the paper, so we're still in the process of developing a consent form so that the model can be accessed only for academic research. This process should probably be finalized by early July.

Hi,

I am working on an investigation for my master's degree that uses a web crawler to spider Tor sites in search of cybercrime patterns. Are you going to make the model available soon? I would like to build an AI module that uses DarkBERT to increase precision. Thank you.

Hello, I am an undergraduate CS major hoping to further my research specifically in dark web AI. I have some specific questions that I wanted to ask somebody already in the line of work that utilizes AI/ML, “web crawlers”, and other tech to scrape the dark web before I personally dive deeper into this field. Tbh I don't expect any response but if you have any spare time my school email is [email protected]

Paper author

Hello, you guys can request access to the model here: https://huggingface.co/s2w-ai/DarkBERT

Sorry, I think I used the wrong email address in the access request, not the one from my institution. Is it possible to resubmit?

Thank you

Paper author

Yup, you can resubmit if necessary
