metadata
language:
- en
pipeline_tag: text-classification
base_model: DunnBC22/codebert-base-Malicious_URLs
inference: false
datasets:
- sid321axn/malicious-urls-dataset
tags:
- malicious-urls
- url
ONNX version of DunnBC22/codebert-base-Malicious_URLs
This model is a conversion of DunnBC22/codebert-base-Malicious_URLs to ONNX format. It is based on the CodeBERT architecture and fine-tuned to identify URLs that may pose security threats. The conversion to ONNX was done with the 🤗 Optimum library.
Model Architecture
Base Model: CodeBERT-base, a model pre-trained on both programming and natural languages.
Dataset: https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset.
Modifications: the model is DunnBC22/codebert-base-Malicious_URLs, fine-tuned from CodeBERT-base for malicious URL detection and converted to ONNX. See the original model card for fine-tuning details.
Usage
Loading the model requires the 🤗 Optimum library with ONNX Runtime support (for example, `pip install optimum[onnxruntime]`).
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

# Load the tokenizer and the ONNX model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("laiyer/codebert-base-Malicious_URLs-onnx")
model = ORTModelForSequenceClassification.from_pretrained("laiyer/codebert-base-Malicious_URLs-onnx")

# top_k=None returns a score for every label instead of only the top one
classifier = pipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
    top_k=None,
)

classifier_output = classifier("https://google.com")
print(classifier_output)
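With `top_k=None` the pipeline returns one `{"label": ..., "score": ...}` entry per class. For a single-label classifier these scores are normally a softmax over the model's logits. The minimal sketch below turns such output into a yes/no decision. It rests on two assumptions. First, the label names follow the four classes of the Kaggle dataset (`benign`, `defacement`, `phishing`, `malware`). Check `model.config.id2label` for the real names. Second, it uses a hypothetical 0.5 threshold. The helpers `softmax`, `malicious_score` and `is_malicious` are illustrative only and are not part of Optimum, Transformers or LLM Guard. The example logits are made-up values, not real model output.

```python
import math
from typing import List

# ASSUMPTION: label names mirror the Kaggle dataset classes; verify them
# against model.config.id2label before relying on this set.
MALICIOUS_LABELS = {"defacement", "phishing", "malware"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def malicious_score(predictions: list) -> float:
    """Sum the probability mass assigned to the (assumed) malicious labels.

    Accepts either a flat list of {"label", "score"} dicts or a list
    nested one level deep, since pipeline output shape varies by version.
    """
    if predictions and isinstance(predictions[0], list):
        predictions = predictions[0]
    return sum(
        p["score"] for p in predictions if p["label"].lower() in MALICIOUS_LABELS
    )


def is_malicious(predictions: list, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag the URL if malicious mass >= threshold."""
    return malicious_score(predictions) >= threshold


if __name__ == "__main__":
    # Illustrative logits only, in an assumed label order.
    labels = ["benign", "defacement", "phishing", "malware"]
    logits = [0.2, -1.0, 2.5, 0.1]
    preds = [{"label": l, "score": s} for l, s in zip(labels, softmax(logits))]
    print(is_malicious(preds))  # → True
```

Summing the non-benign classes, rather than taking the single top label, flags a URL whose risk is split across several malicious categories. The threshold trades false positives against missed detections and should be tuned on your own data.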
LLM Guard
Community
Join our Slack to give us feedback, connect with the maintainers and fellow users, ask questions, or engage in discussions about LLM security!