Safetensors
English
distilbert
phishing_url
adnanaman committed on
Commit
b7b3fc8
1 Parent(s): 5ed1495

Update README.md

  1. README.md +51 -14
README.md CHANGED
@@ -1,15 +1,52 @@
  ---
- license: apache-2.0
- datasets:
- - Adnan-AI-Labs/CleanedBalancedPhishingUrls
- language:
- - en
- metrics:
- - accuracy
- base_model:
- - distilbert/distilbert-base-uncased
- tags:
- - phishing
- - phishing_url
- - classification
- ---
+ # Model Card for DistilBERT-PhishGuard
+
+ ## Model Overview
+ **DistilBERT-PhishGuard** is a phishing URL detection model based on DistilBERT, fine-tuned to classify a URL as safe or phishing. It is designed for real-time use in web and email security, helping users identify malicious links.
+
  ---
+
+ ## Intended Use
+ - **Use Cases**: URL classification for phishing detection in emails, websites, and chat applications.
+ - **Limitations**: Accuracy may be lower on non-English URLs and on heavily obfuscated links.
+ - **Intended Users**: Security researchers, application developers, and cybersecurity engineers.
+
+ ---
+
+ ## Model Details
+ - **Architecture**: DistilBERT for Sequence Classification
+ - **Language**: Primarily English
+ - **License**: Apache License 2.0
+ - **Dataset**: Trained on labeled phishing and safe URLs from public and proprietary sources.
+
+ ---
+
+ ## Usage
+ This model can be loaded and used with Hugging Face's `transformers` library:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load the model and tokenizer ("your-username" is a placeholder in the original card)
+ tokenizer = AutoTokenizer.from_pretrained("your-username/DistilBERT-PhishGuard")
+ model = AutoModelForSequenceClassification.from_pretrained("your-username/DistilBERT-PhishGuard")
+ model.eval()  # inference mode: disables dropout
+
+ # Sample URL for classification
+ url = "http://example.com"
+ inputs = tokenizer(url, return_tensors="pt", truncation=True, max_length=256)
+
+ # Run a forward pass without tracking gradients
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Class index 1 is treated as phishing, 0 as safe
+ predictions = torch.argmax(outputs.logits, dim=-1)
+ print("Prediction:", "Phishing" if predictions.item() == 1 else "Safe")
+ ```
+
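The usage snippet reports only a hard label. Where a confidence score is more useful, the two logits can be turned into a phishing probability with a softmax and compared against a cutoff. The following is a minimal sketch in plain Python, not part of the model card: the `verdict` helper and its `threshold` parameter are hypothetical, and it assumes index 1 is the phishing class, as in the usage snippet.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verdict(logits, threshold=0.5):
    """Return (label, phishing_probability) for one URL's two logits.

    Assumes index 1 is the phishing class; `threshold` is a hypothetical
    tuning knob for illustration, not something the model defines.
    """
    p_phishing = softmax(logits)[1]
    label = "Phishing" if p_phishing >= threshold else "Safe"
    return label, p_phishing


# Hand-written logits standing in for `outputs.logits[0].tolist()`.
label, prob = verdict([-1.2, 2.3])
print(f"{label} (p={prob:.3f})")
```

Raising the threshold trades missed phishing URLs for fewer false alarms; the right value depends on the deployment and is not stated in the card.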
+ ## Performance
+ The model reaches above 98% accuracy across successive chunks of the training data, with an AUC at or near 1.00 in the later stages, which indicates reliable phishing detection across the varied data it was evaluated on.
+
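For readers who want to check figures like these on their own labeled URLs, the two metrics named above can be computed as in the sketch below. It is an illustration in plain Python, not the author's evaluation code: the helper names are hypothetical and the labels and scores are made up.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of URLs whose thresholded score matches the true label."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)


def roc_auc(labels, scores):
    """Probability that a random phishing URL outscores a random safe one.

    Ties count as half a win (the Mann-Whitney U formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = phishing) and phishing scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.97, 0.88, 0.40, 0.35, 0.10, 0.55]
print(accuracy(labels, scores), roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, so the two can diverge on the same predictions.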
+ ## Limitations and Biases
+ Performance may degrade on URLs that use obfuscation or novel phishing techniques.
+ The model may also be less effective on non-English URLs and may need further fine-tuning for other languages or domain-specific URLs.
+
+ ## Contact and Support
+ For questions, improvements, or support, contact us through the Hugging Face community or open an issue in the model repository.