URLBERT-Tiny-v3 Malicious URL Classifier

This is a lightweight BERT model fine-tuned to classify URLs into four categories: benign, phishing, malware, and defacement.

Model Details

Model Evaluation Results

The model was evaluated on a test set with the following classification metrics:

Class         Precision  Recall    F1-Score
Benign        0.987695   0.993717  0.990697
Defacement    0.988510   0.998963  0.993709
Malware       0.988291   0.960332  0.974111
Phishing      0.958425   0.930826  0.944423
Accuracy      0.983738   0.983738  0.983738
Macro Avg     0.980730   0.970959  0.975735
Weighted Avg  0.983615   0.983738  0.983627
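
The macro-average row follows from the per-class rows. As a minimal sketch (not part of the original evaluation code), the snippet below recomputes each class's F1-score as the harmonic mean of its precision and recall, and the macro averages as unweighted means over the four classes. The `metrics` dictionary and the helper name `f1_score` are illustrative; the numbers are copied from the table above. The weighted averages cannot be reproduced this way, because the per-class support counts are not reported.

```python
# Per-class (precision, recall, reported F1), copied from the table above.
metrics = {
    "benign":     (0.987695, 0.993717, 0.990697),
    "defacement": (0.988510, 0.998963, 0.993709),
    "malware":    (0.988291, 0.960332, 0.974111),
    "phishing":   (0.958425, 0.930826, 0.944423),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for every class from its precision and recall.
recomputed_f1 = {name: f1_score(p, r) for name, (p, r, _) in metrics.items()}

# Macro averages: plain unweighted means over the classes.
macro_precision = sum(p for p, _, _ in metrics.values()) / len(metrics)
macro_recall = sum(r for _, r, _ in metrics.values()) / len(metrics)
macro_f1 = sum(recomputed_f1.values()) / len(metrics)

for name, (_, _, reported) in metrics.items():
    print(f"{name:<11} F1 recomputed: {recomputed_f1[name]:.6f}  reported: {reported:.6f}")
print(f"macro avg   P={macro_precision:.6f}  R={macro_recall:.6f}  F1={macro_f1:.6f}")
```

The recomputed values should agree with the reported ones up to rounding in the last digit.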

Usage Example

Below is an example of how to use the model for URL classification using the Hugging Face transformers library:

from transformers import BertTokenizerFast, BertForSequenceClassification, pipeline
import torch

# Select the device (GPU or CPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Load the model and tokenizer
model_name = "CrabInHoney/urlbert-tiny-v3-malicious-url-classifier"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.to(device)

# Create the classification pipeline
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    return_all_scores=True  # return a score for every class, not just the top one
)

# Example URLs for testing
test_urls = [
    "wikiobits.com/Obits/TonyProudfoot",
    "http://www.824555.com/app/member/SportOption.php?uid=guest&langx=gb",
]

# Map the raw labels to readable class names
label_mapping = {
    "LABEL_0": "benign",
    "LABEL_1": "defacement",
    "LABEL_2": "malware",
    "LABEL_3": "phishing"
}

# Classify each URL and print the score of every class
for url in test_urls:
    results = classifier(url)
    print(f"\nURL: {url}")
    for result in results[0]:
        label = result['label']
        score = result['score']
        friendly_label = label_mapping.get(label, label)
        print(f"Class: {friendly_label}, probability: {score:.4f}")

Example Output:

URL: wikiobits.com/Obits/TonyProudfoot
Class: benign, probability: 0.9953
Class: defacement, probability: 0.0000
Class: malware, probability: 0.0000
Class: phishing, probability: 0.0046

URL: http://www.824555.com/app/member/SportOption.php?uid=guest&langx=gb
Class: benign, probability: 0.0000
Class: defacement, probability: 0.0001
Class: malware, probability: 0.9998
Class: phishing, probability: 0.0001
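
In practice, a single verdict per URL is often more convenient than the full score list. The helper below is a hypothetical sketch, not part of the model or the transformers API: the function name `top_prediction` and the `threshold` parameter (default 0.5, an arbitrary choice) are assumptions introduced for illustration. It takes one per-URL score list in the format shown above (a list of dictionaries with `label` and `score` keys) and returns the most probable class, or "uncertain" when no class reaches the threshold.

```python
# Same label mapping as in the usage example.
label_mapping = {
    "LABEL_0": "benign",
    "LABEL_1": "defacement",
    "LABEL_2": "malware",
    "LABEL_3": "phishing",
}


def top_prediction(scores, threshold=0.5):
    """Return (class_name, score) for the highest-scoring class.

    `scores` is a list of {"label": ..., "score": ...} dictionaries.
    The class name is "uncertain" if the best score is below `threshold`
    (a hypothetical cutoff; tune it for your own use case).
    """
    best = max(scores, key=lambda item: item["score"])
    name = label_mapping.get(best["label"], best["label"])
    if best["score"] < threshold:
        return "uncertain", best["score"]
    return name, best["score"]


# Scores copied from the second URL in the example output.
sample_scores = [
    {"label": "LABEL_0", "score": 0.0000},
    {"label": "LABEL_1", "score": 0.0001},
    {"label": "LABEL_2", "score": 0.9998},
    {"label": "LABEL_3", "score": 0.0001},
]

print(top_prediction(sample_scores))  # ('malware', 0.9998)
```

With the pipeline from the usage example, the same helper could be called as `top_prediction(classifier(url)[0])`.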

Model size: 3.69M params (Safetensors, tensor type F32)