CrabInHoney/urlbert-tiny-v3-phishing-classifier · Is this fine-tuned or pre-trained?

6 days ago

Is this BERT model pre-trained on URL data, or is it just BERT base fine-tuned on a URL corpus?

Owner 6 days ago

My model is not based on the standard pre-trained BERT base. Instead, I initialized a BERT model from scratch and trained it with the masked language modeling (MLM) objective on a corpus of ~1,000,000 unlabeled URLs. The result is CrabInHoney/urlbert-tiny-base-v3, a model pre-trained specifically on URL data.
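
For readers unfamiliar with the MLM objective, here is a minimal sketch of how BERT-style masking turns unlabeled URLs into training examples. It is illustrative only and is not the actual training code: the character-level `tokenize`, the example URL, and the helper names are assumptions, and the 15% masking rate with the 80/10/10 replacement split is the recipe from the original BERT paper, not something confirmed for this model.

```python
import random

MASK = "[MASK]"


def tokenize(url):
    # Assumption: character-level tokens. The real tokenizer is not described here.
    return list(url)


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Return (inputs, labels) for one masked language modeling example.

    labels[i] holds the original token at positions the model must predict
    and None everywhere else.
    """
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK  # 80% of selected positions: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random vocabulary token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


# Hypothetical example URL, used only to build a toy vocabulary.
example_tokens = tokenize("http://example.com/login?user=1")
example_vocab = sorted(set(example_tokens))
example_inputs, example_labels = mask_tokens(example_tokens, example_vocab, random.Random(0))
```

The model is trained to recover `labels` from `inputs`, so no phishing/benign labels are needed at this stage.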

Then I fine-tuned that model on labeled URL data for phishing URL classification, which produced CrabInHoney/urlbert-tiny-v3-phishing-classifier. In short, urlbert-tiny-base-v3 is the model pre-trained on URL data, while urlbert-tiny-v3-phishing-classifier is the version of it fine-tuned for classification.
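
To make the difference between the two checkpoints concrete, here is a rough sketch using the Hugging Face `transformers` library. The two repository IDs come from this thread; the function names, `num_labels=2`, and the assumption that both repositories ship a compatible tokenizer are mine and may not match how the models were actually built.

```python
# Repository IDs are taken from the thread; everything else is illustrative.
BASE_MODEL = "CrabInHoney/urlbert-tiny-base-v3"  # stage 1: MLM pre-training on URLs
CLASSIFIER = "CrabInHoney/urlbert-tiny-v3-phishing-classifier"  # stage 2: fine-tuned


def load_for_fine_tuning(num_labels=2):
    """Load the URL-pretrained encoder with a fresh, untrained classification head.

    Assumption: a binary task (num_labels=2). The head still has to be trained
    on labeled URLs before its predictions mean anything.
    """
    # Imported lazily so this file can be read and imported without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL, num_labels=num_labels
    )
    return tokenizer, model


def classify(urls):
    """Run the already fine-tuned classifier on an iterable of URL strings."""
    from transformers import pipeline

    clf = pipeline("text-classification", model=CLASSIFIER)
    # Returns one dict per URL with "label" and "score" keys.
    return clf(list(urls))
```

Calling `classify(["http://example.com/login"])` downloads the fine-tuned checkpoint on first use. The label names it returns depend on the model's config, so check the model card before interpreting them.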

CrabInHoney changed discussion status to closed 5 days ago