introduction

This is an undergraduate course project in computer security.

The task is to fine-tune a pretrained language model to detect malicious network flows.

base model

bert-base-uncased
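
A minimal sketch of how the base model might be loaded with a two-class classification head, assuming the Hugging Face `transformers` library is used. The helper name `build_classifier` and the `ID2LABEL` mapping are illustrative and are not taken from the project code; the label names follow the "label_0" / "label_1" convention used in this card.

```python
# Assumption: the project uses Hugging Face transformers; this is a sketch,
# not the author's actual training script.
BASE_MODEL = "bert-base-uncased"

# Hypothetical mapping that mirrors the label names used in this card.
ID2LABEL = {0: "label_0", 1: "label_1"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def build_classifier():
    """Load the tokenizer and a BERT model with a fresh 2-class head."""
    # Imported lazily so the constants above can be used without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model
```

Calling `build_classifier()` downloads the pretrained weights, so it needs network access the first time it runs. The classification head is randomly initialized and only becomes useful after fine-tuning.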

dataset:

19kmunz/iot-23-preprocessed-minimumcolumns

example prompt:

8081 tcp S0 2 80 0
37215 tcp S0 2 80 0
52869 tcp S0 2 80 0
8080 tcp S0 2 80 0
80 tcp S0 2 80 0

The flows above are malicious and carry the label "label_1".

67 udp S0 11 3608 0
0 icmp OTH 9 844 0
136 icmp OTH 3 216 0
0 icmp OTH 8 648 0
134 icmp OTH 2 96 0

The flows above are benign and carry the label "label_0".
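
The example rows suggest that each flow is serialized as a single space-separated string before it is passed to the model. The sketch below shows one way to build such a string from a record. The column names in `FIELDS` and their order are assumptions inferred from the example rows, and `flow_to_text` is a hypothetical helper, not part of the project code.

```python
# Assumed column order, inferred from the example rows in this card:
# responder port, protocol, connection state, originator packets,
# originator IP bytes, responder IP bytes.
FIELDS = (
    "id.resp_p",
    "proto",
    "conn_state",
    "orig_pkts",
    "orig_ip_bytes",
    "resp_ip_bytes",
)

# Human-readable meaning of the two labels, as stated in this card.
LABEL_NAMES = {"label_0": "benign", "label_1": "malicious"}


def flow_to_text(record):
    """Join the selected fields of one flow record into a model input string."""
    return " ".join(str(record[name]) for name in FIELDS)


example = {
    "id.resp_p": 8081,
    "proto": "tcp",
    "conn_state": "S0",
    "orig_pkts": 2,
    "orig_ip_bytes": 80,
    "resp_ip_bytes": 0,
}
print(flow_to_text(example))  # 8081 tcp S0 2 80 0
```

The resulting string can then be tokenized like any other sentence, and the predicted label can be translated with `LABEL_NAMES`.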

accuracy

Epoch   Training Loss   Validation Loss   Validation Accuracy
1       0.288545        0.190351          0.929988
2       0.147658        0.154426          0.943510
3       0.108059        0.173112          0.943510
4       0.092468        0.161035          0.947416
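
The per-epoch numbers can also be compared programmatically, for example to decide which checkpoint to keep. The sketch below uses only the values reported above; the variable names are illustrative.

```python
# (epoch, training loss, validation loss, validation accuracy),
# copied from the results reported in this card.
results = [
    (1, 0.288545, 0.190351, 0.929988),
    (2, 0.147658, 0.154426, 0.943510),
    (3, 0.108059, 0.173112, 0.943510),
    (4, 0.092468, 0.161035, 0.947416),
]

# Epoch with the lowest validation loss.
best_loss_epoch = min(results, key=lambda row: row[2])[0]
# Epoch with the highest validation accuracy.
best_acc_epoch = max(results, key=lambda row: row[3])[0]

print(best_loss_epoch)  # 2
print(best_acc_epoch)   # 4
```

Validation loss is lowest at epoch 2, while validation accuracy peaks at epoch 4 and training loss keeps falling throughout. This pattern may point to mild overfitting in the later epochs, so either checkpoint could be a defensible choice depending on which metric matters more.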

MCC score: 0.816

MCC is the Matthews Correlation Coefficient, a metric commonly used to assess the quality of predictions in binary classification problems.

The MCC value ranges from -1 to 1, where 1 signifies perfect predictions, 0 indicates predictions similar to random chance, and -1 denotes completely opposite predictions.
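
For reference, MCC is computed from the four cells of the confusion matrix: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The sketch below implements the standard formula; the function name `mcc` and the counts in the usage lines are illustrative and are not the project's actual confusion matrix.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


# Illustrative counts only (not results from this project).
print(mcc(50, 50, 0, 0))    # 1.0  (every prediction correct)
print(mcc(25, 25, 25, 25))  # 0.0  (no better than chance)
print(mcc(0, 0, 50, 50))    # -1.0 (every prediction inverted)
```

The three calls reproduce the endpoints and midpoint of the range described above.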

An MCC of 0.816 is close to 1, which indicates that the model's predictions agree strongly with the true labels on this binary classification task. Because MCC takes all four cells of the confusion matrix into account, it complements the validation accuracy reported above.

Model size: 109M params (Safetensors, tensor type F32)

Dataset used to train GeorgeNhj/BERT_based_My_SecurityLLM: 19kmunz/iot-23-preprocessed-minimumcolumns