GeorgeNhj/BERT_based_My_SecurityLLM

introduction

This is an undergraduate course project in computer security.

The task is to fine-tune a pretrained language model to detect malicious network flows.

base model

bert-base-uncased

dataset:

19kmunz/iot-23-preprocessed-minimumcolumns

example prompt:

8081 tcp S0 2 80 0
37215 tcp S0 2 80 0
52869 tcp S0 2 80 0
8080 tcp S0 2 80 0
80 tcp S0 2 80 0

The rows above are malicious flows, labeled "label_1".

67 udp S0 11 3608 0
0 icmp OTH 9 844 0
136 icmp OTH 3 216 0
0 icmp OTH 8 648 0
134 icmp OTH 2 96 0

The rows above are benign flows, labeled "label_0".
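As a minimal sketch of how one flow record could be turned into a prompt like the rows above, and how a "label_0"/"label_1" output could be mapped back to a readable name: the field names in `Flow` are assumptions (the README shows only the values, not the column names), and `flow_to_prompt` and `label_name` are hypothetical helpers, not part of this repository.

```python
from typing import NamedTuple

# Label meanings as stated in this README: label_0 is benign, label_1 is malicious.
ID2LABEL = {0: "benign", 1: "malicious"}


class Flow(NamedTuple):
    # Assumed field names; only the six values per row appear in the README.
    resp_port: int
    proto: str
    conn_state: str
    orig_pkts: int
    orig_ip_bytes: int
    resp_ip_bytes: int


def flow_to_prompt(flow: Flow) -> str:
    """Join the six fields with single spaces, matching the example rows."""
    return " ".join(str(value) for value in flow)


def label_name(label: str) -> str:
    """Map a classifier output such as "label_1" to a readable name."""
    return ID2LABEL[int(label.rsplit("_", 1)[1])]


print(flow_to_prompt(Flow(8081, "tcp", "S0", 2, 80, 0)))
print(label_name("label_1"))
```

The resulting string is what would be passed to the tokenizer of the fine-tuned model.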

accuracy

Epoch  Training Loss  Validation Loss  Validation Accuracy
1      0.288545       0.190351         0.929988
2      0.147658       0.154426         0.943510
3      0.108059       0.173112         0.943510
4      0.092468       0.161035         0.947416
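One way to read the per-epoch results is to check which epoch minimizes validation loss and which maximizes validation accuracy. The snippet below is only an illustration using the numbers reported above; the `history` variable is a hypothetical name, not something from the training code.

```python
# Per-epoch values copied from the results reported in this README:
# (epoch, training loss, validation loss, validation accuracy)
history = [
    (1, 0.288545, 0.190351, 0.929988),
    (2, 0.147658, 0.154426, 0.943510),
    (3, 0.108059, 0.173112, 0.943510),
    (4, 0.092468, 0.161035, 0.947416),
]

# Epoch with the lowest validation loss.
best_loss = min(history, key=lambda row: row[2])
# Epoch with the highest validation accuracy.
best_acc = max(history, key=lambda row: row[3])

print(f"lowest validation loss: epoch {best_loss[0]} ({best_loss[2]})")
print(f"highest validation accuracy: epoch {best_acc[0]} ({best_acc[3]})")
```

Here the two criteria disagree: validation loss is lowest at epoch 2, while validation accuracy is highest at epoch 4.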

MCC score: 0.816

"MCC" (or "Total MCC") refers to the Matthews Correlation Coefficient, a metric commonly used to assess the quality of predictions in binary classification problems.

The MCC value ranges from -1 to 1, where 1 signifies perfect predictions, 0 indicates predictions similar to random chance, and -1 denotes completely opposite predictions.
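For reference, MCC is computed from the four confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal pure-Python sketch of that formula, assuming labels are encoded as 0 and 1; the function names are hypothetical and this is not necessarily how the score above was produced.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcc(y_true: list[int], y_pred: list[int]) -> float:
    """MCC for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return mcc_from_counts(tp, tn, fp, fn)


print(mcc([0, 1, 0, 1], [0, 1, 0, 1]))  # every prediction correct
print(mcc([0, 1, 0, 1], [1, 0, 1, 0]))  # every prediction wrong
```

In practice, `sklearn.metrics.matthews_corrcoef` provides the same metric.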

An MCC of 0.816 is close to 1, which indicates strong agreement between the model's predictions and the true labels. Because MCC takes all four confusion-matrix cells into account, it is a stricter summary of binary classification quality than accuracy alone.
