
Model Card for ThreatDetect-C-Cpp

ThreatDetect-C-Cpp is a fine-tuned derivative of answerdotai/ModernBERT-base.
We fine-tuned ModernBERT-base to detect vulnerabilities in C/C++ code.
The current version reaches an accuracy of 86% on the evaluation set.

Model Details

Model Description

ThreatDetect-C-Cpp can be used as a code classifier.
Rather than performing binary classification ("safe" vs. "unsafe"), the model assigns the input code one of seven labels: 'safe' (no vulnerability detected) or one of six CWE weaknesses:

| Label | Description |
|---|---|
| CWE-119 | Improper Restriction of Operations within the Bounds of a Memory Buffer |
| CWE-125 | Out-of-bounds Read |
| CWE-20 | Improper Input Validation |
| CWE-416 | Use After Free |
| CWE-703 | Improper Check or Handling of Exceptional Conditions |
| CWE-787 | Out-of-bounds Write |
| safe | Safe code |
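As a rough illustration of how the seven-way output might be consumed, the sketch below loads the model through the Transformers text-classification pipeline and picks the highest-scoring label. It assumes the hosted checkpoint loads directly with `pipeline` and that its config maps class ids to the label names in the table above; neither is confirmed by this card. `load_classifier`, `pick_label`, and `demo` are hypothetical helper names, and the 600-token truncation simply mirrors the maximum sequence length used during training.

```python
def load_classifier(model_id="lemon42-ai/ThreatDetect-C-Cpp"):
    """Build a text-classification pipeline (downloads the weights on first use)."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline
    # top_k=None asks the pipeline to return a score for every label.
    return pipeline("text-classification", model=model_id, top_k=None)


def pick_label(scores):
    """Return (label, score) for the highest-scoring entry of one result."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


def demo():
    """Classify one C snippet; needs network access and transformers + torch."""
    snippet = "void f(char *s) { char buf[8]; strcpy(buf, s); }"
    classifier = load_classifier()
    # Assumption: inputs are truncated to the training sequence length (600).
    results = classifier([snippet], truncation=True, max_length=600)
    print(pick_label(results[0]))
```

Calling `demo()` runs the end-to-end example; `pick_label` can also be applied to any list of `{"label": ..., "score": ...}` dictionaries.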

Uses

ThreatDetect-C-Cpp can be integrated into code-related applications. For example, it can be paired with a code generator to detect vulnerabilities in the generated code.
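One possible way to wire the classifier behind a code generator is sketched below. It is only an illustration: `review_generated_code` and the `min_confidence` threshold are hypothetical, and `classify` stands for any callable that returns a `(label, score)` pair for a C/C++ snippet, such as a thin wrapper around the model.

```python
from typing import Callable, Dict, Tuple


def review_generated_code(
    code: str,
    classify: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.5,  # hypothetical threshold, not from the model card
) -> Dict[str, object]:
    """Accept generated code unless it is confidently labeled with a CWE."""
    label, score = classify(code)
    # Flag only non-'safe' predictions that clear the confidence threshold.
    flagged = label != "safe" and score >= min_confidence
    return {"accepted": not flagged, "label": label, "score": score}
```

A caller could regenerate or escalate to manual review whenever `accepted` is false. The threshold trades missed vulnerabilities against false alarms and would need tuning on real data.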

Bias, Risks, and Limitations

ThreatDetect-C-Cpp detects weaknesses in C/C++ code only. It should not be used with other programming languages.
The model can only detect the six CWEs listed in the table above.

Training Details

Training Data

The model was fine-tuned on a minified, cleaned, and deduplicated version of the DiverseVul dataset.
This new version can be explored on Hugging Face Datasets.

Training Procedure

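The model was trained using LoRA. ModernBERT fuses the query, key, and value projections into a single matrix (attn.Wqkv), so the adapters were applied to that fused projection rather than to separate Q and V matrices.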
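To give a sense of scale for the LoRA setup, the sketch below works out the adapter scaling factor and the number of added weights for rank 8 and alpha 32, the values listed under the training hyperparameters. It assumes the adapter wraps the fused `attn.Wqkv` projection, and that ModernBERT-base has a hidden size of 768 and 22 layers; those two architecture figures are not stated in this card. Any classification-head parameters are not counted.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """LoRA scales the low-rank update B @ A by alpha / rank."""
    return alpha / rank


def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 768  # assumption: ModernBERT-base hidden size
NUM_LAYERS = 22    # assumption: ModernBERT-base layer count
RANK, ALPHA = 8, 32

# The fused projection maps hidden_size -> 3 * hidden_size (Q, K and V together).
per_layer = lora_extra_params(HIDDEN_SIZE, 3 * HIDDEN_SIZE, RANK)
total = per_layer * NUM_LAYERS

print(lora_scaling(ALPHA, RANK))  # 4.0
print(per_layer, total)           # 24576 540672
```

Under these assumptions the adapters add roughly half a million trainable weights, a small fraction of the base model.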

Training Hyperparameters

| Hyperparameter | Value |
|---|---|
| Max Sequence Length | 600 |
| Batch Size | 32 |
| Number of Epochs | 9 |
| Learning Rate | 5e-4 |
| Weight Decay | 0.01 |
| Logging Steps | 100 |
| LoRA Rank (r) | 8 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.1 |
| LoRA Target Modules | attn.Wqkv |
| Optimizer | AdamW |
| LR Scheduler | CosineAnnealingWarmRestarts |
| Scheduler T_0 | 10 |
| Scheduler T_mult | 2 |
| Scheduler eta_min | 1e-6 |
| Training Split Ratio | 90% Train / 10% Validation |
| Seed for Splitting | 42 |
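The scheduler settings above can be made concrete with a small dependency-free sketch of the cosine-annealing-with-warm-restarts formula, using the listed learning rate, T_0, T_mult, and eta_min. `cosine_warm_restarts_lr` is a hypothetical helper that mirrors the formula documented for PyTorch's `CosineAnnealingWarmRestarts`. The card does not say whether the scheduler was stepped per batch or per epoch, so `step` here is simply the number of scheduler steps taken.

```python
import math


def cosine_warm_restarts_lr(step, base_lr=5e-4, eta_min=1e-6, t_0=10, t_mult=2):
    """Learning rate after `step` scheduler steps with cosine warm restarts."""
    t_i, t_cur = t_0, step
    # Walk through completed cycles; each cycle is t_mult times longer than the last.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    # Cosine decay from base_lr down toward eta_min within the current cycle.
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


if __name__ == "__main__":
    for s in (0, 5, 9, 10, 20, 30):
        print(s, f"{cosine_warm_restarts_lr(s):.3e}")
```

With T_0 = 10 and T_mult = 2, the rate returns to 5e-4 at steps 10, 30, 70, and so on, with each cycle twice as long as the previous one.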

Evaluation

ThreatDetect-C-Cpp reaches an accuracy of 86% on the evaluation set.

Technical Specifications

Hardware

The model was fine-tuned on four Tesla V100 GPUs for one hour using PyTorch and Accelerate.
