AllanK24/modernbert-Aegis-Content-Safety-2.0

@@ -19,9 +19,9 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4752
-- Accuracy: 0.8145
-- F1: 0.8531
 ## Model description
@@ -40,34 +40,36 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 0.001
-- train_batch_size: 16
-- eval_batch_size: 16
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 2
-- total_train_batch_size: 32
-- total_eval_batch_size: 32
-- optimizer: Use adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
 - num_epochs: 5
 - mixed_precision_training: Native AMP
 - label_smoothing_factor: 0.1
 ### Training results
-| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
-|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
-| 0.5192        | 1.0   | 938  | 0.4865          | 0.8090   | 0.8537 |
-| 0.4965        | 2.0   | 1876 | 0.4769          | 0.8180   | 0.8575 |
-| 0.4879        | 3.0   | 2814 | 0.4783          | 0.8069   | 0.8399 |
-| 0.4834        | 4.0   | 3752 | 0.4790          | 0.8131   | 0.8450 |
-| 0.479         | 5.0   | 4690 | 0.4752          | 0.8145   | 0.8531 |
 ### Framework versions
-- Transformers 4.50.1
 - Pytorch 2.6.0+cu126
 - Datasets 3.3.1
 - Tokenizers 0.21.0

 This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.5534
+- Accuracy: 0.8388
+- F1: 0.8635
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 2
+- eval_batch_size: 2
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 2
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 16
+- total_eval_batch_size: 4
+- optimizer: Use adamw_torch_fused with betas=(0.9,0.98) and epsilon=1e-06 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 5
 - mixed_precision_training: Native AMP
 - label_smoothing_factor: 0.1
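
The batch-size and scheduler entries in the list above are related by simple arithmetic. The following is a minimal sketch, not the training script itself. It assumes the usual convention that the total batch size is per-device batch size × number of devices × gradient accumulation steps, and that warmup steps are the ceiling of `lr_scheduler_warmup_ratio` × total steps. The total of 9375 optimizer steps is taken from the training results table; the function names are hypothetical.

```python
import math

# Values copied from the hyperparameter list of the new revision.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 4
BASE_LR = 5e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 9375  # last step in the training results table


def effective_batch_size(per_device: int, devices: int, accum: int = 1) -> int:
    """Samples consumed per optimizer step (hypothetical helper)."""
    return per_device * devices * accum


def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0.

    Assumption: this mirrors the common "linear schedule with warmup" shape;
    it is an illustration, not the scheduler code used for this run.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_train_batch = effective_batch_size(
    PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS
)
total_eval_batch = effective_batch_size(PER_DEVICE_EVAL_BATCH, NUM_DEVICES)
warmup_steps = math.ceil(WARMUP_RATIO * TOTAL_STEPS)

print(total_train_batch, total_eval_batch, warmup_steps)
print(linear_lr(warmup_steps, BASE_LR, warmup_steps, TOTAL_STEPS))
```

Under these assumptions the computed totals agree with `total_train_batch_size: 16` and `total_eval_batch_size: 4`, and the learning rate peaks at `5e-05` after roughly the first tenth of training.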
 ### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Accuracy | F1     |
+|:-------------:|:------:|:----:|:---------------:|:--------:|:------:|
+| 1.9225        | 1.0    | 1876 | 0.4365          | 0.8401   | 0.8716 |
+| 1.4566        | 2.0    | 3752 | 0.4588          | 0.8318   | 0.8601 |
+| 1.155         | 3.0    | 5628 | 0.5355          | 0.8304   | 0.8511 |
+| 1.0333        | 4.0    | 7504 | 0.5601          | 0.8311   | 0.8501 |
+| 0.9694        | 4.9976 | 9375 | 0.5534          | 0.8388   | 0.8635 |
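
The headline metrics of the new revision (loss 0.5534, accuracy 0.8388, F1 0.8635) match the last row of the table, while the first epoch shows the lowest validation loss and the highest F1. A small sketch for comparing rows, with the values copied from the table above; the tuple layout and variable names are assumptions made for illustration.

```python
# (epoch, step, validation_loss, accuracy, f1), copied from the new table.
rows = [
    (1.0, 1876, 0.4365, 0.8401, 0.8716),
    (2.0, 3752, 0.4588, 0.8318, 0.8601),
    (3.0, 5628, 0.5355, 0.8304, 0.8511),
    (4.0, 7504, 0.5601, 0.8311, 0.8501),
    (4.9976, 9375, 0.5534, 0.8388, 0.8635),
]

best_by_f1 = max(rows, key=lambda r: r[4])      # highest F1
best_by_loss = min(rows, key=lambda r: r[2])    # lowest validation loss
final_row = rows[-1]                            # what the card reports

print("best F1 at step", best_by_f1[1])
print("lowest validation loss at step", best_by_loss[1])
print("reported (final) step", final_row[1])
```

Whether an earlier checkpoint would be preferable depends on how the evaluation set relates to the intended use; the diff does not say which checkpoint was uploaded beyond the reported final-row metrics.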
 ### Framework versions
+- Transformers 4.50.2
 - Pytorch 2.6.0+cu126
 - Datasets 3.3.1
 - Tokenizers 0.21.0

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e9a28c1836c74f139def7cab94788f2b7bd63832bf01c6e16a81d45081ad52cb
 size 598439784

 version https://git-lfs.github.com/spec/v1
+oid sha256:95ed7142123a657dde68253257d449487a62d4feef0a0ad819a6bdc99c1f5ef4
 size 598439784
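
A Git LFS pointer records the SHA-256 digest and byte size of the real file, so a downloaded `model.safetensors` can be checked against the `oid` and `size` shown above. A minimal sketch; the helper names and the local path in the usage note are hypothetical.

```python
import hashlib


def lfs_pointer_fields(path: str, chunk_size: int = 1 << 20) -> tuple[str, int]:
    """Return ("sha256:<hex>", size_in_bytes) for the file at `path`.

    The file is read in chunks so that large weight files do not need to
    fit in memory.
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return f"sha256:{digest.hexdigest()}", size


def matches_pointer(path: str, expected_oid: str, expected_size: int) -> bool:
    """True if the file's digest and size equal the pointer's oid and size."""
    oid, size = lfs_pointer_fields(path)
    return oid == expected_oid and size == expected_size
```

For example, `matches_pointer("model.safetensors", "sha256:95ed7142123a657dde68253257d449487a62d4feef0a0ad819a6bdc99c1f5ef4", 598439784)` would be expected to return `True` for the new revision's weights, assuming the file was downloaded intact to that path.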

tokenizer.json CHANGED

@@ -1,6 +1,11 @@
 {
   "version": "1.0",
-  "truncation": null,
   "padding": null,
   "added_tokens": [
     {

 {
   "version": "1.0",
+  "truncation": {
+    "direction": "Right",
+    "max_length": 8192,
+    "strategy": "LongestFirst",
+    "stride": 0
+  },
   "padding": null,
   "added_tokens": [
     {
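
The new `truncation` block means that a single input longer than 8192 tokens is cut from the right, with no overlap kept between the retained part and the overflow (`stride: 0`). The sketch below only illustrates that behaviour on a plain list of token ids for a single sequence; it is not the tokenizer library's implementation, and the function name is hypothetical.

```python
def truncate_right(ids: list[int], max_length: int = 8192) -> tuple[list[int], list[int]]:
    """Split `ids` into (kept, overflow) by cutting on the right.

    Assumption: single-sequence input and stride 0, so the overflow starts
    exactly where the kept part ends.
    """
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    return ids[:max_length], ids[max_length:]


# A 10-token input truncated to 8 tokens, standing in for the 8192 limit.
kept, overflow = truncate_right(list(range(10)), max_length=8)
print(kept)      # first 8 ids are retained
print(overflow)  # last 2 ids are dropped
```

Inputs at or below the limit pass through unchanged, so the setting only affects documents longer than the 8192-token window.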