---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
---
# DeepSeek-R1-Distill-Llama-8B-ENK-Aligned
## Overview
**DeepSeek-R1-Distill-Llama-8B-ENK-Aligned** is a safety-aligned version of [`deepseek-ai/DeepSeek-R1-Distill-Llama-8B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B). It has been aligned using the **Enkrypt AI Safety Alignment dataset**, which was generated with the **SAGE** process:
> **SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming**
> Anurakt Kumar, Divyanshu Kumar, Jatan Loya, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi (2024)
> [[arXiv:2408.11851]](https://arxiv.org/abs/2408.11851)
This alignment significantly **reduces toxicity, harmfulness, and jailbreak vulnerabilities** across various safety topics while **maintaining model performance**.
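## Usage
A minimal inference sketch with 🤗 Transformers, assuming the model exposes the standard chat interface of its base model; the repository id below is a placeholder (not confirmed by this card) and should be replaced with the actual Hub path.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id (assumed, not confirmed by this card).
model_id = "enkryptai/DeepSeek-R1-Distill-Llama-8B-ENK-Aligned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat prompt and generate a response.
messages = [{"role": "user", "content": "What does safety alignment mean for an LLM?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```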
## Red Team Results
![Safety Comparison](assets/safety_comparison.png)
## Performance Results
| Model | MMLU-Pro Score |
|--------|----------------|
| DeepSeek-R1-Distill-Llama-8B (Base) | **44.71** |
| DeepSeek-R1-Distill-Llama-8B-ENK-Aligned | **46.43** |
## Training Configuration
The model was trained using the **SimPO (Simple Preference Optimization)** approach, expressed as a CPO configuration with `loss_type: 'simpo'` and `cpo_alpha: 0.0`, with the following hyperparameters:
```yaml
cpo_config:
  loss_type: 'simpo'
  max_prompt_length: 1800
  max_length: 3600
  per_device_train_batch_size: 8
  gradient_accumulation_steps: 1
  learning_rate: 1.8e-6
  optim: 'adamw_torch'
  lr_scheduler_type: 'cosine'
  gradient_checkpointing: True
  beta: 5
  num_train_epochs: 1
  bf16: False
  simpo_gamma: 0.8
  warmup_ratio: 0.1
  cpo_alpha: 0.0
```
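For reference, the sketch below shows how these hyperparameters might map onto TRL's `CPOTrainer`, which implements SimPO when `loss_type="simpo"` is combined with `cpo_alpha=0.0`. The dataset path and output directory are placeholders, not the actual training setup used for this model.
```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder: a preference dataset with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("path/to/enkrypt-safety-alignment-dataset", split="train")

# Mirrors the cpo_config above; loss_type="simpo" + cpo_alpha=0.0 gives pure SimPO.
training_args = CPOConfig(
    output_dir="./enk-aligned",  # placeholder output path
    loss_type="simpo",
    cpo_alpha=0.0,
    simpo_gamma=0.8,
    beta=5.0,
    max_prompt_length=1800,
    max_length=3600,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    learning_rate=1.8e-6,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    gradient_checkpointing=True,
    num_train_epochs=1,
    warmup_ratio=0.1,
    bf16=False,
)

trainer = CPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` on older TRL versions
)
trainer.train()
```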
## Key Improvements
- **Enhanced Safety**: Significant reduction in harmful or toxic outputs.
- **Improved Robustness**: Stronger resistance to adversarial jailbreak prompts.
- **No Performance Tradeoff**: A slight *improvement* on MMLU-Pro despite the additional alignment constraints.
## Use Cases
This model is ideal for applications requiring **safe, aligned, and high-performance language generation**, including:
- **Conversational AI**: Ensuring responsible and aligned assistant behavior.
- **Content Moderation**: Filtering harmful content while maintaining contextual understanding.
- **Education & Research**: Deploying AI in sensitive environments with reduced risks.
## Citation
If you use this model, please cite the SAGE-RT paper:
```bibtex
@misc{kumar2024sagertsyntheticalignmentdata,
      title={SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming},
      author={Anurakt Kumar and Divyanshu Kumar and Jatan Loya and Nitin Aravind Birur and Tanay Baswa and Sahil Agarwal and Prashanth Harshangi},
      year={2024},
      eprint={2408.11851},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2408.11851}
}
```
---
For questions or contributions, reach out to the **Enkrypt AI** team!