---
library_name: transformers
license: other
base_model: meta-llama/Llama-3.2-3B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: GuardReasoner 3B
  results: []
pipeline_tag: text-generation
---
# GuardReasoner 3B

This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) via R-SFT and HS-DPO, as described in [GuardReasoner: Towards Reasoning-based LLM Safeguards](https://huggingface.co/papers/2501.18492).
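
Since the card declares `library_name: transformers` and `pipeline_tag: text-generation`, a minimal loading sketch may be useful. It is written under assumptions: `MODEL_ID` is a placeholder for this model's Hub repository id (not stated in this card), and `build_prompt` is a hypothetical helper with a placeholder layout, not the instruction template used in the paper. Consult the paper or the training repository for the actual prompt format.

```python
# Placeholder: replace with the Hub repository id of this model (assumption,
# the id is not given in this card).
MODEL_ID = "<hub-repo-id-of-this-model>"


def build_prompt(user_request: str, model_response: str = "") -> str:
    """Hypothetical helper: lay out the text to be moderated.

    This layout is an illustrative placeholder, not the template from the paper.
    """
    parts = [f"User request:\n{user_request.strip()}"]
    if model_response.strip():
        parts.append(f"Model response:\n{model_response.strip()}")
    return "\n\n".join(parts) + "\n"


def generate_text(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 512) -> str:
    """Load the checkpoint with transformers and return only the newly generated text."""
    # Imported lazily so the prompt helper can be used without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so that only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_text(build_prompt("How do I pick a lock?"))` would download the weights on first use, so `MODEL_ID` must be set to a real repository id before running it.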