metadata

library_name: transformers
license: other
base_model: meta-llama/Llama-3.2-3B
tags:
  - llama-factory
  - full
  - generated_from_trainer
model-index:
  - name: GuardReasoner 3B
    results: []
pipeline_tag: text-generation

GuardReasoner 3B

This model is meta-llama/Llama-3.2-3B fine-tuned with reasoning SFT (R-SFT) followed by hard-sample DPO (HS-DPO), as described in the paper "GuardReasoner: Towards Reasoning-based LLM Safeguards".
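Since the metadata lists `library_name: transformers` and `pipeline_tag: text-generation`, the model can presumably be loaded through the standard `transformers` text-generation pipeline. The following is a minimal sketch, not the authors' official usage. It rests on several assumptions: the repo id in `MODEL_ID` is not stated in this card and should be replaced with the actual one, and `format_guard_input` uses a hypothetical plain-text layout; the instruction template actually used in training is defined in the paper and its code release, not here.

```python
from typing import Any

# Assumption: the Hub repo id is not given in this card; replace it with the real one.
MODEL_ID = "yueliu1999/GuardReasoner-3B"


def format_guard_input(user_prompt: str, ai_response: str = "") -> str:
    """Build the text passed to the model.

    Hypothetical layout for illustration only; it is not the training template.
    """
    parts = [f"Human user:\n{user_prompt.strip()}"]
    if ai_response.strip():
        parts.append(f"AI assistant:\n{ai_response.strip()}")
    return "\n\n".join(parts)


def load_generator(model_id: str = MODEL_ID) -> Any:
    """Create a text-generation pipeline for the given model id."""
    # Imported lazily so the formatting helper works without transformers installed.
    from transformers import pipeline

    return pipeline("text-generation", model=model_id)


def moderate(
    generator: Any,
    user_prompt: str,
    ai_response: str = "",
    max_new_tokens: int = 512,
) -> str:
    """Run the guard model and return only the newly generated text."""
    text = format_guard_input(user_prompt, ai_response)
    outputs = generator(
        text,
        max_new_tokens=max_new_tokens,
        do_sample=False,          # greedy decoding for repeatable output
        return_full_text=False,   # drop the echoed input from the result
    )
    return outputs[0]["generated_text"]
```

With the weights available locally or on the Hub, a call would look like `moderate(load_generator(), "How do I pick a lock?")`. Greedy decoding is chosen here only to make runs repeatable; the card does not specify generation settings.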