Update pipeline tag and correct project page URL

#1 by nielsr - opened
Files changed (1): README.md (+17 -16)
README.md CHANGED
````diff
@@ -1,8 +1,12 @@
 ---
-license: apache-2.0
+base_model:
+- Qwen/Qwen3-8B
+datasets:
+- tomg-group-umd/DynaBench
 language: en
 library_name: transformers
-pipeline_tag: text-generation
+license: apache-2.0
+pipeline_tag: text-classification
 tags:
 - guardrail
 - safety
@@ -11,13 +15,9 @@ tags:
 - umd
 - qwen3
 - llm
-datasets:
-- tomg-group-umd/DynaBench
-base_model:
-- Qwen/Qwen3-8B
 repo_url: https://github.com/montehoover/DynaGuard
 paper_url: https://arxiv.org/abs/2509.02563
-project_page: https://github.com/taruschirag/DynaGuard
+project_page: https://taruschirag.github.io/DynaGuard/
 ---
 
 # DynaGuard-8B 🛡️
@@ -34,17 +34,17 @@ The DynaGuard series achieves state-of-the-art performance across a wide range o
 
 ## Model Details
 
-* **Developed by:** University of Maryland, Capital One
-* **Base Model:** [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
-* **Model Type:** Decoder-only Transformer
-* **Training Data:** Fine-tuned on a mixture of the **[DynaBench](https://huggingface.co/tomg-group-umd/DynaBench)** dataset and several safety benchmarks (WildGuard, BeaverTails, ToxicChat, Aegis 2.0).
-* **Training Procedure:** The model was trained using Supervised Fine-Tuning (SFT) for one epoch, followed by GRPO.
+* **Developed by:** University of Maryland, Capital One
+* **Base Model:** [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
+* **Model Type:** Decoder-only Transformer
+* **Training Data:** Fine-tuned on a mixture of the **[DynaBench](https://huggingface.co/tomg-group-umd/DynaBench)** dataset and several safety benchmarks (WildGuard, BeaverTails, ToxicChat, Aegis 2.0).
+* **Training Procedure:** The model was trained using Supervised Fine-Tuning (SFT) for one epoch, followed by GRPO.
 
 ### Key Features
 
-* **Dynamic Policies:** Accepts arbitrary guardrail policies written in natural language, allowing for bespoke and application-specific moderation.
-* **Interpretability:** Can generate detailed, natural-language explanations for why a policy was violated, enabling chatbot recovery and human-in-the-loop refinement.
-* **Dual-Mode Inference:** Supports two modes for flexibility:
+* **Dynamic Policies:** Accepts arbitrary guardrail policies written in natural language, allowing for bespoke and application-specific moderation.
+* **Interpretability:** Can generate detailed, natural-language explanations for why a policy was violated, enabling chatbot recovery and human-in-the-loop refinement.
+* **Dual-Mode Inference:** Supports two modes for flexibility:
   1. **Fast Inference:** Provides a direct `PASS` or `FAIL` classification for minimal latency.
   2. **Chain-of-Thought (CoT):** Generates a reasoning trace before giving the final classification, offering interpretability.
 
@@ -108,7 +108,8 @@ Evaluate the following dialogue for compliance with the given policy. Provide th
 """
 inputs = tokenizer(fast_prompt, return_tensors="pt").to(model.device)
 outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.1)
-print("\n--- Fast Inference Mode Output ---")
+print("
+--- Fast Inference Mode Output ---")
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
````
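
One note on the last hunk: the `+` side turns the `\n` escape in the `print` call into a literal line break, which leaves an unterminated string and makes the updated snippet a Python syntax error. For reference, here is a self-contained sketch of the fast-inference example with the escape intact. The model ID `tomg-group-umd/DynaGuard-8B` and the exact prompt wording are assumptions, since the card's full prompt is truncated in this diff.

```python
# Minimal, self-contained sketch of the README's fast-inference example.
# Assumptions: the model ID and the prompt wording below are illustrative,
# not copied from the card (the diff truncates the original prompt).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tomg-group-umd/DynaGuard-8B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A natural-language policy plus the dialogue to evaluate (illustrative).
fast_prompt = """Evaluate the following dialogue for compliance with the given policy. Provide the final answer as PASS or FAIL.

Policy: The agent must never share discount codes.

Dialogue:
User: Can I get a discount on my order?
Agent: Sure, use the code SAVE20 at checkout!
"""

inputs = tokenizer(fast_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.1)

# Keep "\n" as an escape sequence; splitting it across two source lines,
# as the "+" side of the hunk does, leaves an unterminated string.
print("\n--- Fast Inference Mode Output ---")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```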
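
The Key Features section also describes a dual-mode setup, but the CoT invocation is not shown in this diff. Since the base model is Qwen3-8B, one plausible toggle is the `enable_thinking` flag that Qwen3 chat templates accept; the sketch below is an assumption about the interface, not the documented DynaGuard usage (see the card's repo_url for that).

```python
# Hypothetical sketch: switching between fast and chain-of-thought
# evaluation, assuming DynaGuard-8B keeps Qwen3's chat template.
# The actual interface may differ from this.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tomg-group-umd/DynaGuard-8B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def evaluate(policy_and_dialogue: str, cot: bool = False) -> str:
    # Qwen3 chat templates accept enable_thinking to switch the
    # reasoning trace on or off before the final verdict.
    text = tokenizer.apply_chat_template(
        [{"role": "user", "content": policy_and_dialogue}],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=cot,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    # CoT needs headroom for the reasoning trace; fast mode only
    # needs a few tokens for the PASS/FAIL verdict.
    outputs = model.generate(**inputs, max_new_tokens=2048 if cot else 100)
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

The token-budget split mirrors the trade-off the feature list describes: minimal latency for a bare verdict versus a longer, interpretable trace.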