Update README.md
README.md
CHANGED

datasets:
- tomg-group-umd/DynaBench
base_model:
- Qwen/Qwen3-1.7B
repo_url: https://github.com/montehoover/DynaGuard
paper_url: https://arxiv.org/abs/2509.02563
project_page: https://github.com/taruschirag/DynaGuard
---

# DynaGuard-1.7B 🛡️

**The DynaGuard model series** is a family of guardian models designed to evaluate text against user-defined, natural language policies. They provide a flexible and powerful solution for moderating chatbot outputs beyond static, predefined harm categories. Developed by researchers at the University of Maryland and Capital One, the series includes three open-weight models of varying sizes (1.7B, 4B, and 8B), allowing developers to choose the best balance of performance and efficiency for their needs.

Unlike traditional guardian models that screen for a fixed set of harms (e.g., violence or self-harm), DynaGuard can enforce bespoke, application-specific rules, such as preventing a customer service bot from mistakenly issuing refunds or ensuring a medical bot avoids giving unauthorized advice.

The DynaGuard series achieves state-of-the-art performance across a wide range of safety and compliance benchmarks, with the flagship **DynaGuard-8B** model outperforming other guardian models and even strong generalist models like GPT-4o-mini.

| 🔖 | 💻 | 🌐 |
|----|----|---|
| [Paper (arXiv)](https://arxiv.org/abs/2509.02563) | [Code (GitHub)](https://github.com/montehoover/DynaGuard) | [Project page](https://github.com/taruschirag/DynaGuard) |

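To query the model, a policy and a conversation are packed into a single prompt and the model returns a compliance judgment. The snippet below is a minimal sketch using 🤗 Transformers; the checkpoint ID and the prompt framing are illustrative assumptions, and the official inference format and helper code live in the GitHub repository linked above.

```python
# Minimal sketch: ask DynaGuard-1.7B whether a chatbot turn complies with a custom policy.
# NOTE: the checkpoint ID and the prompt framing below are illustrative assumptions;
# see the DynaGuard GitHub repository for the exact recommended format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tomg-group-umd/DynaGuard-1.7B"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

policy = "The agent must never promise or issue a refund; refunds require human approval."
dialogue = (
    "User: I was double charged. Can you refund me?\n"
    "Agent: Of course, I've gone ahead and issued the refund to your card."
)

prompt = (
    f"Policy:\n{policy}\n\n"
    f"Conversation:\n{dialogue}\n\n"
    "Does the final agent turn violate the policy? Answer PASS or FAIL and explain."
)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
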
## Model Details

...

If you use DynaGuard or the DynaBench dataset in your research, please cite our work:
```
@article{hoover2025dynaguard,
  title={DynaGuard: A Dynamic Guardrail Model With User-Defined Policies},
  author={Monte Hoover and Vatsal Baherwani and Neel Jain and Khalid Saifullah and Joseph Vincent and Chirag Jain and Melissa Kazemi Rad and C. Bayan Bruss and Ashwinee Panda and Tom Goldstein},
  journal={arXiv preprint},
  year={2025},
  url={https://arxiv.org/abs/2509.02563},
}
```