nielsr (HF Staff) committed
Commit 11628ef · verified · 1 Parent(s): 8370845

Update pipeline tag and project page URL


This PR improves the model card by:

* Updating the `pipeline_tag` from `text-generation` to `text-classification`, which more accurately reflects the model's core function of evaluating text against policies to classify compliance. This enhances discoverability for users looking for guardrail or text classification models on the Hugging Face Hub (https://huggingface.co/models?pipeline_tag=text-classification).
* Correcting the `project_page` URL in the metadata from the GitHub repository to the dedicated project page: `https://taruschirag.github.io/DynaGuard/`.

Files changed (1)

1. README.md +17 -16
README.md CHANGED

````diff
@@ -1,8 +1,12 @@
 ---
-license: apache-2.0
+base_model:
+- Qwen/Qwen3-1.7B
+datasets:
+- tomg-group-umd/DynaBench
 language: en
 library_name: transformers
-pipeline_tag: text-generation
+license: apache-2.0
+pipeline_tag: text-classification
 tags:
 - guardrail
 - safety
@@ -11,13 +15,9 @@ tags:
 - umd
 - qwen3
 - llm
-datasets:
-- tomg-group-umd/DynaBench
-base_model:
-- Qwen/Qwen3-1.7B
 repo_url: https://github.com/montehoover/DynaGuard
 paper_url: https://arxiv.org/abs/2509.02563
-project_page: https://github.com/taruschirag/DynaGuard
+project_page: https://taruschirag.github.io/DynaGuard/
 ---
 
 # DynaGuard-1.7B 🛡️
@@ -34,17 +34,17 @@ The DynaGuard series achieves state-of-the-art performance across a wide range o
 
 ## Model Details
 
-* **Developed by:** University of Maryland, Capital One
-* **Base Model:** [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
-* **Model Type:** Decoder-only Transformer
-* **Training Data:** Fine-tuned on a mixture of the **[DynaBench](https://huggingface.co/tomg-group-umd/DynaBench)** dataset and several safety benchmarks (WildGuard, BeaverTails, ToxicChat, Aegis 2.0).
-* **Training Procedure:** The model was trained using Supervised Fine-Tuning (SFT) for one epoch, followed by GRPO.
+* **Developed by:** University of Maryland, Capital One
+* **Base Model:** [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
+* **Model Type:** Decoder-only Transformer
+* **Training Data:** Fine-tuned on a mixture of the **[DynaBench](https://huggingface.co/tomg-group-umd/DynaBench)** dataset and several safety benchmarks (WildGuard, BeaverTails, ToxicChat, Aegis 2.0).
+* **Training Procedure:** The model was trained using Supervised Fine-Tuning (SFT) for one epoch, followed by GRPO.
 
 ### Key Features
 
-* **Dynamic Policies:** Accepts arbitrary guardrail policies written in natural language, allowing for bespoke and application-specific moderation.
-* **Interpretability:** Can generate detailed, natural-language explanations for why a policy was violated, enabling chatbot recovery and human-in-the-loop refinement.
-* **Dual-Mode Inference:** Supports two modes for flexibility:
+* **Dynamic Policies:** Accepts arbitrary guardrail policies written in natural language, allowing for bespoke and application-specific moderation.
+* **Interpretability:** Can generate detailed, natural-language explanations for why a policy was violated, enabling chatbot recovery and human-in-the-loop refinement.
+* **Dual-Mode Inference:** Supports two modes for flexibility:
   1. **Fast Inference:** Provides a direct `PASS` or `FAIL` classification for minimal latency.
   2. **Chain-of-Thought (CoT):** Generates a reasoning trace before giving the final classification, offering interpretability.
 
@@ -108,7 +108,8 @@ Evaluate the following dialogue for compliance with the given policy. Provide th
 """
 inputs = tokenizer(fast_prompt, return_tensors="pt").to(model.device)
 outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.1)
-print("\n--- Fast Inference Mode Output ---")
+print("
+--- Fast Inference Mode Output ---")
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
````
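The fast-inference snippet in the diff above decodes the full generation, which includes the prompt plus the model's verdict. A minimal sketch of pulling a machine-readable verdict out of that decoded text, assuming the verdict appears verbatim as `PASS` or `FAIL` (the helper name and the exact output format are assumptions, not part of the model card):

```python
import re

def extract_verdict(decoded: str) -> str:
    """Return the last PASS/FAIL token found in a decoded generation.

    Assumes fast-inference mode emits the verdict verbatim somewhere
    in the output; the exact format is an assumption here.
    """
    matches = re.findall(r"\b(PASS|FAIL)\b", decoded)
    if not matches:
        raise ValueError("no PASS/FAIL verdict found in model output")
    # Take the last occurrence, since the prompt itself may mention
    # PASS/FAIL when describing the expected answer format.
    return matches[-1]

print(extract_verdict("Compliance with the given policy: FAIL"))  # FAIL
```

Taking the last match rather than the first is a deliberate choice: the instruction template echoed in the prompt may itself contain the words PASS and FAIL.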