nielsr (HF Staff) committed (verified) · Commit a3e7316 · Parent: c1bfd40

Update `pipeline_tag` and correct `project_page` URL in metadata


This PR improves the model card metadata by:

- Changing the `pipeline_tag` from `text-generation` to `text-classification`. This better reflects the model's core function of evaluating text for policy compliance and emitting a PASS/FAIL classification, and improves discoverability at https://huggingface.co/models?pipeline_tag=text-classification.
- Correcting the `project_page` URL from `https://github.com/taruschirag/DynaGuard` to `https://taruschirag.github.io/DynaGuard/`, so it points to the official project website.

No changes are made to the markdown content as it is already comprehensive and well-structured.

Files changed (1): README.md (+17 -16)
README.md CHANGED
@@ -1,8 +1,12 @@
 ---
-license: apache-2.0
+base_model:
+- Qwen/Qwen3-4B
+datasets:
+- tomg-group-umd/DynaBench
 language: en
 library_name: transformers
-pipeline_tag: text-generation
+license: apache-2.0
+pipeline_tag: text-classification
 tags:
 - guardrail
 - safety
@@ -11,13 +15,9 @@ tags:
 - umd
 - qwen3
 - llm
-datasets:
-- tomg-group-umd/DynaBench
-base_model:
-- Qwen/Qwen3-4B
 repo_url: https://github.com/montehoover/DynaGuard
 paper_url: https://arxiv.org/abs/2509.02563
-project_page: https://github.com/taruschirag/DynaGuard
+project_page: https://taruschirag.github.io/DynaGuard/
 ---
 
 # DynaGuard-4B 🛡️
@@ -35,17 +35,17 @@ The DynaGuard series achieves state-of-the-art performance across a wide range o
 
 ## Model Details
 
-* **Developed by:** University of Maryland, Capital One
-* **Base Model:** [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
-* **Model Type:** Decoder-only Transformer
-* **Training Data:** Fine-tuned on a mixture of the **[DynaBench](https://huggingface.co/tomg-group-umd/DynaBench)** dataset and several safety benchmarks (WildGuard, BeaverTails, ToxicChat, Aegis 2.0).
-* **Training Procedure:** The model was trained using Supervised Fine-Tuning (SFT) for one epoch, followed by GRPO.
+* **Developed by:** University of Maryland, Capital One
+* **Base Model:** [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)
+* **Model Type:** Decoder-only Transformer
+* **Training Data:** Fine-tuned on a mixture of the **[DynaBench](https://huggingface.co/tomg-group-umd/DynaBench)** dataset and several safety benchmarks (WildGuard, BeaverTails, ToxicChat, Aegis 2.0).
+* **Training Procedure:** The model was trained using Supervised Fine-Tuning (SFT) for one epoch, followed by GRPO.
 
 ### Key Features
 
-* **Dynamic Policies:** Accepts arbitrary guardrail policies written in natural language, allowing for bespoke and application-specific moderation.
-* **Interpretability:** Can generate detailed, natural-language explanations for why a policy was violated, enabling chatbot recovery and human-in-the-loop refinement.
-* **Dual-Mode Inference:** Supports two modes for flexibility:
+* **Dynamic Policies:** Accepts arbitrary guardrail policies written in natural language, allowing for bespoke and application-specific moderation.
+* **Interpretability:** Can generate detailed, natural-language explanations for why a policy was violated, enabling chatbot recovery and human-in-the-loop refinement.
+* **Dual-Mode Inference:** Supports two modes for flexibility:
   1. **Fast Inference:** Provides a direct `PASS` or `FAIL` classification for minimal latency.
   2. **Chain-of-Thought (CoT):** Generates a reasoning trace before giving the final classification, offering interpretability.
@@ -109,7 +109,8 @@ Evaluate the following dialogue for compliance with the given policy. Provide th
 """
 inputs = tokenizer(fast_prompt, return_tensors="pt").to(model.device)
 outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.1)
-print("\n--- Fast Inference Mode Output ---")
+print("
+--- Fast Inference Mode Output ---")
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
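The metadata this PR edits lives in the YAML front matter between the leading `---` markers of README.md. As a rough illustration only (a minimal hand-rolled parser, not the YAML tooling the Hugging Face Hub actually uses), fields like `pipeline_tag` can be read back out of a README like so:

```python
def read_front_matter(readme_text):
    """Extract the YAML-style front matter between the leading '---' markers.

    Minimal `key: value` and `- item` handling only; not a full YAML parser.
    """
    lines = readme_text.splitlines()
    assert lines[0].strip() == "---", "front matter must start at line 1"
    meta, key = {}, None
    for line in lines[1:]:
        if line.strip() == "---":          # closing marker ends the block
            break
        if line.startswith("- ") and key:  # list item under the previous key
            meta.setdefault(key, []).append(line[2:].strip())
        elif ":" in line:                  # scalar (or list-opening) key
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            meta[key] = value if value else []
    return meta


# Front matter as it stands after this commit (abridged to a few fields).
readme = """---
license: apache-2.0
pipeline_tag: text-classification
project_page: https://taruschirag.github.io/DynaGuard/
tags:
- guardrail
- safety
---
# DynaGuard-4B
"""

meta = read_front_matter(readme)
print(meta["pipeline_tag"])  # text-classification
```

The `pipeline_tag` value read here is exactly what drives the task filter on the Hub's model listing, which is why the change from `text-generation` matters for discoverability.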