youngwan657 committed on
Commit 9acb528 · verified · 1 Parent(s): e00dd13

Update README.md

Files changed (1)
  1. README.md +13 -5
README.md CHANGED
@@ -11,13 +11,20 @@ pipeline_tag: text-classification
 
  # RoGuard: Advancing Safety for LLMs with Robust Guardrails
 
- # Model Card for Model ID
+ <div align="center" style="line-height: 1;">
+ <a href="https://huggingface.co/Roblox/RoGuard" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-RoGuard-ffc107?color=ffc107&logoColor=white"/></a>
+ <a href="https://huggingface.co/datasets/Roblox/RoGuard-Eval" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-RoGuardEval-ffc107?color=1783ff&logoColor=white"/></a>
+ </div>
 
- <!-- Provide a quick summary of what the model is/does. -->
- RoGuard is a lightweight, modular evaluation framework for assessing the safety of fine-tuned language models. It provides structured evaluation using configurable prompts, labeled datasets, and outputs comprehensive metrics.
+ <p align="center">
+ <a href="https://devforum.roblox.com/t/beta-introducing-text-generation-api/3556520" target="_blank"><img src=https://img.shields.io/badge/Roblox-Blog-000000.svg?logo=Roblox height=22px></a>
+ </b> &nbsp;&nbsp;&nbsp; | &nbsp;&nbsp;&nbsp; <b>📄&nbsp;&nbsp;Paper Link (coming soon)</b>
+ </p>
 
+ RoGuard, a SOTA instruction fine-tuned LLM, is designed to help safeguard our Text Generation API. It performs safety classification at both the prompt and response levels, deciding whether or not each input or output violates our policies. This dual-level assessment is essential for moderating both user queries and the model’s own generated outputs. At the heart of our system is an LLM that’s been fine-tuned from the Llama-3.1-8B-Instruct model. We trained this LLM with a particular focus on high-quality instruction tuning to optimize for safety judgment performance.
 
- # 📊 Model Benchmark Results
+
+ ## 📊 Model Benchmark Results
 
  - **Prompt Metrics**: These evaluate how well the model classifies or responds to potentially harmful **user inputs**
  - **Response Metrics**: These measure how well the model handles or generates **responses**, ensuring its outputs are safe and aligned.
@@ -34,7 +41,8 @@ RoGuard is a lightweight, modular evaluation framework for assessing the safety
  | GPT-4o | 68.1 | 70.4 | 83.2 | 90.2 | 87.9 | 83.8 | 67.9 | 73.1 | 83.5 |
  | BingoGuard-phi3-3B | 72.5 | 72.8 | 90.0 | 90.8 | 88.9 | 86.2 | 69.9 | 79.7 | 85.1 |
  | BingoGuard-llama3.1-8B | 75.7 | 77.9 | 90.4 | 94.9 | 88.9 | 86.4 | 68.7 | 80.1 | 86.4 |
- | 🛡️ RoGuard | 75.8 | 70.5 | 91.1 | 90.2 | 88.7 | 87.5 | 69.7 | 80.0 | 80.7 |
+ | RoGuard | 75.8 | 70.5 | 91.1 | 90.2 | 88.7 | 87.5 | 69.7 | 80.0 | 80.7 |
+
 
  ## 🔗 GitHub Repository
 
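
Note on the added README text: the prompt-level and response-level classification it describes could be wired into a generation call roughly as sketched below. This is a minimal illustration, not RoGuard's actual interface. The names `moderate`, `ModerationResult`, and the `toy_*` functions are hypothetical. The gating order (check the prompt, generate, then check the response together with its prompt) is an assumption, and the keyword-based classifiers are stand-ins for a real guardrail model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModerationResult:
    prompt_violation: bool
    response_violation: Optional[bool]  # None if no response was generated
    response: Optional[str]             # None if blocked at either level


def moderate(
    prompt: str,
    classify_prompt: Callable[[str], bool],
    generate: Callable[[str], str],
    classify_response: Callable[[str, str], bool],
) -> ModerationResult:
    """Run a prompt-level check, then a response-level check (assumed order)."""
    # Level 1: judge the user prompt before any generation happens.
    if classify_prompt(prompt):
        return ModerationResult(True, None, None)
    # Level 2: judge the generated response in the context of its prompt.
    response = generate(prompt)
    if classify_response(prompt, response):
        return ModerationResult(False, True, None)
    return ModerationResult(False, False, response)


# Toy stand-ins so the sketch runs; a real deployment would call the
# guardrail model and the text generator here instead.
BLOCKLIST = {"forbidden"}


def toy_classify_prompt(prompt: str) -> bool:
    return any(word in prompt.lower() for word in BLOCKLIST)


def toy_classify_response(prompt: str, response: str) -> bool:
    return any(word in response.lower() for word in BLOCKLIST)


def toy_generate(prompt: str) -> str:
    return f"Echo: {prompt}"
```

Passing the classifiers and generator in as callables keeps the two checks independent, so either one can be replaced or evaluated on its own, which mirrors the separate prompt and response metrics in the benchmark table.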