Create README.md
README.md
ADDED
@@ -0,0 +1 @@
Using an early version of our desc2doc-32b model, we generated 3 rejections and 3 follow-through responses for each of 20 unsafe questions. For each question, we averaged the rejections and the follow-throughs separately, then subtracted the mean of the rejections from the mean of the follow-throughs. Averaging those per-question differences across all 20 questions produced these vectors, which can be compared against the embeddings of LLM responses to robustly detect rejections. Higher similarity to these vectors means a higher probability that the embedded sample contains a rejection.
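The procedure can be sketched roughly as follows. This is a minimal illustration, not the code used to build the released vectors. It makes several assumptions: the embeddings are synthetic stand-ins rather than real model outputs, cosine similarity is used as the similarity measure, and the difference is taken as rejection minus follow-through so that a higher score points toward a rejection, as the usage note above describes. The helper names (`build_direction`, `cosine`, `noisy`) are hypothetical.

```python
import math
import random


def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def subtract(a, b):
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_direction(rejections, follow_throughs):
    """Average per-question mean differences into one direction vector.

    Both arguments hold one list of embeddings per question.
    Assumption: the sign is rejection minus follow-through, so that
    higher similarity to the result indicates a rejection.
    """
    per_question = [
        subtract(mean(rej), mean(fol))
        for rej, fol in zip(rejections, follow_throughs)
    ]
    return mean(per_question)


# Synthetic stand-ins for real embeddings (assumption: 16 dimensions,
# with rejections and follow-throughs clustered around two fixed centers).
rng = random.Random(0)
DIM = 16
refusal_center = [1.0 if i < DIM // 2 else 0.0 for i in range(DIM)]
comply_center = [0.0 if i < DIM // 2 else 1.0 for i in range(DIM)]


def noisy(center, scale=0.1):
    """Return a copy of center with small Gaussian noise added."""
    return [c + rng.gauss(0.0, scale) for c in center]


# 20 questions, 3 rejections and 3 follow-throughs each.
rejections = [[noisy(refusal_center) for _ in range(3)] for _ in range(20)]
follow_throughs = [[noisy(comply_center) for _ in range(3)] for _ in range(20)]

direction = build_direction(rejections, follow_throughs)

# Score two unseen samples against the direction vector.
score_refusal = cosine(noisy(refusal_center), direction)
score_comply = cosine(noisy(comply_center), direction)
print(f"rejection-like sample:      {score_refusal:.3f}")
print(f"follow-through-like sample: {score_comply:.3f}")
```

In practice the synthetic vectors would be replaced by embeddings of real responses, produced by the same embedding model that was used to build the released vectors, and a decision threshold on the score would need to be tuned on labeled examples.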