crumb committed on
Commit
8ecbcf8
·
verified ·
1 Parent(s): e38da0f

Create README.md

Files changed (1)
  1. README.md +1 -0
README.md ADDED
@@ -0,0 +1 @@
 
 
+ Using an early version of our desc2doc-32b model, we generated 3 rejections and 3 follow-through responses for each of 20 unsafe questions. For each question we averaged the rejections and the follow-throughs separately and subtracted the mean of the rejections from the mean of the follow-throughs; we then averaged these per-question differences to produce these vectors, which can be compared to the embeddings of LLM responses to robustly detect rejections. Higher similarity to these vectors means a higher probability that the embedded sample contains a rejection.
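
The averaging and comparison steps described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code used to build the published vectors. The function names, the array layout `(n_questions, n_samples, dim)`, and the random stand-in embeddings are all assumptions. The sign is also an assumption: the direction is oriented so that rejection-like embeddings score higher, matching the usage note; if the published vectors are oriented the other way, the score flips sign.

```python
import numpy as np


def mean_difference_vector(rejections, follow_throughs):
    """Build a unit-length direction from two sets of embeddings.

    Both inputs have shape (n_questions, n_samples, dim).
    Assumed sign convention: rejection-like embeddings score higher.
    """
    rejections = np.asarray(rejections, dtype=float)
    follow_throughs = np.asarray(follow_throughs, dtype=float)
    # Average the samples for each question: (n_questions, dim).
    rejection_mean = rejections.mean(axis=1)
    follow_mean = follow_throughs.mean(axis=1)
    # Per-question difference of means, then average over questions.
    direction = (rejection_mean - follow_mean).mean(axis=0)
    return direction / np.linalg.norm(direction)


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-ins for real embeddings (hypothetical data):
# 20 questions, 3 samples each, 16 dimensions.
rng = np.random.default_rng(0)
dim = 16
offset = np.zeros(dim)
offset[0] = 5.0
rejections = rng.normal(size=(20, 3, dim)) + offset
follow_throughs = rng.normal(size=(20, 3, dim)) - offset

vector = mean_difference_vector(rejections, follow_throughs)

# Score two unseen synthetic samples against the direction.
score_rejection = cosine_similarity(rng.normal(size=dim) + offset, vector)
score_follow = cosine_similarity(rng.normal(size=dim) - offset, vector)
print(score_rejection > score_follow)
```

In practice the inputs would be embeddings of real model responses, and a decision threshold on the similarity score would need to be chosen on labeled examples.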