False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize Paper • 2509.03888 • Published 3 days ago
Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations Paper • 2310.06387 • Published Oct 10, 2023