An Embarrassingly Simple Defense Against LLM Abliteration Attacks Paper • 2505.19056 • Published May 25
Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection Paper • 2508.20766 • Published 11 days ago