WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models Paper • 2406.18510 • Published Jun 26, 2024
DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent Paper • 2502.12575 • Published Feb 18, 2025