Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening Paper • 2602.05386 • Published 6 days ago • 69
THINKSAFE: Self-Generated Safety Alignment for Reasoning Models Paper • 2601.23143 • Published 12 days ago • 38
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization Paper • 2511.12982 • Published Nov 17, 2025 • 4
Backdoor Cleaning without External Guidance in MLLM Fine-tuning Paper • 2505.16916 • Published May 22, 2025 • 17
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model Paper • 2503.04543 • Published Mar 6, 2025 • 1
MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models Paper • 2510.16641 • Published Oct 18, 2025 • 5