Sealing The Backdoor: Unlearning Adversarial Text Triggers In Diffusion Models Using Knowledge Distillation

This repository contains the code and model weights for the paper Sealing The Backdoor: Unlearning Adversarial Text Triggers In Diffusion Models Using Knowledge Distillation.

The paper addresses the vulnerability of text-to-image diffusion models to backdoor attacks, in which imperceptible textual triggers cause manipulated outputs. The proposed approach, Self-Knowledge Distillation with Cross-Attention Guidance (SKD-CAG), uses knowledge distillation to neutralize backdoor influences at the attention level, selectively erasing the model's learned associations between adversarial text triggers and poisoned outputs while preserving overall generation quality.
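To illustrate the general idea, here is a minimal PyTorch-style sketch of an attention-guided self-distillation loss. The function name, tensor shapes, helper arguments, and the weighting term `lambda_attn` are illustrative assumptions, not the paper's actual implementation. The intuition: the frozen backdoored model serves as its own teacher on the trigger-free prompt, while the student, given the triggered prompt, is trained to match both the teacher's denoising output and its cross-attention maps.

```python
# Hypothetical sketch of an SKD-CAG-style unlearning loss.
# All names and the exact loss form are assumptions for illustration.
import torch
import torch.nn.functional as F


def skd_cag_loss(student_noise_pred, teacher_noise_pred,
                 student_attn_maps, teacher_attn_maps,
                 lambda_attn=1.0):
    """Distill the teacher's clean behaviour into the student.

    The teacher is the same (frozen) backdoored model evaluated on the
    prompt with the trigger token removed; the student sees the triggered
    prompt. Matching both the predicted noise and the cross-attention
    maps pushes the trigger's response toward the clean behaviour.
    """
    # Output-level distillation: match the teacher's denoising prediction.
    loss_out = F.mse_loss(student_noise_pred, teacher_noise_pred.detach())

    # Attention-level guidance: match cross-attention maps layer by layer.
    loss_attn = sum(
        F.mse_loss(s, t.detach())
        for s, t in zip(student_attn_maps, teacher_attn_maps)
    ) / len(student_attn_maps)

    return loss_out + lambda_attn * loss_attn
```

In practice, the cross-attention maps would typically be collected with forward hooks on the UNet's cross-attention layers during the denoising pass; the sketch above only shows how the two signals might be combined into a single training objective.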

For more details and the implementation, please refer to the GitHub repository.
