Sealing The Backdoor: Unlearning Adversarial Text Triggers In Diffusion Models Using Knowledge Distillation
This repository contains the code and model weights for the paper Sealing The Backdoor: Unlearning Adversarial Text Triggers In Diffusion Models Using Knowledge Distillation.
The paper addresses the vulnerability of text-to-image diffusion models to backdoor attacks, in which imperceptible textual triggers cause manipulated outputs. The proposed approach, Self-Knowledge Distillation with Cross-Attention Guidance (SKD-CAG), selectively erases the model's learned associations between adversarial text triggers and poisoned outputs while preserving overall generation quality, neutralizing the backdoor's influence at the cross-attention level via knowledge distillation.
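The full implementation lives in the linked repository; purely as an illustration of the general idea, the PyTorch sketch below shows one plausible way to combine a self-distillation term with a cross-attention penalty. All names here (`skd_cag_loss`, the tensor shapes, `lambda_attn`) are hypothetical and not taken from the paper's code, and the assumption that the "teacher" predictions come from the same model run on a trigger-free prompt is an interpretation of self-knowledge distillation, not a confirmed detail.

```python
import torch
import torch.nn.functional as F

def skd_cag_loss(student_noise_pred: torch.Tensor,
                 teacher_noise_pred: torch.Tensor,
                 student_attn: torch.Tensor,
                 teacher_attn: torch.Tensor,
                 trigger_token_mask: torch.Tensor,
                 lambda_attn: float = 1.0) -> torch.Tensor:
    """Illustrative SKD-CAG-style objective (hypothetical, not the paper's code).

    Assumed shapes:
      *_noise_pred: (batch, channels, h, w)   UNet denoising predictions
      *_attn:       (batch, heads, pixels, tokens)  cross-attention maps
      trigger_token_mask: (tokens,) bool, True at trigger-token positions

    The teacher prediction is assumed to come from the same model
    evaluated with the trigger removed from the prompt, so matching it
    preserves benign generation quality while unlearning the trigger.
    """
    # Self-distillation term: keep the student's denoising output close
    # to the (assumed trigger-free) teacher output.
    distill = F.mse_loss(student_noise_pred, teacher_noise_pred)

    # Cross-attention guidance term: penalize divergence only on the
    # attention columns belonging to trigger tokens, pushing the
    # trigger's attention toward benign behavior.
    s = student_attn[..., trigger_token_mask]
    t = teacher_attn[..., trigger_token_mask]
    attn = F.mse_loss(s, t)

    return distill + lambda_attn * attn
```

Restricting the attention penalty to the trigger tokens is what makes the erasure selective: the rest of the prompt's attention, and hence the model's general text-to-image behavior, is constrained only by the distillation term.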
For more details and the implementation, please refer to the GitHub repository.