---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
language:
- en
library_name: peft
license: llama3.1
pipeline_tag: text-generation
---
# LLaMA-3.1-8B-LoRA-COCO-Deceptive-CLIP Model Card
> 🏆 **This work is accepted to ACL 2025 (Main Conference).**

*Figure: Attack success rate (ASR) and caption diversity of our model on the COCO dataset, illustrating its ability to generate deceptive captions that successfully fool CLIP.*
## Model Description
- **Repository:** [Code](https://github.com/ahnjaewoo/MAC)
- **Paper:** [Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates](https://arxiv.org/abs/2505.22943)
- **Point of Contact:** [Jaewoo Ahn](mailto:jaewoo.ahn@vision.snu.ac.kr), [Heeseung Yun](mailto:heeseung.yun@vision.snu.ac.kr)
## Model Details
- **Model**: *LLaMA-3.1-8B-LoRA-COCO-Deceptive-CLIP* is a deceptive caption generator built on **LLaMA-3.1-8B** and fine-tuned with LoRA via *self-training*, specifically *rejection sampling fine-tuning (RFT)*, to deceive **CLIP** on the **COCO** dataset. It achieves an **attack success rate (ASR)** of **42.1%**.
- **Architecture**: This model is based on [LLaMA-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) and utilizes [PEFT](https://github.com/huggingface/peft) v0.12.0 for efficient fine-tuning.
## How to Use
See our GitHub [repository](https://github.com/ahnjaewoo/MAC) for full usage instructions and scripts.
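As a minimal sketch, the LoRA adapter can be attached to the base model with `transformers` and `peft`. Note two assumptions: the adapter repo id below is a placeholder (substitute this model's actual Hugging Face repo id), and the prompt shown is only illustrative — the exact prompt format for deceptive caption generation is defined in the MAC repository. Access to the gated Llama 3.1 weights is required.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"
ADAPTER = "path/or/repo-id-of-this-adapter"  # placeholder: substitute the actual adapter repo id

# Load the base model and tokenizer (requires access to the gated Llama 3.1 weights)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="auto", device_map="auto")

# Attach the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

# Illustrative prompt only — see the MAC repository for the exact prompt format
messages = [{"role": "user", "content": "Rewrite this caption: a dog runs on the beach."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For the evaluation scripts that compute ASR against CLIP, follow the instructions in the repository linked above.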