Fix pipeline tag, add project page
#1
by nielsr (HF Staff), opened

README.md CHANGED
```diff
@@ -1,10 +1,10 @@
 ---
 base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
-library_name: peft
-license: llama3.1
 language:
 - en
-pipeline_tag: text2text-generation
+library_name: peft
+license: llama3.1
+pipeline_tag: text-generation
 ---
 
 # LLaMA-3.1-8B-LoRA-COCO-Deceptive-CLIP Model Card
@@ -14,11 +14,12 @@ pipeline_tag: text2text-generation
 ## Model Description
 - **Repository:** [Code](https://github.com/ahnjaewoo/MAC)
 - **Paper:** [Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates](https://arxiv.org/abs/2505.22943)
+- **Project Page:** [Project Page](https://vision.snu.ac.kr/projects/mac)
 - **Point of Contact:** [Jaewoo Ahn](mailto:[email protected]), [Heeseung Yun](mailto:[email protected])
 
 ## Model Details
-- **Model**: *LLaMA-3.1-8B-LoRA-COCO-Deceptive-CLIP* is a deceptive caption generator built on **LLaMA-3.1-8B**, fine-tuned using LoRA (i.e., *self-training*, or more specifically, *rejection sampling fine-tuning (RFT)*) to deceive **CLIP** on the **COCO** dataset. It achieves an **attack success rate (ASR)** of **42.1%**.
+- **Model**: *LLaMA-3.1-8B-LoRA-COCO-Deceptive-CLIP* is a deceptive caption generator built on **LLaMA-3.1-8B**, fine-tuned using LoRA (i.e., *self-training*, or more specifically, *rejection sampling fine-tuning (RFT)*) to deceive **CLIP** on the **COCO** dataset. It achieves an **attack success rate (ASR)** of **42.1%**.\
 - **Architecture**: This model is based on [LLaMA-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) and utilizes [PEFT](https://github.com/huggingface/peft) v0.12.0 for efficient fine-tuning.
 
 ## How to Use
-See our GitHub [repository](https://github.com/ahnjaewoo/MAC) for full usage instructions and scripts.
+See our GitHub [repository](https://github.com/ahnjaewoo/MAC) for full usage instructions and scripts.
```
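For readers landing on this card from the diff above, here is a minimal sketch of how a PEFT LoRA adapter like this one is typically loaded and run. The adapter repo id, prompt, and generation settings are illustrative assumptions, not the authors' method; the linked GitHub repository (https://github.com/ahnjaewoo/MAC) has the official scripts.

```python
# Sketch only: loading a LoRA adapter on top of Llama-3.1-8B-Instruct with PEFT.
# The adapter id and prompt below are placeholders, not from the model card.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "<this-model-repo-id>"  # hypothetical: fill in this repo's id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the LoRA weights (trained via rejection sampling fine-tuning, RFT).
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Rewrite the caption: a dog runs on the beach"  # illustrative input
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The snippet also shows why the tag fix is right: the base model is a decoder-only causal LM, so `text-generation` is the matching pipeline tag, while `text2text-generation` is meant for encoder-decoder models such as T5.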