Towards Training-free Anomaly Detection with Vision and Language Foundation Models (CVPR 2025)

System Requirements

Hardware Requirements:

  • GPU Memory: 32GB VRAM (for running complete experiments)
  • Storage: 70GB free disk space (for models, datasets, and results)
  • CUDA: NVIDIA GPU with CUDA 12.1 support

Software Requirements:

  • Python 3.10
  • Conda (recommended for environment management)
  • CUDA 12.1 runtime

Note: The memory and storage requirements are for running the full experimental pipeline on all categories with visualization enabled. Smaller experiments on individual categories may require fewer resources.

Installation

Automated Setup (Recommended)

Run the setup script to automatically configure the complete environment:

bash scripts/setup_environment.sh

This script will:

  • Create a conda environment named logsad with Python 3.10
  • Install PyTorch with CUDA 12.1 support
  • Install all required dependencies from requirements.txt
  • Configure numpy compatibility

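After the script finishes, you can sanity-check the install with a quick one-liner (this verification step is not part of the script itself and assumes PyTorch was installed as described above):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
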
Manual Setup

If you prefer manual setup, reproduce the steps performed by the setup script yourself (a sketch of equivalent commands is given below), then download the checkpoint for the ViT-H SAM model and place it in the checkpoint folder.
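
A minimal manual-setup sketch, assuming it mirrors the steps the automated script performs (conda environment named logsad with Python 3.10, PyTorch with CUDA 12.1 support, dependencies from requirements.txt). The exact PyTorch install command, the numpy pin, and the SAM checkpoint URL below are assumptions; verify them against scripts/setup_environment.sh and the official Segment Anything release.

conda create -n logsad python=3.10 -y
conda activate logsad
# PyTorch with CUDA 12.1 support (assumed wheel index; check pytorch.org for the exact command)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# the setup script also pins numpy for compatibility; the exact constraint is in scripts/setup_environment.sh
# ViT-H SAM checkpoint (assumed to be the public Segment Anything release checkpoint)
mkdir -p checkpoint
wget -P checkpoint https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth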

After installation, activate the environment:

conda activate logsad

Instructions for the MVTec LOCO dataset

Quick Start (Recommended)

Run evaluation for all categories using the provided shell scripts:

Few-shot Protocol:

bash scripts/run_few_shot.sh

Full-data Protocol:

bash scripts/run_full_data.sh

Manual Execution

Few-shot Protocol

Run the script for the few-shot protocol:

python evaluation.py --module_path model_ensemble_few_shot --category CATEGORY --dataset_path DATASET_PATH
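
For example, a few-shot run on the breakfast_box category might look like the following (the dataset path is a placeholder; point it at your local MVTec LOCO root):

python evaluation.py --module_path model_ensemble_few_shot --category breakfast_box --dataset_path /path/to/mvtec_loco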

Full-data Protocol

Run the script to compute the coreset for the full-data scenario:

python compute_coreset.py --module_path model_ensemble --category CATEGORY --dataset_path DATASET_PATH

Run the script for the full-data protocol:

python evaluation.py --module_path model_ensemble --category CATEGORY --dataset_path DATASET_PATH

Available categories: breakfast_box, juice_bottle, pushpins, screw_bag, splicing_connectors
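
As a worked example, a full-data run on juice_bottle chains both commands (the dataset path is a placeholder for your local MVTec LOCO root):

python compute_coreset.py --module_path model_ensemble --category juice_bottle --dataset_path /path/to/mvtec_loco
python evaluation.py --module_path model_ensemble --category juice_bottle --dataset_path /path/to/mvtec_loco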

Acknowledgement

We are grateful to the following awesome projects, which we built upon when implementing LogSAD:

Citation

If you find our paper helpful in your research or applications, please cite it with:

@inproceedings{zhang2025logsad,
  title={Towards Training-free Anomaly Detection with Vision and Language Foundation Models},
  author={Jinjin Zhang and Guodong Wang and Yizhou Jin and Di Huang},
  year={2025},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}