Towards Training-free Anomaly Detection with Vision and Language Foundation Models (CVPR 2025)

System Requirements

Hardware Requirements:

  • GPU Memory: 32GB VRAM (for running complete experiments)
  • Storage: 70GB free disk space (for models, datasets, and results)
  • CUDA: NVIDIA GPU with CUDA 12.1 support

Software Requirements:

  • Python 3.10
  • Conda (recommended for environment management)
  • CUDA 12.1 runtime

Note: The memory and storage requirements are for running the full experimental pipeline on all categories with visualization enabled. Smaller experiments on individual categories may require fewer resources.

Installation

Automated Setup (Recommended)

Run the setup script to automatically configure the complete environment:

bash scripts/setup_environment.sh

This script will:

  • Create a conda environment named logsad with Python 3.10
  • Install PyTorch with CUDA 12.1 support
  • Install all required dependencies from requirements.txt
  • Configure numpy compatibility

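After the script finishes, you can sanity-check the install with a quick one-liner (this verification step is not part of the script itself and assumes PyTorch was installed as described above):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
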
Manual Setup

If you prefer manual setup, reproduce the steps performed by the setup script yourself (a sketch of equivalent commands is given below), then download the checkpoint for the ViT-H SAM model and place it in the checkpoint folder.
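
A minimal manual-setup sketch, assuming it mirrors the steps the automated script performs (conda environment named logsad with Python 3.10, PyTorch with CUDA 12.1 support, dependencies from requirements.txt). The exact PyTorch install command, the numpy pin, and the SAM checkpoint URL below are assumptions; verify them against scripts/setup_environment.sh and the official Segment Anything release.

conda create -n logsad python=3.10 -y
conda activate logsad
# PyTorch with CUDA 12.1 support (assumed wheel index; check pytorch.org for the exact command)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# the setup script also pins numpy for compatibility; the exact constraint is in scripts/setup_environment.sh
# ViT-H SAM checkpoint (assumed to be the public Segment Anything release checkpoint)
mkdir -p checkpoint
wget -P checkpoint https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth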

After installation, activate the environment:

conda activate logsad

Instructions for the MVTec LOCO dataset

Quick Start (Recommended)

Run evaluation for all categories using the provided shell scripts:

Few-shot Protocol:

bash scripts/run_few_shot.sh

Full-data Protocol:

bash scripts/run_full_data.sh

Manual Execution

Few-shot Protocol

Run the script for the few-shot protocol:

python evaluation.py --module_path model_ensemble_few_shot --category CATEGORY --dataset_path DATASET_PATH
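
For example, a few-shot run on the breakfast_box category might look like the following (the dataset path is a placeholder; point it at your local MVTec LOCO root):

python evaluation.py --module_path model_ensemble_few_shot --category breakfast_box --dataset_path /path/to/mvtec_loco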

Full-data Protocol

Run the script to compute the coreset for the full-data scenario:

python compute_coreset.py --module_path model_ensemble --category CATEGORY --dataset_path DATASET_PATH

Run the script for the full-data protocol:

python evaluation.py --module_path model_ensemble --category CATEGORY --dataset_path DATASET_PATH

Available categories: breakfast_box, juice_bottle, pushpins, screw_bag, splicing_connectors
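
As a worked example, a full-data run on juice_bottle chains both commands (the dataset path is a placeholder for your local MVTec LOCO root):

python compute_coreset.py --module_path model_ensemble --category juice_bottle --dataset_path /path/to/mvtec_loco
python evaluation.py --module_path model_ensemble --category juice_bottle --dataset_path /path/to/mvtec_loco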

Acknowledgement

We are grateful to the following awesome projects, which we built upon when implementing LogSAD:

Citation

If you find our paper helpful in your research or applications, please cite it with:

@inproceedings{zhang2025logsad,
  title={Towards Training-free Anomaly Detection with Vision and Language Foundation Models},
  author={Jinjin Zhang and Guodong Wang and Yizhou Jin and Di Huang},
  year={2025},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}