# Residual Convolutional Autoencoder for Deepfake Detection

An autoencoder trained on multiple datasets. Mean reconstruction error on fake images is **19.18x** the mean error on real images.

## Model Performance

- **Training Time**: 21.4 minutes on an H200 GPU
- **Best Validation Loss**: 0.007970 (epoch 29)
- **Anomaly Separation**: 19.18x (mean reconstruction error on fake images is about 19x that on real images)
- **Datasets**: CIFAR-10, CIFAR-100, STL-10 (205,000 training images)

## Quick Start
```python
from huggingface_hub import hf_hub_download
import torch
# model.py is distributed in the same repo and must be on the import path
from model import ResidualConvAutoencoder
from torchvision import transforms
from PIL import Image
import json

# Download model and thresholds
checkpoint_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2", 
    filename="model_universal_best.ckpt"
)
threshold_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2", 
    filename="thresholds_calibrated.json"
)

# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResidualConvAutoencoder(latent_dim=512, dropout=0.1).to(device)
checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load thresholds
with open(threshold_path) as f:
    thresholds = json.load(f)

# Prepare image
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

image = Image.open("your_image.jpg").convert('RGB')
image_tensor = transform(image).unsqueeze(0).to(device)

# Get reconstruction error
with torch.no_grad():
    error = model.reconstruction_error(image_tensor)
    error_value = error.item()
    print(f"Reconstruction error: {error_value:.6f}")

# Check against threshold (balanced mode)
balanced_threshold = thresholds['reconstruction_thresholds']['thresholds']['balanced']['value']
if error_value > balanced_threshold:
    print("⚠️  Potential deepfake detected!")
else:
    print("✅ Image appears authentic")
```

## Detection Thresholds

Three calibrated threshold levels:

| Mode | Threshold | False Positive Rate | Description |
|------|-----------|---------------------|-------------|
| **Strict** | 0.055737 | ~1% | 99th percentile of real-image error; fewest false positives |
| **Balanced** | 0.039442 | ~5% | 95th percentile of real-image error; recommended for general use |
| **Sensitive** | ~0.039 | ~2.5% | More sensitive detection |
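
As a rough illustration of how a mode could be selected at inference time, the sketch below hard-codes the two thresholds given exactly in the table above and compares a reconstruction error against them. The `THRESHOLDS` dictionary and the `is_flagged` helper are hypothetical names for this example, not part of `model.py`.

```python
# Threshold values copied from the table above; names here are hypothetical.
THRESHOLDS = {
    "strict": 0.055737,    # ~1% false positive rate
    "balanced": 0.039442,  # ~5% false positive rate
}


def is_flagged(error_value: float, mode: str = "balanced") -> bool:
    """Return True when the reconstruction error exceeds the chosen threshold."""
    return error_value > THRESHOLDS[mode]


# An error that falls between the two thresholds is flagged in balanced mode only.
print(is_flagged(0.045, "balanced"))  # True
print(is_flagged(0.045, "strict"))    # False
```

A stricter mode trades missed detections for fewer false alarms, so the same image can be flagged under one mode and pass under another.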

## Model Architecture

- **Encoder**: 5 downsampling blocks (128→64→32→16→8→4)
- **Latent Space**: 512 dimensions
- **Decoder**: 5 upsampling blocks (4→8→16→32→64→128)
- **Residual Blocks**: Skip connections with dropout (0.1)
- **Total Parameters**: ~40M
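
The spatial sizes listed above are what five stride-2 stages would produce from a 128x128 input. The sketch below checks that arithmetic with the standard convolution output-size formula. The kernel size, stride, and padding are assumptions chosen for illustration; the actual layers in `model.py` may differ.

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output side length of a 2D convolution (standard formula, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(input_size: int = 128, num_blocks: int = 5) -> list[int]:
    """Side length after each downsampling block, starting with the input."""
    sizes = [input_size]
    for _ in range(num_blocks):
        sizes.append(conv_out_size(sizes[-1]))
    return sizes


print(encoder_sizes())  # [128, 64, 32, 16, 8, 4]
```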

## Training Details

- **Epochs**: 30 (best at epoch 29)
- **Batch Size**: 1024
- **Optimizer**: AdamW (lr=1e-4, weight_decay=1e-5)
- **Scheduler**: Cosine Annealing with Warm Restarts
- **Data Augmentation**: Horizontal flip, color jitter
- **Mixed Precision**: AMP enabled
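
To show what a cosine schedule with warm restarts looks like in practice, here is a small pure-Python sketch of the learning-rate curve, using the same formula that `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` documents. Only `base_lr=1e-4` comes from the list above; the restart period `t_0`, the multiplier `t_mult`, and `eta_min` are assumed values, since the training configuration is not reproduced here.

```python
import math


def cosine_warm_restart_lr(epoch: int, base_lr: float = 1e-4, eta_min: float = 0.0,
                           t_0: int = 10, t_mult: int = 1) -> float:
    """Learning rate at `epoch` under cosine annealing with warm restarts."""
    t_i, t_cur = t_0, epoch
    # Skip past completed cycles to find the position inside the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# The rate decays within a cycle and jumps back to base_lr at each restart.
for epoch in (0, 5, 9, 10):
    print(epoch, f"{cosine_warm_restart_lr(epoch):.2e}")
```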

## Statistics

### Real Images
- Mean error: 0.018391
- Median error: 0.015647
- Std: 0.010279
- 95th percentile: 0.039442
- 99th percentile: 0.055737
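
The 95th and 99th percentiles above coincide with the balanced and strict thresholds, which suggests the thresholds were taken as percentiles of the real-image error distribution. A minimal sketch of that kind of calibration follows; the helper names and the toy data are hypothetical, and the interpolation rule is an assumption rather than the one used to produce `thresholds_calibrated.json`.

```python
def percentile(values: list[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between sorted samples."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def calibrate_thresholds(real_errors: list[float]) -> dict[str, float]:
    """Derive thresholds from reconstruction errors measured on real images."""
    return {
        "balanced": percentile(real_errors, 95),  # ~5% of real images exceed this
        "strict": percentile(real_errors, 99),    # ~1% of real images exceed this
    }


# Toy data standing in for real-image reconstruction errors.
toy_errors = [i / 1000 for i in range(101)]  # 0.000, 0.001, ..., 0.100
print(calibrate_thresholds(toy_errors))
```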

### Fake Images (Synthetic)
- Mean error: 0.352695
- Median error: 0.347151

**Separation Ratio**: 19.18x 🎯
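
The reported ratio matches the quotient of the two mean errors listed above, which can be checked directly:

```python
mean_real = 0.018391  # mean reconstruction error on real images
mean_fake = 0.352695  # mean reconstruction error on fake images

separation_ratio = mean_fake / mean_real
print(f"{separation_ratio:.2f}x")  # 19.18x
```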

## Files

- `model_universal_best.ckpt` - Full checkpoint (418 MB)
- `thresholds_calibrated.json` - Calibrated thresholds
- `model.py` - Model architecture
- `config.json` - Training configuration
- `README.md` - This file

## Citation
```bibtex
@misc{deepfake_autoencoder_2024,
  title={Residual Convolutional Autoencoder for Deepfake Detection},
  author={Your Name},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}
}
```

## License

MIT License