---
license: apache-2.0
language:
- en
base_model:
- madebyollin/sdxl-vae-fp16-fix
- stabilityai/sdxl-vae
library_name: diffusers
---
# SDXL-VAE finetuned
## ImageNet eval (256px)
Model | MSE ↓ | PSNR (dB) ↑ | LPIPS ↓ |
---|---|---|---|
madebyollin/sdxl-vae-fp16-fix | 3.680e-03 | 25.2100 | 0.1314 |
KBlueLeaf/EQ-SDXL-VAE | 3.530e-03 | 25.2827 | 0.1298 |
AiArtLab/sdxl_vae | 3.321e-03 | 25.6389 | 0.1251 |
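
For reference, MSE and PSNR can be computed from pixel values as in the minimal pure-Python sketch below. It assumes images are flattened into lists of floats in [0, 1] and that PSNR is averaged per image; the card does not say which pixel range or averaging it used. LPIPS needs a pretrained network (for example the `lpips` package) and is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

def mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between two flattened images (pixels assumed in [0, 1])."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("images must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def psnr(original: Sequence[float], reconstructed: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value**2 / MSE)."""
    err = mse(original, reconstructed)
    if err == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)

def evaluate(pairs: List[Tuple[Sequence[float], Sequence[float]]]) -> Tuple[float, float]:
    """Mean MSE and mean per-image PSNR over (original, reconstruction) pairs."""
    if not pairs:
        raise ValueError("need at least one image pair")
    mses = [mse(o, r) for o, r in pairs]
    psnrs = [psnr(o, r) for o, r in pairs]
    return sum(mses) / len(mses), sum(psnrs) / len(psnrs)
```

Averaging per-image PSNR is not the same as converting the mean MSE to PSNR: 10 * log10(1 / 3.680e-03) is about 24.3 dB, below the 25.21 dB in the table. By Jensen's inequality, mean per-image PSNR is never lower than the PSNR of the mean MSE, so the gap is consistent with (but does not prove) per-image averaging.
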
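## Alchemist eval (512px)

Scores are relative to madebyollin/sdxl-vae-fp16-fix (= 100%); higher is better in every column.

Model | MSE | PSNR | LPIPS |
---|---|---|---|
madebyollin/sdxl-vae-fp16-fix | 100% | 100% | 100% |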
KBlueLeaf/EQ-SDXL-VAE | 107.8% | 100.1% | 95.5% |
AiArtLab/sdxl_vae | 112.3% | 101.8% | 106.6% |
FLUX.1-schnell-vae | 324.0% | 119.8% | 292.0% |
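
The card does not state how these percentages are computed. A minimal sketch, assuming each metric is expressed relative to the baseline and oriented so that above 100% means better (baseline / model for MSE and LPIPS, model / baseline for PSNR); the formula is a hypothetical reconstruction, not the authors' evaluation code:

```python
from typing import Dict

# Assumed metric orientation: MSE and LPIPS are lower-is-better, PSNR higher-is-better.
LOWER_IS_BETTER = {"MSE", "LPIPS"}

def relative_scores(model: Dict[str, float], baseline: Dict[str, float]) -> Dict[str, float]:
    """Express each metric as a percentage of the baseline, oriented so that
    values above 100% mean "better than the baseline" (hypothetical formula)."""
    scores = {}
    for name, value in model.items():
        if name in LOWER_IS_BETTER:
            ratio = baseline[name] / value  # smaller error -> larger score
        else:
            ratio = value / baseline[name]  # larger PSNR -> larger score
        scores[name] = round(100.0 * ratio, 1)
    return scores

# ImageNet (256px) rows from the first table
baseline = {"MSE": 3.680e-03, "PSNR": 25.2100, "LPIPS": 0.1314}  # madebyollin/sdxl-vae-fp16-fix
aiartlab = {"MSE": 3.321e-03, "PSNR": 25.6389, "LPIPS": 0.1251}  # AiArtLab/sdxl_vae
print(relative_scores(aiartlab, baseline))  # {'MSE': 110.8, 'PSNR': 101.7, 'LPIPS': 105.0}
```

These ImageNet-derived figures are close to, though not the same as, the Alchemist row for AiArtLab/sdxl_vae (112.3%, 101.8%, 106.6%), which is what one would expect if the same orientation were applied to a different image set at a different resolution.
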
## Diffusers
```python
from diffusers import AutoencoderKL

# Load the fine-tuned VAE, then move it to the GPU in half precision
vae = AutoencoderKL.from_pretrained("AiArtLab/sdxl_vae").cuda().half()
```
## Training status (in progress)
We are currently testing whether the SDXL VAE decoder can be improved by making it deeper (an asymmetric VAE). This increases model size slightly (by about 20%), but we expect it to improve reconstruction quality without modifying the encoder, so SDXL itself does not need to be retrained. Our resources are quite limited: we train on consumer GPUs and are currently training three models (SDXL VAE, Simple Diffusion, and Simple VAE), so please be patient. Model training is a meticulous and time-consuming process.
## VAE Training Process
- Encoder: Frozen (to avoid retraining SDXL for the new VAE).
- Dataset: 100,000 PNG images
- Training Time: 4 days
- Hardware: Single RTX 4090
- Resolution: 512px
- Precision: FP32
- Effective Batch Size: 16 (batch size 2 × 8 gradient accumulation steps)
- Optimizer: AdamW (8-bit)
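
The effective batch size of 16 comes from 8 gradient-accumulation steps of 2 images each: gradients from each micro-batch are accumulated and the optimizer steps once. A minimal pure-Python sketch, using a hypothetical 1-D linear model and MSE loss purely to show that accumulating scaled micro-batch gradients reproduces the full-batch gradient; the optimizer step itself (AdamW 8-bit in the card) is left out:

```python
from typing import List, Tuple

MICRO_BATCH = 2                                    # images per forward/backward pass (from the card)
ACCUM_STEPS = 8                                    # micro-batches per optimizer step (from the card)
EFFECTIVE_BATCH = MICRO_BATCH * ACCUM_STEPS        # 16
DATASET_SIZE = 100_000                             # PNG images (from the card)
STEPS_PER_EPOCH = DATASET_SIZE // EFFECTIVE_BATCH  # 6250 optimizer steps per pass

Sample = Tuple[float, float]

def grad_mse(w: float, batch: List[Sample]) -> float:
    """Gradient d/dw of mean((w * x - y) ** 2) for a toy 1-D linear model."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w: float, samples: List[Sample]) -> float:
    """Accumulate micro-batch gradients, each scaled by 1 / ACCUM_STEPS,
    as a trainer would before a single optimizer step."""
    if len(samples) != EFFECTIVE_BATCH:
        raise ValueError(f"expected {EFFECTIVE_BATCH} samples")
    total = 0.0
    for step in range(ACCUM_STEPS):
        micro = samples[step * MICRO_BATCH:(step + 1) * MICRO_BATCH]
        total += grad_mse(w, micro) / ACCUM_STEPS
    return total

samples = [(float(i), 2.0 * i + 1.0) for i in range(EFFECTIVE_BATCH)]
print(accumulated_grad(0.5, samples), grad_mse(0.5, samples))  # -247.5 -247.5
```

Dividing each micro-batch gradient by `ACCUM_STEPS` mirrors the common PyTorch pattern of calling `(loss / accum_steps).backward()` on every micro-batch and stepping the optimizer once per accumulation cycle.
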
### Implementation
- Base Code: Built on a simple diffusion model training script.
- Training Target: Only the decoder, focusing on image reconstruction.
### Loss Functions
- Initially used LPIPS and MSE.
- Noticed the FID score improving while the images became blurry (FID tends to reward blurry images, so an improving FID is not always a good sign).
- Switched from MSE to MAE.
- Balanced LPIPS and MAE at a 90/10 ratio.
- Used a median perceptual_loss_weight for better balance.
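
The card does not spell out how the median perceptual_loss_weight is computed. One plausible reading, sketched below as a hypothetical: keep running windows of recent LPIPS and MAE values and choose the weight so that, at their medians, LPIPS contributes 90% of the total and MAE 10%. Which term gets 90%, the window size, and the class name `MedianBalancedLoss` are all assumptions. Medians are less affected by occasional outlier batches than means. MAE itself is the mean absolute pixel error; unlike MSE it does not square large errors, and L1 losses are commonly reported to give less blurry reconstructions than L2.

```python
from collections import deque
from statistics import median
from typing import Sequence

def mae(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean absolute error between two flattened images."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("images must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(original, reconstructed)) / len(original)

class MedianBalancedLoss:
    """Hypothetical sketch: total = MAE + perceptual_loss_weight * LPIPS, where the
    weight is re-estimated from running medians so that, at typical (median)
    values, LPIPS makes up lpips_share of the total. Not the authors' code."""

    def __init__(self, lpips_share: float = 0.9, window: int = 100):
        if not 0.0 < lpips_share < 1.0:
            raise ValueError("lpips_share must be between 0 and 1")
        self.lpips_share = lpips_share
        self.lpips_hist = deque(maxlen=window)  # recent LPIPS values
        self.mae_hist = deque(maxlen=window)    # recent MAE values

    def perceptual_loss_weight(self) -> float:
        # Solve w * med(LPIPS) / (w * med(LPIPS) + med(MAE)) = lpips_share for w.
        ratio = self.lpips_share / (1.0 - self.lpips_share)  # about 9 for a 90/10 split
        return ratio * median(self.mae_hist) / max(median(self.lpips_hist), 1e-12)

    def __call__(self, lpips_value: float, mae_value: float) -> float:
        # In a real trainer these would be tensors and the weight a plain float,
        # so no gradient flows through the balancing itself.
        self.lpips_hist.append(lpips_value)
        self.mae_hist.append(mae_value)
        return mae_value + self.perceptual_loss_weight() * lpips_value
```

For example, with LPIPS around 0.2 and MAE around 0.05 the weight comes out to 2.25, so the LPIPS term is 0.45 of a 0.5 total; a single outlier LPIPS batch barely moves the median and therefore the weight.
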
## Compare
https://imgsli.com/NDA3Njgw/2/3
## Donations
Please contact us if you can provide GPUs or funding for training.
DOGE: DEw2DR8C7BnF8GgcrfTzUjSnGkuMeJhg83
BTC: 3JHv9Hb8kEW8zMAccdgCdZGfrHeMhH1rpN