metadata

license: apache-2.0
language:
  - en
base_model:
  - madebyollin/sdxl-vae-fp16-fix
  - stabilityai/sdxl-vae
library_name: diffusers

SDXL-VAE finetuned

ImageNet eval (256px)

Model	MSE	PSNR	LPIPS
madebyollin/sdxl-vae-fp16-fix	3.680e-03	25.2100	0.1314
KBlueLeaf/EQ-SDXL-VAE	3.530e-03	25.2827	0.1298
AiArtLab/sdxl_vae	3.321e-03	25.6389	0.1251
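
The MSE and PSNR columns above can be related with a small worked example. This is only a sketch under stated assumptions: pixel values are taken to lie in [0, 1], and metrics are assumed to be averaged per image; the card does not state the evaluation protocol. Under those assumptions, the mean of per-image PSNR is generally higher than the PSNR computed from the mean MSE, which would be consistent with the table (for instance, an MSE of 3.680e-03 alone would give roughly 24.3 dB, while 25.21 is listed). The toy data and function names below are illustrative, not taken from the evaluation code.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized flat pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(max_value ** 2 / err)

# Two toy "images" (flattened, values assumed in [0, 1]) and their reconstructions.
originals = [[0.0, 0.5, 1.0, 0.25], [0.2, 0.4, 0.6, 0.8]]
reconstructions = [[0.1, 0.5, 0.9, 0.25], [0.21, 0.39, 0.61, 0.79]]

per_image_mse = [mse(o, r) for o, r in zip(originals, reconstructions)]
per_image_psnr = [psnr(o, r) for o, r in zip(originals, reconstructions)]

# Dataset-level numbers: averages over images.
mean_mse = sum(per_image_mse) / len(per_image_mse)
mean_psnr = sum(per_image_psnr) / len(per_image_psnr)

# PSNR derived from the averaged MSE, for comparison with mean_psnr.
psnr_of_mean_mse = 10.0 * math.log10(1.0 / mean_mse)

print(f"mean MSE: {mean_mse:.3e}")
print(f"mean of per-image PSNR: {mean_psnr:.2f} dB")
print(f"PSNR of mean MSE: {psnr_of_mean_mse:.2f} dB")
```

Because the logarithm is applied per image before averaging, a few very well reconstructed images can raise the mean PSNR without lowering the mean MSE by much, so the two columns are best read as separate measurements.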

Alchemist eval (512px)

Model	MSE	PSNR	LPIPS
madebyollin/sdxl-vae-fp16-fix	100%	100%	100%
KBlueLeaf/EQ-SDXL-VAE	107.8%	100.1%	95.5%
AiArtLab/sdxl_vae	112.3%	101.8%	106.6%
FLUX.1-schnell-vae	324.0%	119.8%	292.0%

Diffusers

from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("AiArtLab/sdxl_vae").cuda().half()

Training status: in progress

We are currently testing whether the SDXL VAE decoder can be improved by increasing its depth (an asymmetric VAE). This increases the model size slightly (by approximately 20 percent), but we expect it to improve reconstruction quality without modifying the encoder, so SDXL does not need to be retrained. Our resources are quite limited: we train on consumer GPUs and are currently training three models (SDXL VAE, Simple Diffusion, and Simple VAE), so please be patient. Model training is a meticulous and time-consuming process.

VAE Training Process

Encoder: Frozen (to avoid retraining SDXL for the new VAE).
Dataset: 100,000 PNG images
Training Time: 4 days
Hardware: Single RTX 4090
Resolution: 512px
Precision: FP32
Effective Batch Size: 16 (batch size 2 × gradient accumulation 8)
Optimizer: AdamW (8-bit)
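
The effective batch size in the list above comes from gradient accumulation: 8 micro-batches of 2 samples each are processed before one optimizer step. The sketch below checks the arithmetic on a toy scalar model in plain Python. It is not the training script; the data, the model, and the `grad_mse` helper are hypothetical and only illustrate that scaled, accumulated micro-batch gradients match a single batch of 16.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

MICRO_BATCH = 2   # samples per forward/backward pass (from the list above)
ACCUM_STEPS = 8   # micro-batches accumulated per optimizer step (from the list above)
EFFECTIVE_BATCH = MICRO_BATCH * ACCUM_STEPS  # 2 * 8 = 16

# Toy data: 16 samples of y = 3x, made up for illustration only.
xs = [0.1 * (i + 1) for i in range(EFFECTIVE_BATCH)]
ys = [3.0 * x for x in xs]
w = 0.5

# Accumulate: scale each micro-batch gradient by 1/ACCUM_STEPS before summing,
# which mirrors dividing the loss by the number of accumulation steps.
accumulated = 0.0
for step in range(ACCUM_STEPS):
    start, stop = step * MICRO_BATCH, (step + 1) * MICRO_BATCH
    accumulated += grad_mse(w, xs[start:stop], ys[start:stop]) / ACCUM_STEPS

# Reference: gradient computed on all 16 samples at once.
full_batch = grad_mse(w, xs, ys)

print(EFFECTIVE_BATCH, accumulated, full_batch)
```

The equality holds here because all micro-batches have the same size and the loss is a plain mean; it lets a single consumer GPU emulate a larger batch at the cost of more passes per optimizer step.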

Implementation

Base Code: Used a simple diffusion model training script.
Training Target: Only the decoder, focusing on image reconstruction.

Loss Functions

Initially used LPIPS and MSE.
Noticed the FID score improving while images became blurry (FID can reward blurry images, so a better FID does not always mean better reconstructions).
Switched from MSE to MAE.
Balanced LPIPS and MAE at 90/10 ratio.
Used median perceptual_loss_weight for better balance.
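
One way to read the last two notes is that the perceptual weight is chosen so the weighted LPIPS term makes up about 90% of the total loss and MAE about 10%, with a running median used to keep the weight stable against outlier steps. The sketch below implements that reading in plain Python. It is an assumption, not the author's code: the `MedianLossBalancer` class, its `window` parameter, and the sample loss values are all hypothetical.

```python
from collections import deque
from statistics import median

class MedianLossBalancer:
    """Hypothetical helper: picks a perceptual-loss weight so that the weighted
    LPIPS term makes up `lpips_share` of the total, using a running median."""

    def __init__(self, lpips_share=0.9, window=100):
        self.lpips_share = lpips_share
        self.ratios = deque(maxlen=window)  # recent mae / lpips ratios

    def combine(self, lpips_value, mae_value):
        # Track how large MAE is relative to LPIPS on this step.
        self.ratios.append(mae_value / max(lpips_value, 1e-12))
        # Solve weight * lpips : mae == lpips_share : (1 - lpips_share),
        # using the median ratio so a single odd step does not swing the weight.
        odds = self.lpips_share / (1.0 - self.lpips_share)
        weight = odds * median(self.ratios)
        total = weight * lpips_value + mae_value
        return total, weight

balancer = MedianLossBalancer(lpips_share=0.9, window=5)

# Made-up per-step loss values; the third step is a deliberate outlier.
steps = [(0.20, 0.04), (0.25, 0.05), (0.10, 0.50), (0.30, 0.06)]
for lpips_value, mae_value in steps:
    total, weight = balancer.combine(lpips_value, mae_value)
    print(f"weight={weight:.3f} total={total:.3f}")
```

In a real training loop the loss values would be tensors, and the ratio would need to be computed from detached scalars so that the weight itself does not receive gradients.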

Compare

https://imgsli.com/NDA3OTgz

https://imgsli.com/NDA3Njgw/2/3

Donations

Please contact us if you can provide GPUs or funding for training.

DOGE: DEw2DR8C7BnF8GgcrfTzUjSnGkuMeJhg83

BTC: 3JHv9Hb8kEW8zMAccdgCdZGfrHeMhH1rpN

Contacts

recoilme