Anime Stable Diffusion Model

A custom Stable Diffusion model fine-tuned for anime-style image generation, trained on a large dataset of anime images. This is the first concept model in the series; I am still spending more time filtering and processing the larger dataset. The model is currently undertrained: it can reflect certain concepts, but many further improvements are still needed.

Prompt

Danbooru-style tagging.

Quality tag: Masterpiece, high quality, normal quality, low quality

Aesthetic tag: Very aesthetic, aesthetic, pleasant, unpleasant

Additional special tag: High resolution, elegant, artist:

Rating Modifier	Rating Criterion
-	general
-	sensitive
nsfw	questionable
nsfw	explicit

Recommended prompt order: Rating tag, quality tag, aesthetic tag, (additional tag), general tag
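
The prompt order above can be sketched as a small helper. This is only an illustration, not part of the model's tooling: the function name `build_prompt` and the `RATING_MODIFIER` mapping are hypothetical, and it assumes that the "Rating Modifier" column is the text that goes into the prompt, that "-" means no modifier is added, and that tags are written in lowercase.

```python
# Rating criterion -> rating modifier, copied from the rating table.
# Assumption: "-" in the table means that no modifier is added to the prompt.
RATING_MODIFIER = {
    "general": "",
    "sensitive": "",
    "questionable": "nsfw",
    "explicit": "nsfw",
}


def build_prompt(rating, quality, aesthetic, general_tags, additional_tags=()):
    """Join tags in the order: rating, quality, aesthetic, additional, general."""
    parts = [
        RATING_MODIFIER[rating],
        quality,
        aesthetic,
        *additional_tags,
        *general_tags,
    ]
    # Skip empty entries, such as the missing modifier for "general" images.
    return ", ".join(part for part in parts if part)
```

For example, `build_prompt("general", "masterpiece", "very aesthetic", ["1girl", "solo"], ["high resolution"])` returns `masterpiece, very aesthetic, high resolution, 1girl, solo`.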

Dataset Specifications

Total Images: 172k
General Training Set: 160k images
Aesthetic Fine-tuning Set: 12k high-quality images
Resolution: 1024x1024

Hardware Configuration

GPUs: 2x NVIDIA RTX 6000 Ada
Training Time: 16 days (general), 3 days (aesthetic fine-tuning)

Training Configuration

Parameter	Value	Description
Resolution	1024x1024	Training resolution
Batch Size	8x2x2	Effective batch size
Learning Rate	5e-5	Base learning rate
Text Encoder LR	1e-5	Learning rate for text encoder
Epochs	10	Total training epochs
Mixed Precision	FP16	Training precision mode
Optimizer	AdamW8bit	Optimizer type
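
The batch size entry "8x2x2" is not broken down in the table. The sketch below shows one plausible reading, under loudly stated assumptions: 8 samples per GPU, 2 GPUs (matching the hardware section), and 2 gradient accumulation steps. It also assumes the 10 epochs refer to the 160k-image general training set. All constant and function names are hypothetical.

```python
import math

PER_DEVICE_BATCH = 8          # assumed: first factor of "8x2x2"
NUM_GPUS = 2                  # matches the two GPUs listed under hardware
GRAD_ACCUM_STEPS = 2          # assumed: remaining factor of "8x2x2"
GENERAL_SET_IMAGES = 160_000  # rounded figure from the dataset section
EPOCHS = 10                   # assumed to apply to the general training set


def effective_batch_size(per_device, num_gpus, grad_accum):
    """Number of samples that contribute to one optimizer step."""
    return per_device * num_gpus * grad_accum


def optimizer_steps(num_images, batch_size, epochs):
    """Approximate number of optimizer steps over all epochs."""
    return math.ceil(num_images / batch_size) * epochs


batch = effective_batch_size(PER_DEVICE_BATCH, NUM_GPUS, GRAD_ACCUM_STEPS)
steps = optimizer_steps(GENERAL_SET_IMAGES, batch, EPOCHS)
```

Under these assumptions the effective batch size is 32, which gives about 5,000 optimizer steps per epoch and roughly 50,000 steps in total. The real count depends on the exact image count and on how buckets are batched.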

Advanced Settings

Feature	Setting	Purpose
Gradient Checkpointing	Enabled	Memory optimization
XFormers	Enabled	Attention optimization
Memory Efficient Attention	Enabled	Memory optimization
Bucket Resolution Steps	128	Dynamic resolution handling
Min Bucket Resolution	512	Minimum image size
Max Bucket Resolution	4096	Maximum image size
Noise Offset	0.035	Training stability
Min SNR Gamma	5	Signal-to-noise ratio control