Commit c71cdb2 · Parent: 03f4f17
Update README.md

README.md CHANGED
@@ -140,12 +140,12 @@ ability of the model to generate content with non-English prompts is significant
 
 ## Training
 
-
+### Training Data
 The model developers used the following dataset for training the model:
 
 - LAION-2B (en) and subsets thereof (see next section)
 
-
+### Training Procedure
 Stable Diffusion v1-4 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training,
 
 - Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4
@@ -162,6 +162,8 @@ filtered to images with an original size `>= 512x512`, estimated aesthetics scor
 - [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3): Resumed from `stable-diffusion-v1-2`. 195,000 steps at resolution `512x512` on "laion-improved-aesthetics" and 10 % dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598)
 - [**`stable-diffusion-v1-4`**](https://huggingface.co/CompVis/stable-diffusion-v1-4) *To-fill-here*
 
+### Training details
+
 - **Hardware:** 32 x 8 x A100 GPUs
 - **Optimizer:** AdamW
 - **Gradient Accumulations**: 2
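The shape mapping described in the diff above (an H x W x 3 image encoded to an H/f x W/f x 4 latent with downsampling factor f = 8) can be sketched as follows. This is a minimal illustration of the arithmetic only; `latent_shape` is a hypothetical helper, not part of the CompVis codebase:

```python
def latent_shape(h: int, w: int, f: int = 8, channels: int = 4) -> tuple:
    """Latent tensor shape for an h x w x 3 input image, per the model card:
    a relative downsampling factor of f and 4 latent channels."""
    # The encoder requires spatial dims divisible by the downsampling factor.
    assert h % f == 0 and w % f == 0, "image size must be divisible by f"
    return (h // f, w // f, channels)

# A 512 x 512 training image is encoded to a 64 x 64 x 4 latent.
print(latent_shape(512, 512))
```

With the `512x512` training resolution named in the diff, this gives a `64 x 64 x 4` latent, which is the space the diffusion model is trained in.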