---
license: creativeml-openrail-m
language:
- en
metrics:
- accuracy
pipeline_tag: image-to-image
---
# Project Chronicle: A Journey into Virtual Try-On with Diffusion Models
This document outlines the development journey of this project, which aims to implement the "TryOnDiffusion: A Tale of Two UNets" paper. It serves as a log of the learning process, implementation steps, challenges faced, and future goals.
## Tech Stack

* **Language & Framework:** Python, **PyTorch**
* **Experiment Tracking:** **Weights & Biases (`wandb`)**
* **Hardware:** NVIDIA RTX 3090 / RTX 4090 (CUDA)
* **Dataset:** HR-VITON

[Model on Hugging Face](https://huggingface.co/Aditya757864/TRY_ON)
---
## Phase 1: Foundational Learning (The Groundwork)
* **Core Concepts:** Started with the fundamentals of **Computer Vision** and mastered the **PyTorch** framework.
* **Generative Adversarial Networks (GANs):** Implemented and trained a **POKEGAN** to gain practical experience with generative models.
* **Introduction to Diffusion Models:** Shifted focus to diffusion models, training a **Denoising Diffusion Probabilistic Model (DDPM)** on the Fashion MNIST dataset (28×28 images) on an NVIDIA RTX 3090 (a minimal sketch of the training step follows this list).
* **Data Pipeline Mastery:** Revisited PyTorch's `DataLoader` and built a deeper understanding of custom data-handling pipelines.
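
For reference, here is a minimal sketch of the DDPM training step from this phase: closed-form forward diffusion plus a noise-prediction MSE loss. The tiny convolutional denoiser is a stand-in for the real UNet (which also conditions on the timestep), and the schedule and hyperparameters are illustrative, not the exact values used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Linear beta schedule, as in the original DDPM paper (Ho et al., 2020).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise):
    """Closed-form forward diffusion: noise clean images x0 to timestep t."""
    sqrt_ac = alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)
    sqrt_1m = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)
    return sqrt_ac * x0 + sqrt_1m * noise

# Tiny stand-in denoiser; a real DDPM uses a UNet that also
# conditions on the timestep t via embeddings.
model = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.SiLU(),
                      nn.Conv2d(32, 1, 3, padding=1))
opt = torch.optim.Adam(model.parameters(), lr=2e-4)

ds = datasets.FashionMNIST(
    "data", download=True,
    transform=transforms.Compose([transforms.ToTensor(),
                                  transforms.Normalize((0.5,), (0.5,))]))
loader = DataLoader(ds, batch_size=128, shuffle=True)

for x0, _ in loader:
    t = torch.randint(0, T, (x0.size(0),))
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    loss = F.mse_loss(model(x_t), noise)  # train the net to predict the noise
    opt.zero_grad(); loss.backward(); opt.step()
    break  # single step shown; a real run loops over many epochs
```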
---
## Phase 2: Advanced Concepts & Paper Selection (Scaling Up)
* **Advanced Architectures:** Studied **Transformers** and the **Attention** mechanism to understand how models process long-range dependencies.
* **Modulation Techniques:** Explored specific neural network techniques like **Feature-wise Linear Modulation (FiLM)** for conditioning generative models (see the sketch after this list).
* **Research & Direction:** After a thorough literature review, the **"TryOnDiffusion: A Tale of Two UNets"** paper was selected as the primary research goal for this project.
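
Since FiLM is central to how conditioning signals enter a generative model, here is a minimal sketch of the idea, assuming a feature map modulated by a per-sample conditioning vector (names and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scale and shift feature maps
    with (gamma, beta) predicted from a conditioning vector."""
    def __init__(self, cond_dim: int, num_channels: int):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, x, cond):
        # x: (B, C, H, W), cond: (B, cond_dim)
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        return gamma[:, :, None, None] * x + beta[:, :, None, None]

# Example: modulate a 64-channel feature map with a 128-d embedding.
film = FiLM(cond_dim=128, num_channels=64)
out = film(torch.randn(4, 64, 32, 32), torch.randn(4, 128))  # (4, 64, 32, 32)
```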
---
## Phase 3: Implementation, Training, and Debugging (Getting Hands-On)
* **Codebase Adaptation:** Forked and analyzed an open-source implementation by **fashnAI** as a starting point.
* **Custom Development:**
  * Engineered a **custom data mapper and `DataLoader`** to process the HR-VITON dataset (an illustrative sketch follows this list).
  * Wrote a **custom trainer script** tailored to the model's specific needs, giving finer control over the training loop.
* **Technical Challenges:** Debugged and resolved several breaking changes caused by library updates in the original repository.
* **Model Training:**
  * Initiated training on a **500-image subset of the HR-VITON dataset**.
  * Used an **NVIDIA RTX 4090 (24 GB)** for the computationally intensive training runs.
  * Tracked metrics, losses, and logs with **Weights & Biases (`wandb`)**.
* **Evaluation:** Created a **sampling script** to generate images from checkpoints and qualitatively assess model performance (a sketch of the sampling loop also follows this list).
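
As referenced above, an illustrative sketch of the custom data mapper. The folder names and the particular set of conditioning inputs are placeholders, not the exact HR-VITON layout or the project's actual code:

```python
import os
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class TryOnDataset(Dataset):
    """Pairs each person image with the cloth image, the cloth-agnostic
    image, and the pose map the model is conditioned on."""
    def __init__(self, root, size=(256, 192)):
        self.root = root
        self.names = sorted(os.listdir(os.path.join(root, "image")))
        self.tf = transforms.Compose([
            transforms.Resize(size),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
        ])

    def __len__(self):
        return len(self.names)

    def _load(self, folder, name):
        path = os.path.join(self.root, folder, name)
        return self.tf(Image.open(path).convert("RGB"))

    def __getitem__(self, idx):
        name = self.names[idx]
        return {
            "person":   self._load("image", name),
            "cloth":    self._load("cloth", name),
            "agnostic": self._load("agnostic", name),
            "pose":     self._load("pose", name),
        }

# loader = DataLoader(TryOnDataset("data/hr_viton/train"), batch_size=8, shuffle=True)
```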
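
And a sketch of the kind of loop the sampling script implements: plain DDPM ancestral sampling from a trained noise-prediction model. The `model(x, t)` signature and the checkpoint path are assumptions, not the project's actual interface:

```python
import torch

@torch.no_grad()
def ddpm_sample(model, shape, T=1000, device="cpu"):
    """Ancestral DDPM sampling: start from pure noise and iteratively
    denoise with the trained noise-prediction network."""
    betas = torch.linspace(1e-4, 0.02, T, device=device)
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)
    for t in reversed(range(T)):
        eps = model(x, torch.full((shape[0],), t, device=device))
        coef = betas[t] / (1.0 - alphas_cumprod[t]).sqrt()
        mean = (x - coef * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise
    return x

# Usage (hypothetical checkpoint path and model class):
# model.load_state_dict(torch.load("checkpoints/step_20000.pt")["model"])
# samples = ddpm_sample(model, (4, 3, 256, 192))
```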
---
## Phase 4: The Plateau & The Path Forward (Current Status)
> **Current Challenge:** The model's loss has **plateaued**. This suggests the model is no longer learning, likely due to overfitting on the small 500-image dataset or a subtle issue in the data pipeline.
### Visual Analysis
*Sample model output after 2000 epochs (panels: Original Input, Input Features, Generated Output).*

*W&B loss curve, clearly illustrating the training plateau.*

* **Immediate Goals:**
  1. **Debug the training process:** Run sanity checks such as overfitting on a single batch to verify the model's learning capacity (a sketch follows below).
  2. **Verify the data pipeline:** Visualize the inputs (warped clothes, agnostic masks, pose maps) fed to the model to confirm they are correct.
  3. **Investigate the loss function:** The current pixel-space loss (L1 or L2) may not be optimal; experiment with alternatives such as a perceptual loss (LPIPS, Learned Perceptual Image Patch Similarity) to better capture visual similarity (see the usage sketch below).
  4. **Tune hyperparameters:** Experiment with the learning rate and other key hyperparameters.
* **Long-Term Vision:** Resolve the training plateau, scale training up to a larger dataset, and replicate the results of the TryOnDiffusion paper.
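
A minimal sketch of the single-batch overfitting check from goal 1, with hypothetical `model`, `batch`, and `loss_fn` standing in for the project's actual objects. If the loss cannot be driven close to zero on one fixed batch, the bug is in the model or training loop rather than the dataset size:

```python
import torch

def overfit_one_batch(model, batch, loss_fn, steps=500, lr=1e-4):
    """Sanity check: a model with enough capacity and a correct training
    loop should drive the loss on a single fixed batch close to zero."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for step in range(steps):
        loss = loss_fn(model, batch)  # e.g. the noise-prediction MSE
        opt.zero_grad(); loss.backward(); opt.step()
        if step % 50 == 0:
            print(f"step {step:4d}  loss {loss.item():.6f}")
```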
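
For goal 3, the `lpips` package provides a reference implementation of LPIPS. A usage sketch, with illustrative shapes and an assumed weighting factor:

```python
import torch
import lpips  # pip install lpips

# LPIPS expects RGB tensors scaled to [-1, 1].
perceptual = lpips.LPIPS(net="vgg")
pred = torch.rand(1, 3, 256, 192) * 2 - 1
target = torch.rand(1, 3, 256, 192) * 2 - 1
d = perceptual(pred, target)  # (1, 1, 1, 1) distance; lower means more similar

# One option: blend it with the existing pixel loss, e.g.
# loss = l1_loss + 0.1 * perceptual(pred, target).mean()
```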