---
license: creativeml-openrail-m
language:
- en
metrics:
- accuracy
pipeline_tag: image-to-image
---

# Project Chronicle: A Journey into Virtual Try-On with Diffusion Models

This document outlines the development journey of this project, which aims to implement the "TryOnDiffusion: A Tale of Two UNets" paper. It serves as a log of the learning process, implementation steps, challenges faced, and future goals.

## Tech Stack

![PyTorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=for-the-badge&logo=pytorch&logoColor=white) ![Transformers](https://img.shields.io/badge/🤗%20Transformers-yellow?style=for-the-badge) ![Weights & Biases](https://img.shields.io/badge/Weights%26_Biases-FFBE00?style=for-the-badge&logo=WeightsAndBiases&logoColor=black) ![Python](https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white)

[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-yellow)](https://huggingface.co/Aditya757864/TRY_ON)

---

## Phase 1: Foundational Learning (The Groundwork)

* **Core Concepts:** Started with the fundamentals of **Computer Vision** and mastered the **PyTorch** framework.
* **Generative Adversarial Networks (GANs):** Implemented and trained a **POKEGAN** to gain practical experience with generative models.
* **Introduction to Diffusion Models:** Shifted focus to diffusion models, successfully training a **Denoising Diffusion Probabilistic Model (DDPM)** on the Fashion MNIST dataset (28x28 images) using an NVIDIA RTX 3090.
* **Data Pipeline Mastery:** Revisited and gained a deeper understanding of PyTorch's `DataLoader` and custom data handling pipelines.

---

## Phase 2: Advanced Concepts & Paper Selection (Scaling Up)

* **Advanced Architectures:** Studied **Transformers** and the **Attention** mechanism to understand how models process long-range dependencies.
* **Modulation Techniques:** Explored specific neural network techniques like **Feature-wise Linear Modulation (FiLM)** for conditioning generative models.
* **Research & Direction:** After a thorough literature review, the **"TryOnDiffusion: A Tale of Two UNets"** paper was selected as the primary research goal for this project.

---

## Phase 3: Implementation, Training, and Debugging (Getting Hands-On)

* **Codebase Adaptation:** Forked and analyzed an open-source implementation by **fashnAI** as a starting point.
* **Custom Development:**
    * Engineered a **custom data mapper and `DataLoader`** to process the HR-VITON dataset.
    * Wrote a **custom trainer script** tailored to the model's specific needs and for better control over the training loop.
* **Technical Challenges:** Successfully debugged and resolved several breaking changes caused by library updates in the original repository.
* **Model Training:**
    * Initiated training on a subset of the **HR-VITON dataset (500 images)**.
    * Utilized an **NVIDIA RTX 4090 (24GB)** for the computationally intensive training process.
    * Tracked metrics, losses, and logs meticulously using **Weights & Biases (`wandb`)**.
* **Evaluation:** Created a **sampling script** to generate image outputs from checkpoints to qualitatively assess model performance.

---

## Phase 4: The Plateau & The Path Forward (Current Status)

> **Current Challenge:** The model's loss has **stagnated and remains constant**. This suggests the model is no longer learning, likely due to overfitting on the small dataset or a subtle issue in the data pipeline.
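For a loss curve this flat, a quick first check is whether the model can overfit a single, fixed batch at all. The sketch below is a hypothetical illustration, not the project's actual trainer: `model`, `compute_diffusion_loss`, and `dataloader` are placeholders for the real objects.

```python
import torch

def overfit_single_batch(model, compute_diffusion_loss, dataloader,
                         steps=500, lr=1e-4, device="cuda"):
    """Sanity check: reuse one batch and confirm the loss can be driven down."""
    model.train().to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    # Freeze a single batch and train on it repeatedly.
    batch = next(iter(dataloader))
    batch = {k: v.to(device) for k, v in batch.items() if torch.is_tensor(v)}

    for step in range(steps):
        optimizer.zero_grad()
        loss = compute_diffusion_loss(model, batch)  # e.g. noise-prediction MSE
        loss.backward()
        optimizer.step()
        if step % 50 == 0:
            print(f"step {step:4d} | loss {loss.item():.4f}")
```

If the loss does not fall well below the plateau value even here, the issue is more likely in the model or loss computation than in the dataset size.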
### Visual Analysis

*Sample model output after 2000 epochs.*

| Original Input | Input Features | Generated Output |
| ----- | ----- | ----- |
| Original Input Image | Input Features Image | Generated Output Image |

*W&B loss curve, clearly illustrating the training plateau.*

![Wandb loss curve showing a flat line](./assets/wandb.png)

* **Immediate Goals:**
    1. **Debug the training process:** Perform sanity checks like overfitting on a single batch to verify the model's learning capacity (see the sketch above).
    2. **Verify the data pipeline:** Thoroughly visualize the inputs (warped clothes, agnostic masks, pose maps) being fed to the model to ensure they are correct (a visualization sketch appears at the end of this document).
    3. **Investigate Loss Function:** The current loss (e.g., L1 or L2) might not be optimal. Experiment with alternatives like a perceptual loss (LPIPS - Learned Perceptual Image Patch Similarity) to better capture visual similarity (a loss sketch appears at the end of this document).
    4. **Tune Hyperparameters:** Experiment with the learning rate and other key hyperparameters.

* **Long-Term Vision:** Resolve the training plateau, scale up the training to a larger dataset, and successfully replicate the results of the TryOnDiffusion paper.
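To make Immediate Goal 2 concrete, here is a minimal sketch that dumps the conditioning inputs from the custom HR-VITON `DataLoader` to disk for visual inspection. The batch keys used below (`person`, `warped_cloth`, `agnostic_mask`, `pose_map`) are assumptions and should be adapted to the actual data mapper's output.

```python
import os
from torchvision.utils import save_image

def dump_batch_inputs(dataloader, out_dir="debug_inputs", n_batches=2):
    """Write each conditioning tensor as an image grid so it can be inspected."""
    os.makedirs(out_dir, exist_ok=True)
    for i, batch in enumerate(dataloader):
        if i >= n_batches:
            break
        # Hypothetical keys -- replace with the data mapper's actual field names.
        for key in ("person", "warped_cloth", "agnostic_mask", "pose_map"):
            if key in batch:
                # normalize=True rescales each image to [0, 1] for viewing
                save_image(batch[key].float(), f"{out_dir}/batch{i}_{key}.png",
                           normalize=True)
```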
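For Immediate Goal 3, a minimal sketch of adding an LPIPS term on top of an L1 pixel loss, using the `lpips` package (`pip install lpips`). The `perceptual_weight` value is an arbitrary starting point, not a value from the TryOnDiffusion paper, and LPIPS expects 3-channel images scaled to [-1, 1].

```python
import torch.nn.functional as F
import lpips

# Frozen VGG-based perceptual metric; move it to the training device as needed.
lpips_fn = lpips.LPIPS(net="vgg").eval()

def combined_loss(pred, target, perceptual_weight=0.1):
    """L1 pixel loss plus a weighted LPIPS perceptual term."""
    pixel_loss = F.l1_loss(pred, target)
    perceptual_loss = lpips_fn(pred, target).mean()
    return pixel_loss + perceptual_weight * perceptual_loss
```

In a diffusion setup the perceptual term is usually applied to a reconstructed image (e.g., a predicted clean sample) rather than the raw noise prediction, so adopting it may require a small change to the trainer.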