Project Chronicle: A Journey into Virtual Try-On with Diffusion Models

This document outlines the development journey of this project, which aims to implement the "TryOnDiffusion: A Tale of Two UNets" paper. It serves as a log of the learning process, implementation steps, challenges faced, and future goals.

Tech Stack

PyTorch, Transformers, Weights & Biases, Python, Hugging Face


Phase 1: Foundational Learning (The Groundwork)

  • Core Concepts: Started with the fundamentals of Computer Vision and mastered the PyTorch framework.
  • Generative Adversarial Networks (GANs): Implemented and trained a POKEGAN to gain practical experience with generative models.
  • Introduction to Diffusion Models: Shifted focus to diffusion models, successfully training a Denoising Diffusion Probabilistic Model (DDPM) on the Fashion MNIST dataset (28x28 images) using an NVIDIA RTX 3090; a minimal training-step sketch appears after this list.
  • Data Pipeline Mastery: Revisited and gained a deeper understanding of PyTorch's DataLoader and custom data handling pipelines.
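
For reference, the following is a minimal sketch of the DDPM training objective used in that experiment, assuming a small noise-prediction network `model(x_t, t)`; the network, schedule values, and step count are illustrative, not the exact training code.

```python
import torch
import torch.nn.functional as F

# Linear beta schedule and the cumulative alpha terms of q(x_t | x_0).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def ddpm_loss(model, x0):
    """One DDPM training step: diffuse x0 to a random timestep t,
    then train the model to predict the added noise (epsilon objective)."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod.to(x0.device)[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return F.mse_loss(model(x_t, t), noise)
```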

Phase 2: Advanced Concepts & Paper Selection (Scaling Up)

  • Advanced Architectures: Studied Transformers and the Attention mechanism to understand how models process long-range dependencies.
  • Modulation Techniques: Explored specific neural network techniques like Feature-wise Linear Modulation (FiLM) for conditioning generative models (a FiLM sketch follows this list).
  • Research & Direction: After a thorough literature review, the "TryOnDiffusion: A Tale of Two UNets" paper was selected as the primary research goal for this project.
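
As a concrete illustration of the FiLM idea, here is a minimal sketch of a FiLM layer in PyTorch; the layer sizes and the conditioning vector are placeholders, not part of any specific model in this project.

```python
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scales and shifts each feature-map
    channel with (gamma, beta) predicted from a conditioning vector."""
    def __init__(self, cond_dim, num_channels):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, num_channels * 2)

    def forward(self, features, cond):
        # features: (B, C, H, W), cond: (B, cond_dim)
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * features + beta
```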

Phase 3: Implementation, Training, and Debugging (Getting Hands-On)

  • Codebase Adaptation: Forked and analyzed an open-source implementation by fashnAI as a starting point.
  • Custom Development:
    • Engineered a custom data mapper and DataLoader to process the HR-VITON dataset (a dataset sketch follows this list).
    • Wrote a custom trainer script tailored to the model's specific needs, giving finer control over the training loop; a minimal training-loop sketch with W&B logging also appears after this list.
  • Technical Challenges: Successfully debugged and resolved several breaking changes caused by library updates in the original repository.
  • Model Training:
    • Initiated training on a subset of the HR-VITON dataset (500 images).
    • Utilized an NVIDIA RTX 4090 (24GB) for the computationally intensive training process.
    • Tracked metrics, losses, and logs meticulously using Weights & Biases (wandb).
  • Evaluation: Created a sampling script to generate image outputs from checkpoints to qualitatively assess model performance.
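
The sketch below illustrates the kind of paired dataset mapper described above, assuming an HR-VITON-style folder layout with `image/` and `cloth/` directories; the directory names, resolution, and returned keys are assumptions for illustration, not the exact mapper used here.

```python
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class HRVitonPairs(Dataset):
    """Loads person / garment image pairs from an HR-VITON-style folder layout
    (directory names here are assumptions, not necessarily the official ones)."""
    def __init__(self, root, image_size=256):
        self.person_paths = sorted(Path(root, "image").glob("*.jpg"))
        self.cloth_dir = Path(root, "cloth")
        self.tf = transforms.Compose([
            transforms.Resize((image_size, image_size)),
            transforms.ToTensor(),
            transforms.Normalize([0.5] * 3, [0.5] * 3),  # scale to [-1, 1]
        ])

    def __len__(self):
        return len(self.person_paths)

    def __getitem__(self, idx):
        person_path = self.person_paths[idx]
        cloth_path = self.cloth_dir / person_path.name  # assumes shared filenames
        person = self.tf(Image.open(person_path).convert("RGB"))
        cloth = self.tf(Image.open(cloth_path).convert("RGB"))
        return {"person": person, "cloth": cloth}
```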

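And this is a minimal sketch of a training loop with Weights & Biases logging, in the spirit of the custom trainer mentioned above; `model` and `diffusion_loss` stand in for the adapted TryOnDiffusion network and its noise-prediction objective, and the project name and hyperparameters are illustrative.

```python
import torch
import wandb
from torch.utils.data import DataLoader

def train(model, diffusion_loss, dataset, epochs=100, lr=1e-4, batch_size=4):
    """Minimal training loop: iterate over the paired dataset, optimize the
    diffusion objective, and log the loss to Weights & Biases each step."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=4)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    wandb.init(project="tryondiffusion", config={"lr": lr, "batch_size": batch_size})

    step = 0
    for epoch in range(epochs):
        for batch in loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = diffusion_loss(model, batch)
            opt.zero_grad()
            loss.backward()
            opt.step()
            wandb.log({"train/loss": loss.item(), "epoch": epoch}, step=step)
            step += 1
```
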
Phase 4: The Plateau & The Path Forward (Current Status)

Current Challenge: The training loss has plateaued and no longer decreases, which suggests the model has stopped learning, most likely due to overfitting on the small 500-image subset or a subtle issue in the data pipeline.

Visual Analysis

Sample model output after 2000 epochs (panels: original input, input features, generated output).

W&B loss curve, clearly illustrating the training plateau (the curve is essentially flat).

  • Immediate Goals:
    1. Debug the training process: Perform sanity checks like overfitting on a single batch to verify the model's learning capacity (a sketch of this check appears after this list).
    2. Verify the data pipeline: Thoroughly visualize the inputs (warped clothes, agnostic masks, pose maps) being fed to the model to ensure they are correct.
    3. Investigate the loss function: The current pixel-wise loss (e.g., L1 or L2) might not be optimal. Experiment with alternatives like a perceptual loss (LPIPS, Learned Perceptual Image Patch Similarity) to better capture visual similarity; LPIPS usage is included in the sketch after this list.
    4. Tune Hyperparameters: Experiment with the learning rate and other key hyperparameters.
  • Long-Term Vision: Resolve the training plateau, scale up the training to a larger dataset, and successfully replicate the results of the TryOnDiffusion paper.
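
The sketch below covers two of the immediate goals: overfitting a single fixed batch as a sanity check, and computing an LPIPS perceptual distance with the `lpips` package. Here `model`, `diffusion_loss`, and the batch format are placeholders, and the LPIPS call assumes images scaled to [-1, 1].

```python
import torch
import lpips  # pip install lpips

def overfit_single_batch(model, diffusion_loss, batch, steps=500, lr=1e-4):
    """Sanity check: a healthy model should drive the loss toward zero on one
    fixed batch; if it cannot, the problem is in the model or the pipeline."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for i in range(steps):
        loss = diffusion_loss(model, batch)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if i % 50 == 0:
            print(f"step {i}: loss {loss.item():.5f}")

# LPIPS perceptual distance between generated and ground-truth images.
# Move lpips_fn to the same device as the images before calling it.
lpips_fn = lpips.LPIPS(net="vgg")

def perceptual_loss(generated, target):
    return lpips_fn(generated, target).mean()
```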