Speed Issue

#8 opened by yashlanjewar20

Why is this so much slower than the original Hunyuan T2V / Leapfusion image-to-video workflow, even with sage-attn and compile? TeaCache also performs poorly here.

They de-distilled the model, so we have to use CFG for inference, which means almost twice the compute.
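For anyone wondering where the ~2x comes from: with classifier-free guidance every denoising step needs both a conditional and an unconditional forward pass, while a distilled model only needs the conditional one. A minimal sketch of the standard CFG combination (a toy stand-in, not the actual HunyuanVideo/ComfyUI code):

```python
def cfg_denoise(model, x, t, cond, uncond, cfg_scale=4.0):
    """Standard classifier-free guidance step.

    With cfg_scale > 1 the model is evaluated twice per step; a distilled
    model (guidance baked in, cfg_scale == 1) only needs one evaluation.
    """
    noise_cond = model(x, t, cond)           # conditional pass
    if cfg_scale == 1.0:
        return noise_cond                    # distilled: single pass per step
    noise_uncond = model(x, t, uncond)       # extra unconditional pass (~2x cost)
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)


if __name__ == "__main__":
    calls = []
    def dummy_model(x, t, c):                # hypothetical stand-in for the DiT
        calls.append(c)
        return x * 0.9
    cfg_denoise(dummy_model, 1.0, 0, "cond", "uncond", cfg_scale=4.0)
    print(f"model evaluations per step with CFG: {len(calls)}")  # -> 2
```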

I optimised Kijai's workflow for my system and posted it to CivitAI. It uses the Comfyui_MultiGPU node UnetLoaderGGUFDisTorchMultiGPU.
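(For anyone unfamiliar: as I understand it, that DisTorch loader spreads the GGUF-quantized UNet across devices/system RAM so it fits in limited VRAM. Conceptually it's layer-wise placement plus moving activations between devices, roughly like this plain-PyTorch sketch, which is not the node's actual code:)

```python
import torch.nn as nn

class SplitUNet(nn.Module):
    """Concept sketch: keep the first blocks in VRAM, offload the rest to CPU RAM."""
    def __init__(self, blocks, n_gpu_blocks, gpu="cuda:0"):
        super().__init__()
        self.gpu_blocks = nn.ModuleList(b.to(gpu) for b in blocks[:n_gpu_blocks])
        self.cpu_blocks = nn.ModuleList(b.to("cpu") for b in blocks[n_gpu_blocks:])
        self.gpu = gpu

    def forward(self, x):
        x = x.to(self.gpu)
        for block in self.gpu_blocks:        # fast path: resident in VRAM
            x = block(x)
        x = x.to("cpu")
        for block in self.cpu_blocks:        # slow path: runs from system RAM
            x = block(x)
        return x
```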

@tsolful is there any way to make this run faster on a single 4090, and what inference time were you able to achieve?

@yashlanjewar20
Using my workflow on a 3060 12GB + 32GB RAM, 73 frames, first load:
Prompt executed in 1662.22 seconds − 587.365 seconds for the upscale ≈ 1075 seconds
640x864
73 frames
Steps: 6-12 (Stage 1 6 steps + Stage 2 6 steps)
cfg: 4.0
Sampler: Euler
Scheduler: Simple
Base generation runtimes after first load (2 stages + VAE decode; rough per-pass math below):
758.173 seconds
704.589 seconds
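
Rough per-pass arithmetic from those settings (assuming 6 + 6 = 12 steps total, both CFG passes costing about the same, and not separating out the VAE decode, which is included in the runtime):

```python
steps = 6 + 6                                 # Stage 1 + Stage 2
cfg_scale = 4.0
passes_per_step = 2 if cfg_scale != 1.0 else 1
total_passes = steps * passes_per_step        # 24 model evaluations
runtime_s = 758.173                           # post-load run, incl. VAE decode
print(total_passes, round(runtime_s / total_passes, 1))  # 24 passes, ~31.6 s each
```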

With the suggested LoRA, after first load:
779.494 seconds

169-frame test after first load (no model load included):
OOM

121-frame test after first load + 6-step LoRA + smooth LoRA (no model load included):
1st stage
525.14s 1st iteration
729.66s 2nd
736.19s 3rd
645.15s 4th
665.55s 5th
764.12s 6th/Average
2nd stage
81.90s 1st+2nd iteration
OOM
Instant requeue after the OOM, resuming from the 2nd stage:
6.17s 1st Iteration
113.74s 2nd+3rd
222.92s 4th
327.62s 5th
282.29s 6th/Average
VAE decode: 128.309s
I'll be updating my CivitAI post with more details as I use it more.
