Fast 8-step inference of Qwen Image Edit
Generate a video from an image and a text prompt
Separate audio into drums, bass, other, and vocals