FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation
Paper: arXiv:2509.25187
Coming soon...
| Model | I2V Paradigm | Subject Consistency↑ | Background Consistency↑ | Motion Smoothness↑ | Dynamic Degree↑ | Aesthetic Quality↑ | Imaging Quality↑ | I2V Subject Consistency↑ | I2V Background Consistency↑ |
|---|---|---|---|---|---|---|---|---|---|
| SVD-XT-1.0 (1.5B) | Repeating Concat and Adding Noise | 95.52 | 96.61 | 98.09 | 52.36 | 60.15 | 69.80 | 97.52 | 97.63 |
| SVD-XT-1.1 (1.5B) | Repeating Concat and Adding Noise | 95.42 | 96.77 | 98.12 | 43.17 | 60.23 | 70.23 | 97.51 | 97.62 |
| SEINE-512x512 (1.8B) | Inpainting | 95.28 | 97.12 | 97.12 | 27.07 | 64.55 | 71.39 | 97.15 | 96.94 |
| CogVideoX-5B-I2V | Zero-padding Concat and Adding Noise | 94.34 | 96.42 | 98.40 | 33.17 | 61.87 | 70.01 | 97.19 | 96.74 |
| Wan2.1-I2V-14B-720P | Inpainting | 94.86 | 97.07 | 97.90 | 51.38 | 64.75 | 70.44 | 96.95 | 96.44 |
| CogVideoX1.5-5B-I2V† | Zero-padding Concat and Adding Noise | 95.04 | 96.52 | 98.47 | 37.48 | 62.68 | 70.99 | 97.78 | 98.73 |
| Wan2.1-I2V-14B-480P† | Inpainting | 95.68 | 97.44 | 98.46 | 45.20 | 61.44 | 70.37 | 97.83 | 99.08 |
| FlashI2V† (1.3B) | FlashI2V | 95.13 | 96.36 | 98.35 | 53.01 | 62.34 | 69.41 | 97.67 | 98.72 |
† indicates evaluation with recaptioned text-image pairs in VBench-I2V.
If you use our work, please cite it as follows:
@misc{ge2025flashi2v,
title={FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation},
author={Yunyang Ge and Xinhua Cheng and Chengshu Zhao and Xianyi He and Shenghai Yuan and Bin Lin and Bin Zhu and Li Yuan},
year={2025},
eprint={2509.25187},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.25187},
}