Generate depth maps from images
Generate realistic audio from text
Engage in multimedia chat with LLMs and other ML models