Find and segment objects in images using SAM and Grounding DINO
Generate images from intermediate text representations