AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent Paper • 2404.03648 • Published Apr 4, 2024 • 29 • 7
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Paper • 2307.01952 • Published Jul 4, 2023 • 87 • 9
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant Paper • 2410.15316 • Published Oct 20, 2024 • 12 • 5
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation Paper • 2208.12242 • Published Aug 25, 2022 • 12 • 12
Learning Flow Fields in Attention for Controllable Person Image Generation Paper • 2412.08486 • Published Dec 11, 2024 • 37 • 6
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding Paper • 2401.04575 • Published Jan 9, 2024 • 17 • 4
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published Oct 16, 2024 • 33 • 3
Guiding a Diffusion Model with a Bad Version of Itself Paper • 2406.02507 • Published Jun 4, 2024 • 17 • 1