Daily Papers

by AK and the research community

Dec 15

Submitted by

akhaliq

StemGen: A music generation model that listens

·
9 authors

Submitted by

akhaliq

TinyGSM: achieving >80% on GSM8k with small language models

·
8 authors

Submitted by

akhaliq

CogAgent: A Visual Language Model for GUI Agents

·
11 authors

Submitted by

akhaliq

VideoLCM: Video Latent Consistency Model

·
7 authors

Submitted by

akhaliq

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

·
6 authors

Submitted by

akhaliq

Mosaic-SDF for 3D Generative Models

·
5 authors

Submitted by

akhaliq

Pixel Aligned Language Models

·
8 authors

Submitted by

akhaliq

SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance

·
3 authors

Submitted by

akhaliq

Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention

·
5 authors

Submitted by

akhaliq

Vision-Language Models as a Source of Rewards

·
26 authors

Submitted by

akhaliq

FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection

·
6 authors

Submitted by

akhaliq

LIME: Localized Image Editing via Attention Regularization in Diffusion Models

·
5 authors

Submitted by

akhaliq

General Object Foundation Model for Images and Videos at Scale

·
6 authors

Submitted by

akhaliq

ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

·
11 authors

Submitted by

akhaliq

UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

·
10 authors

Submitted by

akhaliq

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

·
12 authors

Submitted by

akhaliq

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

·
8 authors

Submitted by

akhaliq

SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds

·
4 authors

Submitted by

akhaliq

TigerBot: An Open Multilingual Multitask LLM

·
6 authors