
Collections


Collections including paper arxiv:2405.18386

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Paper • 2405.18386 • Published May 28, 2024 • 20
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Paper • 2405.14598 • Published May 23, 2024 • 13

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Paper • 2405.18386 • Published May 28, 2024 • 20

Interesting things.

AtP*: An efficient and scalable method for localizing LLM behaviour to components

Paper • 2403.00745 • Published Mar 1, 2024 • 13
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 609
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Paper • 2402.16840 • Published Feb 26, 2024 • 24
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21, 2024 • 116

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

Paper • 2402.03162 • Published Feb 5, 2024 • 19
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

Paper • 2402.03040 • Published Feb 5, 2024 • 18
Magic-Me: Identity-Specific Video Customized Diffusion

Paper • 2402.09368 • Published Feb 14, 2024 • 29
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing

Paper • 2402.10294 • Published Feb 15, 2024 • 25

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 17
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

Retrieval-Augmented Text-to-Audio Generation

Paper • 2309.08051 • Published Sep 14, 2023 • 7
A Large-scale Dataset for Audio-Language Representation Learning

Paper • 2309.11500 • Published Sep 20, 2023 • 10
End-to-End Speech Recognition Contextualization with Large Language Models

Paper • 2309.10917 • Published Sep 19, 2023 • 10
FoleyGen: Visually-Guided Audio Generation

Paper • 2309.10537 • Published Sep 19, 2023 • 9
