zhiliang

zzliang · pengzhiliang

AI & ML interests

multimodal

Recent Activity

authored a paper 6 days ago

Generic-to-Specific Distillation of Masked Autoencoders

authored a paper 6 days ago

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

authored a paper 6 days ago

Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

Organizations

None yet

authored 9 papers 6 days ago

Generic-to-Specific Distillation of Masked Autoencoders

Paper • 2302.14771 • Published Feb 28, 2023

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Paper • 2310.02992 • Published Oct 4, 2023 • 4

Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

Paper • 2205.09613 • Published May 19, 2022

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers

Paper • 2208.06366 • Published Aug 12, 2022

Foundation Transformers

Paper • 2210.06423 • Published Oct 12, 2022

A Unified View of Masked Image Modeling

Paper • 2210.10615 • Published Oct 19, 2022

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

Paper • 2208.10442 • Published Aug 22, 2022

Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published Dec 11, 2024 • 49

VibeVoice Technical Report

Paper • 2508.19205 • Published 12 days ago • 120

authored a paper about 2 years ago

Kosmos-2: Grounding Multimodal Large Language Models to the World

Paper • 2306.14824 • Published Jun 26, 2023 • 34