LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Paper • 2502.14866 • Published Feb 20, 2025
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Paper • 2410.10819 • Published Oct 14, 2024
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference Paper • 2406.10774 • Published Jun 16, 2024
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Paper • 2405.04532 • Published May 7, 2024
Retrieval Head Mechanistically Explains Long-Context Factuality Paper • 2404.15574 • Published Apr 24, 2024
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory Paper • 2402.04617 • Published Feb 7, 2024
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models Paper • 2211.10438 • Published Nov 18, 2022
Efficient Streaming Language Models with Attention Sinks Paper • 2309.17453 • Published Sep 29, 2023