zsw1129

AI & ML interests

None yet

Organizations

None yet

zsw1129's activity

upvoted a paper 12 days ago

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published 13 days ago • 134

liked a model 13 days ago

agentica-org/DeepScaleR-1.5B-Preview

Text Generation • Updated about 16 hours ago • 22.5k • 470

upvoted an article 18 days ago

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Article • Jan 23 • 142

upvoted a paper 4 months ago

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

Paper • 2410.10306 • Published Oct 14, 2024 • 55

upvoted a collection 5 months ago

Llama 3.2

Meta's new Llama 3.2 vision and text models including 1B, 3B, 11B and 90B. Includes GGUF, 4-bit bnb and original versions. • 27 items • Updated 18 days ago • 54

upvoted a paper 5 months ago

MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

Paper • 2409.16160 • Published Sep 24, 2024 • 33

upvoted an article 5 months ago

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Article • Sep 18, 2024 • 223

upvoted 5 papers 7 months ago

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Paper • 2407.11062 • Published Jul 10, 2024 • 8

Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models

Paper • 2407.12327 • Published Jul 17, 2024 • 78

Compact Language Models via Pruning and Knowledge Distillation

Paper • 2407.14679 • Published Jul 19, 2024 • 39

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Paper • 2407.15841 • Published Jul 22, 2024 • 40

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Paper • 2407.16154 • Published Jul 23, 2024 • 22

upvoted a paper 9 months ago

LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models

Paper • 2405.18377 • Published May 28, 2024 • 18

liked a model over 1 year ago

Qwen/Qwen-7B-Chat-Int4

Text Generation • Updated Jan 4, 2024 • 1.55k • 68