Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2501.08313

Selective Attention Improves Transformer

Paper • 2410.02703 • Published Oct 3, 2024 • 24
Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Paper • 2410.05076 • Published Oct 7, 2024 • 8
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs

Paper • 2410.13276 • Published Oct 17, 2024 • 27

Runtime error

80

80

Dailypapershackernews

📈
Prithvi WxC: Foundation Model for Weather and Climate

Paper • 2409.13598 • Published Sep 20, 2024 • 42
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles

Paper • 2410.05262 • Published Oct 7, 2024 • 9
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

Paper • 2410.15316 • Published Oct 20, 2024 • 10

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 33
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17, 2024 • 26
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 123
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17, 2024 • 22

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 51
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16, 2024 • 98
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 125
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Paper • 2407.08083 • Published Jul 10, 2024 • 29
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 59
The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27, 2024 • 39
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17, 2024 • 29

Running

182

182

BigCodeBench Leaderboard

🥇

Explore and analyze code evaluation data
Running

678

678

UGI Leaderboard

📢

Display a customizable leaderboard with model data
Running

4.07k

4.07k

Chatbot Arena Leaderboard

🏆

Display chatbot leaderboard statistics
Running on CPU Upgrade

4.87k

4.87k

MTEB Leaderboard

🥇

Select benchmarks and languages for text embeddings evaluation

Specific Models

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 256
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published Apr 22, 2024 • 45
Octopus v4: Graph of language models

Paper • 2404.19296 • Published Apr 30, 2024 • 117
DeepSeek-V3 Technical Report

Paper • 2412.19437 • Published Dec 27, 2024 • 52

LM Architectures

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12, 2024 • 66
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11, 2024 • 45
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Paper • 2404.05892 • Published Apr 8, 2024 • 33
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 140

Foundation Models

OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 83
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8, 2024 • 63
StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 31
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 57

Natural Language (LLM, NLP etc)

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18, 2024 • 55
FlowMind: Automatic Workflow Generation with LLMs

Paper • 2404.13050 • Published Mar 17, 2024 • 34
How Far Can We Go with Practical Function-Level Program Repair?

Paper • 2404.12833 • Published Apr 19, 2024 • 7
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

Paper • 2404.18796 • Published Apr 29, 2024 • 69

Previous
1
2
3
4
5
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs