LLM Architecture - a JM-Brun Collection

LLM Architecture

updated Jun 25

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 298

Scalable-Softmax Is Superior for Attention

Paper • 2501.19399 • Published Jan 31 • 22

FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation

Paper • 2502.01068 • Published Feb 3 • 18

Scaling Embedding Layers in Language Models

Paper • 2502.01637 • Published Feb 3 • 24

Taming the Titans: A Survey of Efficient LLM Inference Serving

Paper • 2504.19720 • Published Apr 28 • 12