Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2311.03285

S^{3}: Increasing GPU Utilization during Generative Inference for Higher Throughput

Paper • 2306.06000 • Published Jun 9, 2023 • 1
Fast Distributed Inference Serving for Large Language Models

Paper • 2305.05920 • Published May 10, 2023 • 1
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

Paper • 2305.13144 • Published May 22, 2023 • 1
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference

Paper • 2303.06182 • Published Mar 10, 2023 • 1

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

Paper • 2310.08659 • Published Oct 12, 2023 • 27
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Paper • 2309.14717 • Published Sep 26, 2023 • 44
ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers

Paper • 2309.16119 • Published Sep 28, 2023 • 1
LoRA ensembles for large language model fine-tuning

Paper • 2310.00035 • Published Sep 29, 2023 • 2

VeRA: Vector-based Random Matrix Adaptation

Paper • 2310.11454 • Published Oct 17, 2023 • 30
S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Paper • 2311.03285 • Published Nov 6, 2023 • 31
SiRA: Sparse Mixture of Low Rank Adaptation

Paper • 2311.09179 • Published Nov 15, 2023 • 9

Chain-of-Verification Reduces Hallucination in Large Language Models

Paper • 2309.11495 • Published Sep 20, 2023 • 38
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

Paper • 2310.08185 • Published Oct 12, 2023 • 8
The Consensus Game: Language Model Generation via Equilibrium Search

Paper • 2310.09139 • Published Oct 13, 2023 • 14
In-Context Pretraining: Language Modeling Beyond Document Boundaries

Paper • 2310.10638 • Published Oct 16, 2023 • 30

From Sparse to Soft Mixtures of Experts

Paper • 2308.00951 • Published Aug 2, 2023 • 20
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition

Paper • 2307.13269 • Published Jul 25, 2023 • 32
S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Paper • 2311.03285 • Published Nov 6, 2023 • 31

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs