SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models Paper • 2502.09390 • Published 10 days ago • 16
Universal Assisted Generation: Faster Decoding with Any Assistant Model Article • Oct 29, 2024 • 52
Assisted Generation: a new direction toward low-latency text generation Article • May 11, 2023 • 44
Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon Article • Apr 3, 2024 • 11
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5, 2024 • 36
Training and Finetuning Embedding Models with Sentence Transformers v3 Article • May 28, 2024 • 187
Accelerating Speculative Decoding using Dynamic Speculation Length Paper • 2405.04304 • Published May 7, 2024 • 2
Distributed Speculative Inference of Large Language Models Paper • 2405.14105 • Published May 23, 2024 • 17