Models: 2,434

Active filters: compressed-tensors

nm-testing/tinyllama-oneshot-w8a8-channel-dynamic-token-v2

Text Generation • 1B • Updated Oct 9, 2024 • 11.8k

nm-testing/tinyllama-oneshot-w8-channel-a8-tensor

Text Generation • 1B • Updated Oct 9, 2024 • 11.8k

nm-testing/llama-3-instruct-w8a8-dyn-per-token-test

Text Generation • 8B • Updated Oct 9, 2024 • 3

nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Dyn-Per-Token

Text Generation • 8B • Updated Oct 9, 2024 • 3

nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Dyn-Per-Token-2048-Samples

Text Generation • 8B • Updated Oct 9, 2024 • 5

nm-testing/tinyllama-oneshot-w8a16-per-channel

Text Generation • 0.4B • Updated Oct 9, 2024 • 11.6k

nm-testing/Meta-Llama-3-8B-Instruct-W8-Channel-A8-Dynamic-Per-Token-Test

Text Generation • 8B • Updated Oct 9, 2024 • 56

nm-testing/Meta-Llama-3-8B-Instruct-W4-Group128-A16-Test

Text Generation • 2B • Updated Oct 9, 2024 • 3

RedHatAI/Phi-3-mini-128k-instruct-FP8

Text Generation • 4B • Updated Oct 9, 2024 • 9

RedHatAI/Phi-3-medium-128k-instruct-FP8

Text Generation • 14B • Updated Oct 9, 2024 • 2.75k • 5

nm-testing/Meta-Llama-3-8B-FP8-compressed-tensors-test

Text Generation • 8B • Updated Oct 9, 2024 • 11.7k

nm-testing/Meta-Llama-3-8B-FP8-compressed-tensors-test-bos

Text Generation • 8B • Updated Oct 9, 2024 • 3

nm-testing/TinyLlama-1.1B-compressed-tensors-kv-cache-scheme

Text Generation • 0.4B • Updated Oct 9, 2024 • 122k

nm-testing/Meta-Llama-3-8B-Instruct-W4A16-compressed-tensors-test

Text Generation • 2B • Updated Oct 9, 2024 • 6

RedHatAI/Phi-3-mini-128k-instruct-quantized.w8a16

Text Generation • 1B • Updated Oct 9, 2024 • 11

RedHatAI/Phi-3-medium-128k-instruct-quantized.w8a16

Text Generation • 4B • Updated Oct 9, 2024 • 4 • 2

nm-testing/Qwen2-0.5B-Instruct

Text Generation • 0.6B • Updated Oct 9, 2024 • 3

RedHatAI/Llama-2-7b-chat-quantized.w8a8

Text Generation • 7B • Updated Oct 9, 2024 • 2.51k • 1

RedHatAI/Meta-Llama-3-8B-Instruct-quantized.w8a8

Text Generation • 8B • Updated Oct 9, 2024 • 4.25k • 2

RedHatAI/Phi-3-mini-128k-instruct-quantized.w8a8

Text Generation • 4B • Updated Oct 9, 2024 • 22

RedHatAI/Phi-3-medium-128k-instruct-quantized.w8a8

Text Generation • 14B • Updated Oct 9, 2024 • 5 • 2

RedHatAI/Qwen2-1.5B-Instruct-quantized.w8a8

Text Generation • 2B • Updated Oct 9, 2024 • 1.2k

nm-testing/Qwen2-1.5B-Instruct-W8A16-Channelwise

Text Generation • 0.8B • Updated Oct 9, 2024 • 3

RedHatAI/Phi-3-mini-128k-instruct-quantized.w4a16

Text Generation • 0.7B • Updated Oct 9, 2024 • 40 • 1

RedHatAI/Qwen2-0.5B-Instruct-quantized.w8a8

Text Generation • 0.6B • Updated Oct 9, 2024 • 430

RedHatAI/Phi-3-medium-128k-instruct-quantized.w4a16

Text Generation • 2B • Updated Oct 9, 2024 • 12.8k • 3

RedHatAI/Qwen2-7B-Instruct-quantized.w8a8

Text Generation • 8B • Updated Oct 9, 2024 • 20

nm-testing/DeepSeek-Coder-V2-Lite-Instruct-FP8

Text Generation • 16B • Updated Feb 13 • 2.09k

RedHatAI/Meta-Llama-3-70B-Instruct-quantized.w8a8

Text Generation • 71B • Updated Oct 9, 2024 • 7

RedHatAI/Qwen2-72B-Instruct-quantized.w8a8

Text Generation • 73B • Updated Oct 9, 2024 • 4 • 1