Llamacpp imatrix Quantizations of Meta-Llama-3.1-8B-Instruct (Fork by NeurochainAI)
This repository is a fork of the original Meta-Llama-3.1-8B-Instruct GGUF quantizations, tailored for NeurochainAI's inference network. The models provided here are part of the foundation for NeurochainAI's state-of-the-art AI inference solutions.
NeurochainAI uses this model for optimizing and running inference on distributed networks, allowing for efficient and robust processing of language models across various platforms and devices.
While much of the original repository is preserved, only three of the original quantizations have been integrated into this fork, as they best fit the performance and inference requirements of our network:
| Filename | Size | Description |
| -------- | ---- | ----------- |
| Meta-Llama-3.1-8B-Instruct-Q6_K.gguf | 6.60GB | Very high-quality quantization with near-perfect accuracy; well suited for high-performance inference. |
| Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf | 6.85GB | Uses Q8_0 for the embedding and output weights, providing near-perfect inference quality. Highly recommended for demanding applications. |
| Meta-Llama-3.1-8B-Instruct-Q8_0.gguf | 8.54GB | The highest-quality quantization available. Typically not needed, but useful when maximum inference accuracy is required. |
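For reference, here is a minimal inference sketch using the third-party `llama-cpp-python` bindings, one of several ways to load these GGUF files locally. The file path, context size, and prompt are illustrative assumptions, not part of this repository:

```python
# Minimal sketch: load one of the GGUF quantizations with llama-cpp-python
# (pip install llama-cpp-python). Adjust model_path to your download location.
from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3.1-8B-Instruct-Q6_K.gguf",  # assumed local path
    n_ctx=8192,        # context window; Llama 3.1 supports longer contexts
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF quantization is."}]
)
print(response["choices"][0]["message"]["content"])
```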
The quantization process was performed with llama.cpp release b3472, using the imatrix option to optimize quantization quality for our inference pipeline. The imatrix calibration dataset was sourced from bartowski.
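The workflow described above can be reproduced with the llama.cpp CLI tools. The sketch below drives them via Python's `subprocess`; the tool names match llama.cpp builds around release b3472, and the source model and calibration file names are assumptions:

```python
# Sketch of an imatrix quantization workflow using llama.cpp CLI tools
# built from release b3472. File names here are illustrative assumptions.
import subprocess

# 1. Build an importance matrix from a calibration dataset.
subprocess.run([
    "./llama-imatrix",
    "-m", "Meta-Llama-3.1-8B-Instruct-f16.gguf",  # assumed full-precision source
    "-f", "calibration_data.txt",                  # assumed calibration text
    "-o", "imatrix.dat",
], check=True)

# 2. Quantize to Q6_K, guided by the importance matrix.
subprocess.run([
    "./llama-quantize",
    "--imatrix", "imatrix.dat",
    "Meta-Llama-3.1-8B-Instruct-f16.gguf",
    "Meta-Llama-3.1-8B-Instruct-Q6_K.gguf",
    "Q6_K",
], check=True)
```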
License
The models and content in this repository are licensed under Meta's Llama 3.1 Community License. Please ensure you comply with the terms of that license agreement.