airoboros-2.2.1-limarpv3-y34b-exl2

Exllama v2 quant of Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b

Branches (a download example follows the list):

  • main: measurement.json calculated using 2048-token calibration rows on PIPPA
  • 4.65bpw-h6: 4.65 decoder bits per weight, 6 head bits
    • ideal for 24 GB GPUs at 8k context (on my 24 GB Windows setup with Flash Attention 2, peak VRAM usage during inference with exllamav2_hf was around 23.4 GB, with 0.9 GB used at baseline)
  • 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
    • ideal for large (>24 GB) VRAM setups
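
To fetch one of the quantized branches, you can use the `huggingface_hub` Python client and pass the branch name as the revision. A minimal sketch (the `local_dir` path is just an example):

```python
from huggingface_hub import snapshot_download

# Download the 4.65bpw-h6 branch of this repo; swap the revision for
# "6.0bpw-h6" or "main" as needed. local_dir is an example path.
snapshot_download(
    repo_id="Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b-exl2",
    revision="4.65bpw-h6",
    local_dir="models/airoboros-limarpv3-y34b-exl2-4.65bpw",
)
```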
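Once downloaded, the quant can be loaded with the exllamav2 Python API at 8k context, matching the VRAM note above. A minimal sketch following the library's standard inference example; the model path and sampler settings are assumptions, not part of this repo:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "models/airoboros-limarpv3-y34b-exl2-4.65bpw"  # example path
config.prepare()
config.max_seq_len = 8192  # 8k context, per the VRAM note above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # load weights, splitting across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8  # example sampler setting

print(generator.generate_simple("Hello,", settings, 128))
```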