# Pythia Deduped Series GGML
This repository contains quantized conversions of EleutherAI's Pythia Deduped checkpoints, for use with frontends that support GGML-quantized GPT-NeoX models, such as KoboldCpp and Oobabooga (with the CTransformers loader).
Last updated on 2023-05-25.
Other versions of these models are also available:
- GGMLv1 q4_3 (70M to 12B)
- GGMLv1 q5_0 / q5_1 / q8_0 (70M to 2.8B)
- GGMLv1 q4_0 / q4_2 (70M to 2.8B)
- GGMLv2 q4_0 / q5_1 (70M to 2.8B)
- GGMLv3 q4_0 / q5_1 (70M to 2.8B)
## Description
- The motivation behind these quantizations is that the LLaMA series lacks sizes below 7B, whereas older model families were commonly available in sizes as small as ~125M parameters. Even with 2-bit quantization, a 7B model is uncomfortable to run on hardware with less than 4 GB of RAM. A minimal loading example is shown below.
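As a quick sanity check that these files load, here is a minimal sketch using the ctransformers Python library (the loader behind the CTransformers backend mentioned above). The file name and prompt are placeholders; substitute any quantization from this repository.

```python
# Minimal sketch, assuming the 70M q4_0 file from this repo is in the
# current directory. Install the loader first: pip install ctransformers
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "ggmlv3-pythia-70m-deduped-q4_0.bin",  # any file from this repo works
    model_type="gpt_neox",                 # Pythia uses the GPT-NeoX architecture
)

print(llm("The smallest Pythia model", max_new_tokens=32))
```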
## RAM usage
| Model | RAM usage |
|---|---|
| Unloaded | 41.3 MiB |
| ggmlv3-pythia-70m-deduped-q4_0.bin | 95.5 MiB |
| ggmlv3-pythia-160m-deduped-q4_0.bin | 201.1 MiB |
| ggmlv3-pythia-410m-deduped-q4_0.bin | 415.1 MiB |
| ggmlv3-pythia-1b-deduped-q4_0.bin | 762.2 MiB |
| ggmlv3-pythia-1.4b-deduped-q4_0.bin | 1.0 GiB |
| ggmlv3-pythia-2.8b-deduped-q4_0.bin | 1.9 GiB |
| ggmlv3-pythia-70m-deduped-q5_1.bin | 108.7 MiB |
| ggmlv3-pythia-160m-deduped-q5_1.bin | 226.9 MiB |
| ggmlv3-pythia-410m-deduped-q5_1.bin | 494.0 MiB |
| ggmlv3-pythia-1b-deduped-q5_1.bin | 943.9 MiB |
| ggmlv3-pythia-1.4b-deduped-q5_1.bin | 1.3 GiB |
| ggmlv3-pythia-2.8b-deduped-q5_1.bin | 2.3 GiB |
Tested on KoboldCpp with OpenBLAS enabled.
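To get a rough read on these figures outside KoboldCpp, the sketch below measures process resident memory before and after loading a model via ctransformers. Absolute numbers will differ from the table above, which was measured in KoboldCpp.

```python
# Rough sketch of measuring the memory cost of loading one model file.
# Requires: pip install psutil ctransformers
import psutil
from ctransformers import AutoModelForCausalLM

proc = psutil.Process()          # current process
before = proc.memory_info().rss  # resident set size, in bytes

llm = AutoModelForCausalLM.from_pretrained(
    "ggmlv3-pythia-410m-deduped-q4_0.bin",  # substitute any file from the table
    model_type="gpt_neox",
)

after = proc.memory_info().rss
print(f"Loading added roughly {(after - before) / 2**20:.1f} MiB of RSS")
```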