Update README.md
README.md CHANGED

@@ -1,3 +1,11 @@
+---
+datasets:
+- Salesforce/wikitext
+metrics:
+- perplexity
+base_model:
+- openai-community/gpt2
+---
 # Quantizing gpt2: Analysis of Time and Memory Predictions
 
 This document outlines various quantization techniques applied to the gpt2 model and analyzes their impact on memory usage, loss, and execution time, focusing on explaining the observed trends in time and memory usage.
@@ -99,4 +107,4 @@ By understanding these factors, one can choose the appropriate quantization stra
 ## Part 3 - Quantization using llama.cpp
 
 * The PyTorch model (`pytorch_model.bin`) is converted to a quantized gguf file (`gpt2.ggml`) using llama.cpp.
-* The quantized model is uploaded to Hugging Face: [gpt2-quantized-gguf](https://huggingface.co/kyrylokumar/gpt2-quantzed-gguf)
+* The quantized model is uploaded to Hugging Face: [gpt2-quantized-gguf](https://huggingface.co/kyrylokumar/gpt2-quantzed-gguf)
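The front matter added above declares perplexity as the evaluation metric on wikitext. As a minimal sketch of the standard definition (the function name and the list-of-losses input are illustrative, not taken from this repo): perplexity is the exponential of the mean per-token negative log-likelihood.

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    mean_nll = sum(nll_per_token) / len(nll_per_token)
    return math.exp(mean_nll)

# A model that assigns probability 1/4 to every token has perplexity 4.
print(perplexity([math.log(4)] * 3))
```

Lower is better: a perplexity of 1 means the model predicted every token with certainty, so comparing this number before and after quantization shows how much quality the quantized weights give up.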