Update README.md
README.md CHANGED

@@ -1,3 +1,11 @@
+---
+datasets:
+- Salesforce/wikitext
+metrics:
+- perplexity
+base_model:
+- openai-community/gpt2
+---
 # Quantizing gpt2: Analysis of Time and Memory Predictions
 
 This document outlines various quantization techniques applied to the gpt2 model and analyzes their impact on memory usage, loss, and execution time, focusing on explaining the observed trends in time and memory usage.
@@ -99,4 +107,4 @@ By understanding these factors, one can choose the appropriate quantization stra
 ## Part 3 - Quantization using llama.cpp
 
 * The PyTorch model (`pytorch_model.bin`) is converted to a quantized gguf file (`gpt2.ggml`) using llama.cpp.
-* The quantized model is uploaded to Hugging Face: [gpt2-quantized-gguf](https://huggingface.co/kyrylokumar/gpt2-quantzed-gguf)
+* The quantized model is uploaded to Hugging Face: [gpt2-quantized-gguf](https://huggingface.co/kyrylokumar/gpt2-quantzed-gguf)
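The front matter added above declares perplexity as the evaluation metric on wikitext. As a minimal sketch of the standard definition (the function name and the list-of-losses input are illustrative, not taken from this repo): perplexity is the exponential of the mean per-token negative log-likelihood.

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    mean_nll = sum(nll_per_token) / len(nll_per_token)
    return math.exp(mean_nll)

# A model that assigns probability 1/4 to every token has perplexity 4.
print(perplexity([math.log(4)] * 3))
```

Lower is better: a perplexity of 1 means the model predicted every token with certainty, so comparing this number before and after quantization shows how much quality the quantized weights give up.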