mwitiderrick committed · verified
Commit 43acc24 · 1 parent: 5d04de5

Update README.md

Files changed (1):
  1. README.md +6 -7

README.md CHANGED
@@ -31,13 +31,7 @@ model = TextGeneration(model_path="hf:nm-testing/TinyLlama-1.1B-Chat-v1.0-pruned
 print(model(formatted_prompt, max_new_tokens=200).generations[0].text)
 
 """
-1. Preheat the oven to 375°F (178°C).
-2. In a mixing bowl, add 1 cup of all-purpose flour, 1 cup of melted coconut oil, 1/2 cup of sugar, 1/2 cup of banana, 1/2 cup of melted coconut oil, 1/2 cup of salt, 1/2 cup of vanilla extract, and 1/2 cup of baking powder.
-3. Mix the ingredients together until they are well combined.
-4. Add 1/2 cup of melted coconut oil to the mixture.
-5. Add 1/2 cup of melted coconut oil to the mixture.
-6. Mix the ingredients together until they are well combined.
-7. Add 1/2 cup of melted
+
 
 """
 ```
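For context, the snippet this hunk edits runs the pruned-quantized TinyLlama chat model through DeepSparse's `TextGeneration` pipeline. A minimal, self-contained sketch of that step follows; the model stub is cut off in the hunk header, so the placeholder below must be replaced with the full stub from the README, and the chat-style prompt is an assumption standing in for the README's actual `formatted_prompt`.

```python
# Minimal sketch of the inference call shown in the hunk above.
# Assumptions: the full model stub (truncated in the diff header) and the
# TinyLlama chat prompt format; the pipeline call itself mirrors the README.
from deepsparse import TextGeneration

# Replace with the full stub from the README; the diff header cuts it off.
MODEL_STUB = "hf:nm-testing/TinyLlama-1.1B-Chat-v1.0-pruned..."

model = TextGeneration(model_path=MODEL_STUB)

# Hypothetical chat-formatted prompt; the README builds `formatted_prompt`
# earlier in the file.
formatted_prompt = "<|user|>\nWrite a banana bread recipe.</s>\n<|assistant|>\n"

print(model(formatted_prompt, max_new_tokens=200).generations[0].text)
```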
@@ -88,6 +82,11 @@ run_train(
   splits = splits
 )
 ```
+## Export Model
+Export the model while injecting the KV Cache
+```bash
+sparseml.export --task text-generation output_finetune/
+```
 Follow the instructions on our [One Shot With SparseML](https://github.com/neuralmagic/sparseml/tree/main/src/sparseml/transformers/sparsification/obcq) page for a step-by-step guide for performing one-shot quantization of large language models.
 ## Slack
 
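The new Export Model step produces an ONNX artifact that DeepSparse can serve. Below is a hedged sketch of loading the exported model, assuming `sparseml.export` writes a `deployment/` directory under `output_finetune/` as in other SparseML flows; check the exporter's output for the actual path.

```python
# Hedged sketch: load the artifact produced by `sparseml.export` with
# DeepSparse. The "deployment" subdirectory name is an assumption about the
# exporter's output layout, not something confirmed by this diff.
from deepsparse import TextGeneration

model = TextGeneration(model_path="output_finetune/deployment")
print(model("Write a haiku about sparsity.", max_new_tokens=64).generations[0].text)
```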