Update README.md
README.md CHANGED
@@ -14,7 +14,7 @@ Gradient incorporates your data to deploy autonomous assistants that power criti

For more info see our [End-to-end development service for custom LLMs and AI systems](https://gradient.ai/development-lab)

-This model extends Llama-3 8B's context length from 8k to > 1040K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training by appropriately adjusting RoPE theta. We trained on 830M tokens for this stage, which is < 0.
+This model extends Llama-3 8B's context length from 8k to > 1040K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training by appropriately adjusting RoPE theta. We trained on 830M tokens for this stage, which is < 0.006% of Llama-3's original pre-training data, and 1.4B tokens total for all stages.
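
For background on the RoPE theta adjustment the changed paragraph describes: raising the rotary base theta lowers every rotary frequency, so the slowest positional components take far longer to wrap around, stretching the usable position range. Below is a minimal sketch of that effect, assuming standard RoPE frequencies; the `head_dim` of 128 matches Llama-3 8B, but the raised theta value is illustrative only, not the exact value Gradient trained with.

```python
import numpy as np

def rope_inverse_frequencies(theta: float, head_dim: int = 128) -> np.ndarray:
    """Standard RoPE inverse frequencies: theta^(-2i/d) for i = 0, 1, ..., d/2 - 1."""
    return theta ** (-np.arange(0, head_dim, 2) / head_dim)

# Llama-3 8B ships with rope_theta = 500000 at its native 8k context.
base = rope_inverse_frequencies(500_000.0)

# Long-context variants raise theta so the slowest rotary components complete
# fewer cycles per token. This theta is illustrative, not Gradient's exact value.
raised = rope_inverse_frequencies(4_000_000_000.0)

# The slowest component sets the longest positional "wavelength" in tokens.
print(f"longest wavelength, base theta:   {2 * np.pi / base[-1]:,.0f} tokens")
print(f"longest wavelength, raised theta: {2 * np.pi / raised[-1]:,.0f} tokens")
```

The rescaled frequencies change what every position embedding looks like, which is why some continued training is needed; the commit's point is that a comparatively small budget (830M tokens for this stage) is enough for the model to adapt.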