Update README.md
README.md CHANGED
@@ -14,7 +14,7 @@ Gradient incorporates your data to deploy autonomous assistants that power criti

For more info see our [End-to-end development service for custom LLMs and AI systems](https://gradient.ai/development-lab)

-This model extends Llama-3 8B's context length from 8k to > 1040K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training by appropriately adjusting RoPE theta. We trained on 830M tokens for this stage, which is < 0.
+This model extends Llama-3 8B's context length from 8k to > 1040K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training by appropriately adjusting RoPE theta. We trained on 830M tokens for this stage, which is < 0.006% of Llama-3's original pre-training data, and 1.4B tokens total for all stages.
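
For background on the RoPE theta adjustment the changed paragraph describes: raising the rotary base theta lowers every rotary frequency, so the slowest positional components take far longer to wrap around, stretching the usable position range. Below is a minimal sketch of that effect, assuming standard RoPE frequencies; the `head_dim` of 128 matches Llama-3 8B, but the raised theta value is illustrative only, not the exact value Gradient trained with.

```python
import numpy as np

def rope_inverse_frequencies(theta: float, head_dim: int = 128) -> np.ndarray:
    """Standard RoPE inverse frequencies: theta^(-2i/d) for i = 0, 1, ..., d/2 - 1."""
    return theta ** (-np.arange(0, head_dim, 2) / head_dim)

# Llama-3 8B ships with rope_theta = 500000 at its native 8k context.
base = rope_inverse_frequencies(500_000.0)

# Long-context variants raise theta so the slowest rotary components complete
# fewer cycles per token. This theta is illustrative, not Gradient's exact value.
raised = rope_inverse_frequencies(4_000_000_000.0)

# The slowest component sets the longest positional "wavelength" in tokens.
print(f"longest wavelength, base theta:   {2 * np.pi / base[-1]:,.0f} tokens")
print(f"longest wavelength, raised theta: {2 * np.pi / raised[-1]:,.0f} tokens")
```

The rescaled frequencies change what every position embedding looks like, which is why some continued training is needed; the commit's point is that a comparatively small budget (830M tokens for this stage) is enough for the model to adapt.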