Original base model: Entropicengine/Pinecone-Rune-12b
Modified base model used for this training run: Nitral-AI/Pinecone-Rune-12b-chatmlified
Trained on only around 750 entries with a rank/alpha 32 4-bit QLoRA at a learning rate of 3e-6 for 2 epochs; batch size 4 with gradient accumulation 4, for an effective batch size of 16, on a cosine schedule.
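The hyperparameters above could be expressed in a peft/TRL config roughly like this. This is a sketch, not the exact training script: the quantization dtype, target modules, and output directory are assumptions not stated in the card.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig

# 4-bit quantization for QLoRA (nf4 + bf16 compute are typical choices,
# not confirmed by the card)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Rank/alpha 32 adapter as stated above; target_modules is a common
# attention-projection choice, not confirmed by the card
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# bs 4 x grad accum 4 = effective batch size 16, cosine schedule,
# lr 3e-6, 2 epochs (all from the card)
train_config = SFTConfig(
    output_dir="pinecone-rune-12b-antirep",  # hypothetical name
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=3e-6,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
)
```

These configs would then be passed to a `trl.SFTTrainer` along with the dataset linked below.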
Dataset here: https://huggingface.co/datasets/Nitral-AI/antirep_sharegpt
Example notebook (runs on an L4/T4): https://huggingface.co/Nitral-AI/Pinecone-Rune-12b-Token-Surgery-Chatml/tree/main/TokenSurgeon-Example
Boring training graph.
Starting loss: 1.74, final loss: 0.95