Original base model: Entropicengine/Pinecone-Rune-12b
Modified base model used for this training run: Nitral-AI/Pinecone-Rune-12b-chatmlified
Trained on only around 750 entries with a rank/alpha 32 4-bit QLoRA at a learning rate of 3e-6 for 2 epochs; batch size 4 with gradient accumulation 4, for an effective batch size of 16, on a cosine schedule.
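The hyperparameters above could be expressed in a peft/TRL config roughly like this. This is a sketch, not the exact training script: the quantization dtype, target modules, and output directory are assumptions not stated in the card.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig

# 4-bit quantization for QLoRA (nf4 + bf16 compute are typical choices,
# not confirmed by the card)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Rank/alpha 32 adapter as stated above; target_modules is a common
# attention-projection choice, not confirmed by the card
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# bs 4 x grad accum 4 = effective batch size 16, cosine schedule,
# lr 3e-6, 2 epochs (all from the card)
train_config = SFTConfig(
    output_dir="pinecone-rune-12b-antirep",  # hypothetical name
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=3e-6,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
)
```

These configs would then be passed to a `trl.SFTTrainer` along with the dataset linked below.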
Dataset here: https://huggingface.co/datasets/Nitral-AI/antirep_sharegpt
Example notebook (runs on an L4/T4): https://huggingface.co/Nitral-AI/Pinecone-Rune-12b-Token-Surgery-Chatml/tree/main/TokenSurgeon-Example
Boring training graph.
Starting loss: 1.74, final loss: 0.95