pikoGPT-51M (base)

The second model in the "piko" family, my take on training smaller GPT-2-like models. Not fine-tuned, just a base model.

Training

Trained on a single RTX 3090 for ~30k steps with Karpathy's train_gpt2.py script from the llm.c repo. The dataset is edu_fineweb10B from the same repo. The model reached a validation loss of ~3.57.

Optimizations

Compared to the pathfinder 16M variant, this model:

  • is trained in bfloat16 (vs. float32 for the 16M model)
  • has the vocabulary size bumped to 50304 (from 50257 in GPT-2 and pikoGPT-16M), following Karpathy's rule of "nice" numbers; see the sketch below
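
The padded size is just the GPT-2 vocabulary rounded up to the next multiple of 128, which keeps the embedding and LM-head matrices at GPU-friendly dimensions. A minimal sketch of the rounding (my own illustration, not the exact llm.c code):

def pad_vocab(vocab_size: int, multiple: int = 128) -> int:
    # round up to the nearest multiple, e.g. 50257 -> 50304
    return ((vocab_size + multiple - 1) // multiple) * multiple

print(pad_vocab(50257))  # 50304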

Model file

This repo contains a .pt checkpoint file with the following structure:

{
    'step': step,
    'config': asdict(model.config),
    'model_state_dict': model.state_dict(),
}
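
For reference, a checkpoint with this structure would typically be written like so (a sketch, not the exact code from the training script; the file name is hypothetical):

from dataclasses import asdict
import torch

torch.save({
    'step': step,                             # training step at save time
    'config': asdict(model.config),           # GPTConfig as a plain dict
    'model_state_dict': model.state_dict(),   # weights, possibly with torch.compile's "_orig_mod." prefix
}, 'checkpoint.pt')                           # hypothetical file name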

To load the model, you can use the following piece of code (not very pretty, I know):

import torch
# GPT and GPTConfig are the classes from llm.c's train_gpt2.py; adjust the import to where you keep them
from train_gpt2 import GPT, GPTConfig

checkpoint = torch.load(path, weights_only=True)

config = GPTConfig(**checkpoint['config'])
model = GPT(config)

state_dict = checkpoint['model_state_dict']
any_key = next(iter(state_dict))
if any_key.startswith("_orig_mod."):
    # torch.compile wraps the module and prefixes every key with "_orig_mod.";
    # strip the prefix so the keys match an uncompiled GPT instance
    state_dict = {k[len("_orig_mod."):]: v for k, v in state_dict.items()}

model.load_state_dict(state_dict)
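
Once the weights are loaded, you can sample from the model the usual way. A rough sketch, assuming a nanoGPT-style forward that returns (logits, loss) and the GPT-2 tokenizer from tiktoken:

import tiktoken
import torch

enc = tiktoken.get_encoding("gpt2")
model.eval()

tokens = torch.tensor([enc.encode("Once upon a time")], dtype=torch.long)
with torch.no_grad():
    for _ in range(50):
        logits, _ = model(tokens)
        # only the first 50257 logits are real GPT-2 tokens; the rest is vocab padding
        probs = torch.softmax(logits[:, -1, :50257], dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_token], dim=1)

print(enc.decode(tokens[0].tolist()))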
