pikoGPT-51M (base)
The second model in the "piko" family, which is my take on training smaller GPT-2-like models. Not fine-tuned, just a base model.
Training
Trained on a single RTX 3090 for ~30k steps with Karpathy's train_gpt2.py script from the llm.c repo. The dataset is edu_fineweb10B from the same repo. The model reached a validation loss of ~3.57.
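For a rough sense of scale, a cross-entropy loss of ~3.57 nats per token corresponds to a perplexity of about 35:

import math

val_loss = 3.57  # reported validation loss (nats per token)
print(f"perplexity ≈ {math.exp(val_loss):.1f}")  # ≈ 35.5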
Optimizations
Compared to the pathfinder 16M variant, this model:
- is trained in bfloat16 (rather than float32 for the 16M model)
- has the vocabulary size bumped to 50304 (from 50257 in GPT-2 and pikoGPT-16M), following Karpathy's rule of "nice" numbers (50304 is a multiple of 128; see the sketch below)
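A minimal sketch of what these two changes look like in a generic PyTorch training step. This is not the exact llm.c loop; it assumes the model returns (logits, loss) as in train_gpt2.py:

import torch

# "nice" vocab size: pad GPT-2's 50257 up to the next multiple of 128
padded_vocab = ((50257 + 127) // 128) * 128  # -> 50304

def train_step(model, optimizer, x, y):
    # forward/backward under bfloat16 autocast; parameters stay in float32
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits, loss = model(x, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return loss.item()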
Model file
This repo contains the .pt checkpoint file, which has the following structure:
{
    'step': step,
    'config': asdict(model.config),
    'model_state_dict': model.state_dict(),
}
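For context, a checkpoint with this structure would be produced by something along these lines (a sketch; step and checkpoint_path are placeholders for whatever the training script used):

from dataclasses import asdict
import torch

torch.save(
    {
        'step': step,                            # training step at save time
        'config': asdict(model.config),          # GPTConfig dataclass as a plain dict
        'model_state_dict': model.state_dict(),  # parameters (with "_orig_mod." prefixes if compiled)
    },
    checkpoint_path,
)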
To load the model, you can use the following piece of code (not very pretty, I know):
import torch
from train_gpt2 import GPT, GPTConfig  # model classes from Karpathy's train_gpt2.py (llm.c)

checkpoint = torch.load(path, weights_only=True)  # path to the .pt file from this repo
config = GPTConfig(**checkpoint['config'])
model = GPT(config)

state_dict = checkpoint['model_state_dict']
any_key = next(iter(state_dict.keys()))
if any_key.startswith("_orig_mod."):
    # strip the "_orig_mod." prefix that torch.compile adds to parameter names
    state_dict = {k[len("_orig_mod."):]: v for k, v in state_dict.items()}
model.load_state_dict(state_dict)
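Once the weights are loaded, you can sample from the model with a simple multinomial loop. This is a sketch rather than code from the repo: it assumes the model's forward returns (logits, loss) as in train_gpt2.py and uses the GPT-2 tokenizer from tiktoken.

import torch
import tiktoken

enc = tiktoken.get_encoding("gpt2")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

tokens = torch.tensor([enc.encode("Once upon a time")], device=device)
with torch.no_grad():
    for _ in range(50):
        logits, _ = model(tokens)  # we only need the last position's logits
        probs = torch.softmax(logits[:, -1, :], dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_token], dim=1)
print(enc.decode(tokens[0].tolist()))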