llama-161M

Trained on 100B tokens.

  • 1e-3 learning rate
  • 0.1 weight decay
  • WSD (warmup-stable-decay) scheduler with a 10% decay phase (see the sketch after this list)
  • Data mix: 80% code, 10% natural language, 10% instruction data
  • Dataset decontaminated against popular benchmarks, following the BigCode approach
  • Trained on 8x RTX 3090s for ~110 hours
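
A minimal sketch of a WSD-style schedule in PyTorch, assuming a short linear warmup, a constant hold at the peak LR, and a linear decay over the final 10% of steps; the warmup length and the shape of the decay are assumptions, not details taken from this training run.

```python
from torch.optim.lr_scheduler import LambdaLR

def wsd_scheduler(optimizer, total_steps, warmup_steps=500, decay_frac=0.10):
    """Warmup-stable-decay: linear warmup, constant hold, then linear decay
    over the final `decay_frac` of training (10% here, as on this card)."""
    decay_start = int(total_steps * (1.0 - decay_frac))

    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)   # linear warmup (length is an assumption)
        if step < decay_start:
            return 1.0                           # stable phase at the peak LR (1e-3 here)
        # linear decay to zero over the last 10% of steps
        return max(0.0, (total_steps - step) / max(1, total_steps - decay_start))

    return LambdaLR(optimizer, lr_lambda)
```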

This is a base pretrained model and requires further fine-tuning to be useful.
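
For quick inspection, the checkpoint can be loaded with the standard transformers API; the prompt below is only an illustration of base-model continuation, not a recommended usage pattern.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacaj/llama-161M-100B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Base model: expect raw continuation of the prompt, not instruction following.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)  # greedy decoding
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```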

Model Details

Benchmark                  Score (greedy decoding)
openai/openai_humaneval    9.2%
mbpp                       9.8%
Model size: 162M params, BF16, safetensors format.
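
A rough sketch of how greedy completions for the HumanEval split can be produced with datasets and transformers; the exact harness and post-processing behind the scores above are not specified here, and scoring additionally requires a sandboxed execution step.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacaj/llama-161M-100B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

problems = load_dataset("openai/openai_humaneval", split="test")
samples = []
for problem in problems:
    inputs = tokenizer(problem["prompt"], return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)
    samples.append({"task_id": problem["task_id"], "completion": completion})

# Scoring the completions is done separately with an execution harness
# (e.g. the human-eval or bigcode-evaluation-harness tooling).
```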