# megalodon-200m: minipile

A small pretraining experiment: a ~200M-parameter Megalodon model pretrained on minipile, using the configuration below (a Python sketch of the same settings follows the list).

## Model Configuration

- Number of Layers: 12
- Model Dimension: 1024
- Z Dimension: 256
- Value Dimension: 2048
- Number of Heads: 1
- FFN Hidden Dimension: 2560
- CEMA ndim: 16
- Chunk Size: 2048
- Efficient Attention: None
- Initialization Mode: He
- Vocabulary Size: 20480
- Output Size: 20480
- Normalization Groups: 32
- Normalization Affine: True
- Normalization Epsilon: 1e-05
- RoPE Base: None
- Dropout: 0.0
- Hidden Dropout: 0.0
- Attention Dropout: 0.0
- SwiGLU: False
- Rescale NFFN: False
- Scale Embedding: False
- Share Embedding: False
- Layerwise Checkpointing: False
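
For reference, here is the same configuration expressed as a plain Python dict. The key names are illustrative, chosen to mirror the listing above; they are not guaranteed to match the exact field names used by the Megalodon reference implementation.

```python
# Hedged sketch: the model configuration above as a plain Python dict.
# Key names are illustrative, not the reference implementation's exact fields.
megalodon_200m_config = {
    "num_layers": 12,
    "model_dim": 1024,
    "z_dim": 256,               # shared z-representation dimension
    "value_dim": 2048,
    "num_heads": 1,
    "ffn_hidden_dim": 2560,
    "cema_ndim": 16,            # complex exponential moving average dimension
    "chunk_size": 2048,
    "efficient_attn": None,
    "init_mode": "he",
    "vocab_size": 20480,
    "output_size": 20480,
    "norm_num_groups": 32,
    "norm_affine": True,
    "norm_eps": 1e-05,
    "rope_base": None,
    "dropout": 0.0,
    "hidden_dropout": 0.0,
    "attention_dropout": 0.0,
    "swiglu": False,
    "rescale_nffn": False,
    "scale_embedding": False,
    "share_embedding": False,
    "layerwise_checkpointing": False,
}
```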

## Dataset

pszemraj/megalodon-200m-minipile was pretrained on the minipile dataset.
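
A minimal sketch of loading the training corpus with the Hugging Face `datasets` library, assuming the dataset is the Hub's `JeanKaddour/minipile` (the canonical minipile upload; adjust the identifier if a different copy was used):

```python
# Hedged sketch: load minipile with the `datasets` library.
# Assumes the canonical Hub id "JeanKaddour/minipile"; swap in the actual
# dataset id if this model was trained on a different copy.
from datasets import load_dataset

ds = load_dataset("JeanKaddour/minipile")
print(ds)                             # splits: train / validation / test
print(ds["train"][0]["text"][:200])   # peek at the first document
```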