Enhanced Hybrid Transformer 416M

🚀 A 416,417,792-parameter transformer built from the architectural components listed below.

Features

  • 24 layers, each with 16 attention heads
  • GQA-4 (Grouped Query Attention)
  • SwiGLU activation
  • RMSNorm normalization
  • RoPE positional embeddings
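
As a rough illustration of three of the features above, the sketch below shows how query heads map onto shared key/value heads under GQA, plus the standard definitions of RMSNorm and the SwiGLU gate. It is not taken from the original repository. It assumes "GQA-4" means 4 key/value heads (with 16 query heads, that gives 4 query heads per group), and the function names are hypothetical. The linear projections around each operation are omitted.

```python
import math

NUM_QUERY_HEADS = 16  # from the feature list
NUM_KV_HEADS = 4      # assumption: "GQA-4" means 4 shared key/value heads


def kv_head_for_query_head(q_head, num_q_heads=NUM_QUERY_HEADS, num_kv_heads=NUM_KV_HEADS):
    """Return the index of the key/value head that query head `q_head` reads from."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly into key/value heads")
    if not 0 <= q_head < num_q_heads:
        raise IndexError("query head index out of range")
    group_size = num_q_heads // num_kv_heads  # query heads per key/value head
    return q_head // group_size


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root mean square of x, then apply a learned scale."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_gate(gate, up):
    """SwiGLU gating: SiLU of the gate projection times the up projection, elementwise."""
    return [silu(g) * u for g, u in zip(gate, up)]
```

Under this assumption, query heads 0 to 3 share key/value head 0, heads 4 to 7 share head 1, and so on. Check config.json for the actual head counts before relying on these numbers.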

Contents

  • pytorch_model.bin - Model weights
  • config.json - Model configuration
  • tokenizer.json - Tokenizer definition
  • README.md - This file

Usage

Load the weights with the model code from the original repository for full functionality.
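
Before handing the files to the original model code, it can help to check that the download is complete and to inspect the configuration. The following is a minimal sketch, assuming the four files listed under Contents sit in one local directory. The helper names are hypothetical, and the model class itself is not shown because it lives in the original repository.

```python
import json
from pathlib import Path

# File names as listed under Contents.
EXPECTED_FILES = ("pytorch_model.bin", "config.json", "tokenizer.json", "README.md")


def missing_files(checkpoint_dir):
    """Return the expected files that are absent from `checkpoint_dir`."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def load_config(checkpoint_dir):
    """Parse config.json into a plain dict."""
    with open(Path(checkpoint_dir) / "config.json", encoding="utf-8") as f:
        return json.load(f)


def load_state_dict(checkpoint_dir):
    """Load the raw weight tensors on CPU (requires PyTorch)."""
    import torch  # imported lazily so the helpers above work without PyTorch

    # weights_only=True restricts unpickling to tensors and basic containers;
    # it needs a reasonably recent PyTorch release.
    return torch.load(
        Path(checkpoint_dir) / "pytorch_model.bin",
        map_location="cpu",
        weights_only=True,
    )
```

The resulting state dict would then typically be passed to the `load_state_dict` method of the model object built by the original repository's code.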


🚀 Generated with Claude Code