Upload folder using huggingface_hub

Files changed (3)

README.md CHANGED

@@ -1,21 +1,21 @@
     # Custom GPT Model
-    This is a custom GPT model with the following modifications from standard GPT-2:
-    - RMS normalization instead of LayerNorm
+    This is a custom GPT model with:
+    - RMS normalization
     - Rotary positional embeddings (RoPE)
     - Separate Q,K,V projections
     - Squared ReLU activation in MLP
     - QK normalization in attention
     - Zero initialization for projection layers
-    ## Model Architecture
+    ## Architecture
     - Vocabulary Size: 50304
     - Context Length: 1024
     - Number of Layers: 12
     - Number of Heads: 6
     - Embedding Dimension: 768
     ## Usage
     ```python
     from transformers import AutoModel
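Several of the modifications listed in the README (RMS normalization, squared ReLU in the MLP, RoPE, QK normalization) can be illustrated with a minimal, framework-free sketch. The repository's model code is not part of this diff, so every function name below is hypothetical, and the sketch makes these assumptions: single vectors as plain Python lists, RMS normalization without a learned gain, the interleaved-pair RoPE variant with base 10000, QK normalization applied before the rotation, and standard 1/sqrt(head_dim) scaling.

```python
import math

# Hypothetical helpers: they illustrate the README's listed modifications only,
# not the repository's actual (unseen) implementation.


def rms_norm(x, eps=1e-6):
    """RMS normalization: divide by sqrt(mean(x^2) + eps); unlike LayerNorm, no mean subtraction."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms for v in x]


def relu_squared(x):
    """Squared ReLU for the MLP: max(0, v) ** 2, element-wise."""
    return [max(0.0, v) ** 2 for v in x]


def apply_rope(x, position, base=10000.0):
    """Rotary embedding: rotate each pair (x[i], x[i+1]) by position * base ** (-i / d).

    Assumes the interleaved-pair variant; some implementations pair x[i] with x[i + d/2].
    """
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out


def attention_logit(q, k, q_pos, k_pos):
    """One head's pre-softmax score: QK normalization (assumed to run before RoPE), then a scaled dot product."""
    q_rot = apply_rope(rms_norm(q), q_pos)
    k_rot = apply_rope(rms_norm(k), k_pos)
    return sum(a * b for a, b in zip(q_rot, k_rot)) / math.sqrt(len(q))


# Embedding Dimension / Number of Heads from the README gives the per-head width.
head_dim = 768 // 6
```

With these definitions, `attention_logit(q, k, 5, 2)` and `attention_logit(q, k, 13, 10)` agree up to rounding, since RoPE makes the score depend only on the offset between positions, and the README's 768-wide embedding split over 6 heads gives a head dimension of 128.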

config.json CHANGED

@@ -1,5 +1,4 @@
 {
-  "_attn_implementation_autoset": true,
   "architectures": [
     "CustomGPTPreTrainedModel"
   ],
@@ -9,6 +8,7 @@
   "n_head": 6,
   "n_layer": 12,
   "tokenizer_class": "GPT2Tokenizer",
+  "torch_dtype": "float32",
   "transformers_version": "4.48.1",
   "vocab_size": 50304
 }
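The updated config.json records `"torch_dtype": "float32"`, i.e. 4 bytes per stored weight. The rough sizing sketch below uses only the config fields visible in this diff (lines 5-7 of the new file are elided in the hunk view and are not filled in) plus the README's embedding dimension. It assumes the tokenizer is the standard 50257-token GPT-2 vocabulary; the multiple-of-64 check reflects a common padding choice for GPU efficiency and is not something this commit states.

```python
import json

# Only the fields visible in the new config.json; elided lines are deliberately omitted.
config = json.loads("""
{
  "architectures": ["CustomGPTPreTrainedModel"],
  "n_head": 6,
  "n_layer": 12,
  "tokenizer_class": "GPT2Tokenizer",
  "torch_dtype": "float32",
  "transformers_version": "4.48.1",
  "vocab_size": 50304
}
""")

BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}
GPT2_TOKENIZER_SIZE = 50257  # entries in the standard GPT-2 BPE vocabulary (assumed tokenizer)
n_embd = 768  # "Embedding Dimension" from the README, not among the visible config fields

# Token-embedding matrix: one row of n_embd weights per vocabulary entry.
embedding_params = config["vocab_size"] * n_embd
embedding_bytes = embedding_params * BYTES_PER_ELEMENT[config["torch_dtype"]]
head_dim = n_embd // config["n_head"]
padding_rows = config["vocab_size"] - GPT2_TOKENIZER_SIZE  # rows no GPT-2 token maps to

print(f"token embedding: {embedding_params:,} params, "
      f"{embedding_bytes / 2**20:.3f} MiB in {config['torch_dtype']}")
print(f"head dim: {head_dim}, vocab multiple of 64: {config['vocab_size'] % 64 == 0}, "
      f"padding rows: {padding_rows}")
```

With these values the token-embedding matrix alone holds 38,633,472 parameters, about 147.4 MiB in float32, and the 50304-entry vocabulary leaves 47 rows beyond the GPT-2 tokenizer's 50257 tokens.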

generation_config.json ADDED

+{
+  "_from_model_config": true,
+  "transformers_version": "4.48.1"
+}