Transformers
English
Inference Endpoints

30,142,848 trainable parameters.

Embedding parameters: 19,298,688

Non-embedding parameters: 10,844,160

Tokenizer: GPT-2

Vocabulary size: 50,257

Compute: single T4 GPU

Total train time: 2 hours and 40 minutes

Total train tokens: 136,000,000

Epochs: 2

Final train Loss: 2.9811

Final test Loss: 2.7963


try the following script for inference:

!pip install huggingface_hub !pip install transformers !pip install torch from transformers import GPT2Tokenizer, GPT2Config, GPT2LMHeadModel from huggingface_hub import hf_hub_download import torch

Name

model_name = 'Mizule/Dense-30M'

Authenticate

token = input("Enter your Hugging Face token: ")

Download

model_file = hf_hub_download(repo_id=f"{model_name}", filename="Dense-30M.pth", use_auth_token=token)

Custom config

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

config = GPT2Config( vocab_size=tokenizer.vocab_size, n_positions=512, n_ctx=512, n_embd=384, n_layer=6, n_head=8 )

Load model

model = GPT2LMHeadModel(config) model.load_state_dict(torch.load(model_file, map_location=torch.device('cpu'))) device = "cuda" if torch.cuda.is_available() else "cpu" model.to(device) model.eval()

Inference settings

def generate_text(prompt, max_length=128, temperature=0.2, top_k=50, top_p=0.9): inputs = tokenizer(prompt, return_tensors="pt") inputs = {key: value.to("cuda" if torch.cuda.is_available() else "cpu") for key, value in inputs.items()} outputs = model.generate( **inputs, max_length=max_length, temperature=temperature, top_k=top_k, top_p=top_p, num_return_sequences=1, no_repeat_ngram_size=2, do_sample=True, early_stopping=True ) return tokenizer.decode(outputs[0], skip_special_tokens=True)

Interactive loop (it's an undertrained base model, don't expect it to chat)

while True: prompt = input("Prompt: ") if prompt.lower() == 'exit': break output = generate_text(prompt) print(f"Generated text: {output}")

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no pipeline_tag.

Dataset used to train Mizule/Dense-30M