# Distilled GPT-2 Story Generation Model (June 2025)

This is a distilled version of GPT-2, fine-tuned via knowledge distillation from a teacher model (Qwen3-1.7B) on the ROCStories dataset. The model is designed for story generation with constraint words.

## Model Details

- **Base Model**: GPT-2
- **Teacher Model**: Qwen3-1.7B
- **Dataset**: ROCStories (Ximing/ROCStories)
- **Training Objective**: Knowledge distillation to match the teacher's outputs (see the sketch at the end of this card)
- **Training Date**: June 12, 2025
- **Evaluation**: Constraint word inclusion success rate

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "here4code/distilled-gpt2-story-generation-Qwen3-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Once upon a time, there was a happy dog"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

- **Epochs**: 2
- **Batch Size**: 128
- **Training Loss per Epoch**: 4.6367 (epoch 1), 2.7681 (epoch 2)

## Evaluation Results

- **Teacher Constraint Word Inclusion Success Rate**: 0.8500
- **Student Constraint Word Inclusion Success Rate**: 0.9000

A sketch of how this metric can be computed appears at the end of this card.

## License

This model is released under the MIT License.
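
## Distillation Setup (Sketch)

This card states only that the student was trained to match the teacher's outputs. Because GPT-2 and Qwen3 use different tokenizers, logit-level distillation is not directly applicable, so one plausible reading is sequence-level distillation: the teacher generates stories and the student is fine-tuned on them with the ordinary language-modeling loss. The sketch below illustrates that setup; the teacher checkpoint name, prompt format, and hyperparameters are assumptions, not details from this card.

```python
# Sequence-level distillation sketch: the student learns with the ordinary
# causal-LM loss on text produced by the teacher. All names and values here
# are assumptions, not the card's actual training configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "Qwen/Qwen3-1.7B"   # teacher per the card (assumed repo id)
student_name = "gpt2"              # base model per the card

teacher_tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)
student_tok = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)

# 1) The teacher writes a story; constraint words go into the prompt
#    (the actual prompt template is not documented in this card).
prompt = "Write a five-sentence story using the words: dog, ball, park."
t_inputs = teacher_tok(prompt, return_tensors="pt")
with torch.no_grad():
    t_out = teacher.generate(**t_inputs, max_new_tokens=120)
story = teacher_tok.decode(t_out[0], skip_special_tokens=True)

# 2) The student is fine-tuned to reproduce the teacher's story
#    (a single optimization step shown for illustration).
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)
s_inputs = student_tok(story, return_tensors="pt")
loss = student(**s_inputs, labels=s_inputs["input_ids"]).loss
loss.backward()
optimizer.step()
```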
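
## Constraint Word Metric (Sketch)

The evaluation numbers above report the fraction of generated stories that contain all of their required constraint words. The original evaluation script is not included in this card; the following is a minimal sketch of how such a rate could be computed, assuming each generation is paired with a list of constraint words and that matching is a simple case-insensitive substring check (the actual evaluation may use exact token matching or lemmatization).

```python
def constraint_inclusion_rate(generations, constraint_lists):
    """Fraction of generations containing all of their constraint words.

    `generations` and `constraint_lists` are parallel lists. Matching is a
    case-insensitive substring check -- an assumption, not necessarily the
    rule used to produce the numbers reported in this card.
    """
    hits = 0
    for text, words in zip(generations, constraint_lists):
        lowered = text.lower()
        if all(w.lower() in lowered for w in words):
            hits += 1
    return hits / len(generations)

# Example: the first story contains both constraint words, the second
# contains neither, so the rate is 0.5.
stories = [
    "The dog chased a ball across the park.",
    "It rained all day, so they stayed inside.",
]
constraints = [["dog", "ball"], ["sun", "beach"]]
print(constraint_inclusion_rate(stories, constraints))  # 0.5
```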