ChatMachine_v1: GPT-2 Fine-tuned on SQuAD

This model is a GPT-2 variant fine-tuned on the Stanford Question Answering Dataset (SQuAD) for question answering. Given a context passage and a question, it generates a short answer grounded in the provided context.

Model Description

  • Base Model: GPT-2 (124M parameters)
  • Training Data: Stanford Question Answering Dataset (SQuAD)
  • Task: Question Answering
  • Framework: PyTorch with Hugging Face Transformers

Training Details

The model was fine-tuned using the following settings (see the configuration sketch after this list):

  • Mixed precision training (bfloat16)
  • Learning rate: 2e-5
  • Batch size: 16
  • Gradient accumulation steps: 8
  • Warmup steps: 1000
  • Weight decay: 0.1
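
As a rough illustration, the hyperparameters above map onto Hugging Face TrainingArguments as sketched below. The actual training script is not published with this card, so the output directory and any omitted arguments are placeholders.

from transformers import TrainingArguments

# Minimal sketch of the fine-tuning configuration described above (not the original script).
training_args = TrainingArguments(
    output_dir="chatMachine_v1",        # placeholder output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,      # effective batch size of 16 * 8 = 128
    warmup_steps=1000,
    weight_decay=0.1,
    bf16=True,                          # mixed precision training (bfloat16)
)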

Usage

from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("houcine-bdk/chatMachine_v1")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # the base GPT-2 tokenizer is used
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default; reuse EOS for padding

# Format your input
context = "Paris is the capital and largest city of France."
question = "What is the capital of France?"
input_text = f"Context: {context} Question: {question} Answer:"

# Generate answer
inputs = tokenizer(input_text, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.3,
    do_sample=True,
    top_p=0.9,
    num_beams=4,
    early_stopping=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Extract answer
answer = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()
print(f"Answer: {answer}")

Performance and Limitations

The model performs best with:

  • Simple, focused questions
  • Clear, concise context
  • Factual questions (who, what, when, where)

Limitations:

  • May struggle with complex, multi-part questions
  • Performance depends on the clarity and relevance of the provided context
  • Best suited for short, focused answers rather than lengthy explanations

Example Questions

test_cases = [
    {
        "context": "George Washington was the first president of the United States, serving from 1789 to 1797.",
        "question": "Who was the first president of the United States?"
    },
    {
        "context": "The brain uses approximately 20 percent of the body's total energy consumption.",
        "question": "How much of the body's energy does the brain use?"
    }
]
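
Assuming the model, tokenizer, and the answer_question helper from the Usage section are defined, the test cases can be run like this:

for case in test_cases:
    answer = answer_question(case["context"], case["question"])
    print(f"Q: {case['question']}\nA: {answer}\n")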

Expected outputs:

  • "George Washington"
  • "20 percent"

Training Infrastructure

The model was trained on an RTX 4090 GPU using the following (a loop sketch follows this list):

  • PyTorch with CUDA optimizations
  • Mixed precision training (bfloat16)
  • Gradient accumulation for effective batch size scaling
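
For illustration only, a bare-bones PyTorch loop combining bfloat16 autocast with gradient accumulation looks roughly like the sketch below; the actual training script is not published, and train_dataloader and optimizer are assumed to exist.

import torch

accumulation_steps = 8  # matches the setting above; effective batch size = 16 * 8 = 128

for step, batch in enumerate(train_dataloader):  # train_dataloader is assumed
    # Forward and backward pass under bfloat16 autocast; gradients accumulate across micro-batches
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss / accumulation_steps
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # optimizer is assumed
        optimizer.zero_grad()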

Citation

If you use this model, please cite:

@misc{chatmachine_v1,
  author = {Houcine BDK},
  title = {ChatMachine_v1: GPT-2 Fine-tuned on SQuAD},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/houcine-bdk/chatMachine_v1}}
}

License

This model is released under the MIT License.
