ChatMachine_v1: GPT-2 Fine-tuned on SQuAD
This model is a GPT-2 variant fine-tuned on the Stanford Question Answering Dataset (SQuAD) for question answering: given a context passage and a question, it generates a short answer grounded in that passage.
Model Description
- Base Model: GPT-2 (124M parameters)
- Training Data: Stanford Question Answering Dataset (SQuAD)
- Task: Question Answering
- Framework: PyTorch with Hugging Face Transformers
Training Details
The model was fine-tuned with the following hyperparameters (a configuration sketch follows the list):
- Mixed precision training (bfloat16)
- Learning rate: 2e-5
- Batch size: 16
- Gradient accumulation steps: 8
- Warmup steps: 1000
- Weight decay: 0.1
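The original training script is not published in this card. As a rough sketch, the hyperparameters above map onto Hugging Face `TrainingArguments` approximately as follows; the output directory name and the use of `TrainingArguments` itself are illustrative assumptions.

```python
from transformers import TrainingArguments

# Illustrative sketch only -- values mirror the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="chatMachine_v1",       # hypothetical output directory
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,     # effective batch size of 16 * 8 = 128
    warmup_steps=1000,
    weight_decay=0.1,
    bf16=True,                         # mixed precision training (bfloat16)
)
```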
Usage
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("houcine-bdk/chatMachine_v1")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Format your input
context = "Paris is the capital and largest city of France."
question = "What is the capital of France?"
input_text = f"Context: {context} Question: {question} Answer:"

# Generate answer
inputs = tokenizer(input_text, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.3,
    do_sample=True,
    top_p=0.9,
    num_beams=4,
    early_stopping=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Extract answer
answer = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()
print(f"Answer: {answer}")
```
Performance and Limitations
The model performs best with:
- Simple, focused questions
- Clear, concise context
- Factual questions (who, what, when, where)
Limitations:
- May struggle with complex, multi-part questions
- Performance depends on the clarity and relevance of the provided context
- Best suited for short, focused answers rather than lengthy explanations
Example Questions
```python
test_cases = [
    {
        "context": "George Washington was the first president of the United States, serving from 1789 to 1797.",
        "question": "Who was the first president of the United States?"
    },
    {
        "context": "The brain uses approximately 20 percent of the body's total energy consumption.",
        "question": "How much of the body's energy does the brain use?"
    }
]
```
Expected outputs:
- "George Washington"
- "20 percent"
Training Infrastructure
The model was trained on an RTX 4090 GPU using:
- PyTorch with CUDA optimizations
- Mixed precision training (bfloat16)
- Gradient accumulation for effective batch size scaling (16 × 8 = 128 sequences per optimizer step; see the sketch after this list)
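As the training script is not included here, the following is only a minimal sketch of how bfloat16 autocast and gradient accumulation combine in a manual PyTorch loop; the optimizer choice (AdamW) and `train_dataloader` are assumptions, not the original setup.

```python
import torch

# Illustrative sketch only -- hyperparameters mirror the Training Details section.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.1)
accumulation_steps = 8

model.train()
optimizer.zero_grad()
for step, batch in enumerate(train_dataloader):  # assumed to yield tokenized batches with labels
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        # Scale the loss so gradients average over the accumulated micro-batches
        loss = model(**batch).loss / accumulation_steps
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```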
Citation
If you use this model, please cite:
```bibtex
@misc{chatmachine_v1,
  author       = {Houcine BDK},
  title        = {ChatMachine_v1: GPT-2 Fine-tuned on SQuAD},
  year         = {2024},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/houcine-bdk/chatMachine_v1}}
}
```
License
This model is released under the MIT License.