# BoolQ T5
This repository contains a T5-base model fine-tuned on the BoolQ dataset for generating true/false question-answer pairs. Leveraging T5’s text-to-text framework, the model can generate natural language questions and their corresponding yes/no answers directly from a given passage.
## Model Overview
Built with PyTorch Lightning, this implementation streamlines training, validation, and hyperparameter tuning. By adapting the pre-trained T5-base model to the task of question generation and answer prediction, it effectively bridges comprehension and generation in a single framework.
## Data Processing
### Input Construction
Each input sample is formatted as follows:
```
truefalse: [answer] passage: [passage] </s>
```
### Target Construction
Each target sample is formatted as:
```
question: [question] answer: [yes/no] </s>
```
The boolean answer is normalized to “yes” or “no” to ensure consistency during training.
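As a concrete illustration, the sketch below builds an input/target pair following the formats above. It assumes the standard BoolQ field names (`passage`, `question`, and a boolean `answer`); the `build_example` helper is hypothetical, not code from this repository.

```python
# Hypothetical helper illustrating the input/target construction described above.
# Assumes the standard BoolQ fields: passage (str), question (str), answer (bool).
def build_example(passage: str, question: str, answer: bool):
    ans = "yes" if answer else "no"  # normalize the boolean answer to yes/no
    source = f"truefalse: {ans} passage: {passage} </s>"
    target = f"question: {question} answer: {ans} </s>"
    return source, target

source, target = build_example(
    passage="The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
    question="is the eiffel tower located in paris",
    answer=True,
)
print(source)  # truefalse: yes passage: The Eiffel Tower ... </s>
print(target)  # question: is the eiffel tower located in paris answer: yes </s>
```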
## Training Details
- Framework: PyTorch Lightning
- Optimizer: AdamW with linear learning rate scheduling and warmup
- Batch Sizes:
  - Training: 6
  - Evaluation: 6
- Maximum Sequence Length: 256 tokens
- Number of Training Epochs: 4
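A minimal sketch of how these pieces fit together in PyTorch Lightning is shown below. The hyperparameters from the list above are used where stated; the learning rate, warmup/total step counts, and class name are illustrative assumptions, not values taken from the actual training run.

```python
# Minimal training-setup sketch; lr, warmup_steps, and total_steps are assumed
# values, not taken from the actual training run.
import pytorch_lightning as pl
import torch
from transformers import T5ForConditionalGeneration, get_linear_schedule_with_warmup

class BoolQT5Module(pl.LightningModule):
    def __init__(self, lr=3e-4, warmup_steps=100, total_steps=10_000):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained("t5-base")
        self.lr, self.warmup_steps, self.total_steps = lr, warmup_steps, total_steps

    def training_step(self, batch, batch_idx):
        # batch contains source/target sequences tokenized to at most 256 tokens
        outputs = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
        self.log("train_loss", outputs.loss)
        return outputs.loss

    def configure_optimizers(self):
        # AdamW with linear learning rate scheduling and warmup, as listed above
        optimizer = torch.optim.AdamW(self.parameters(), lr=self.lr)
        scheduler = get_linear_schedule_with_warmup(
            optimizer, self.warmup_steps, self.total_steps
        )
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]

# trainer = pl.Trainer(max_epochs=4)
# trainer.fit(BoolQT5Module(), train_dataloader)  # dataloaders use batch_size=6
```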
## Evaluation Metrics
The model’s performance was evaluated using BLEU scores for both the generated questions and answers. For question generation, the metrics are as follows:
| Metric | Score (Questions) |
|---|---|
| BLEU-1 | 0.5143 |
| BLEU-2 | 0.3950 |
| BLEU-3 | 0.3089 |
| BLEU-4 | 0.2431 |
Note: These metrics offer a quantitative assessment of the model’s quality in generating coherent and relevant question-answer pairs.
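The card does not specify which BLEU implementation was used; the sketch below shows one way to compute BLEU-1 through BLEU-4 with NLTK, assuming whitespace tokenization and smoothed scores.

```python
# One possible way to compute BLEU-1..4; the reference/hypothesis strings and
# tokenization here are illustrative, not the evaluation code used for this model.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [["is the eiffel tower located in paris".split()]]  # gold questions
hypotheses = ["is the eiffel tower in paris".split()]            # generated questions

smooth = SmoothingFunction().method1
for n in range(1, 5):
    # uniform weights over the first n n-gram orders, zero for the rest
    weights = tuple(1.0 / n for _ in range(n)) + (0.0,) * (4 - n)
    score = corpus_bleu(references, hypotheses, weights=weights,
                        smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.4f}")
```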
## How to Use
You can run inference with this model using the Hugging Face Transformers pipeline:
```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
generator = pipeline(
    "text2text-generation",
    model="Fares7elsadek/boolq-t5-base-question-generation",
)

# Example inference: replace [answer] with "yes" or "no" and supply your passage
input_text = "truefalse: [answer] passage: [Your passage here] </s>"
result = generator(input_text)
print(result)
```
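Because the model emits a single string in the `question: ... answer: ...` target format, you will usually want to split it back into its two parts. The helper below is a hypothetical example, not part of this repository:

```python
# Hypothetical post-processing helper; assumes the output follows the
# "question: [question] answer: [yes/no]" target format described above.
def parse_output(text: str):
    question_part, _, answer_part = text.partition("answer:")
    question = question_part.replace("question:", "").strip()
    answer = answer_part.strip()
    return question, answer

question, answer = parse_output(result[0]["generated_text"])
print(question, "->", answer)
```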
## Base Model
This model is fine-tuned from google-t5/t5-base.