checkpoints

This model is a fine-tuned version of google/pegasus-large on the booksum dataset.

Model description

More information needed

Intended uses & limitations

  • Standard PEGASUS has a maximum input length of 1024 tokens, so during training the model only saw the first 1024 tokens of each chapter and learned to produce the chapter's summary from that. Keep this in mind when using this model: for inputs longer than 1024 tokens, information near the end of the text may be excluded from the summary, and the model is biased towards information presented first (see the usage sketch after this list).
  • The model was trained on the dataset for only one epoch, but it still produces reasonable results.
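
As an illustration of the truncation behaviour described above, here is a minimal usage sketch (not part of the original card); the generation settings such as num_beams and max_length are assumptions for demonstration only:

```python
# Minimal usage sketch; generation settings (num_beams, max_length) are
# illustrative assumptions, not values taken from this card.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "pszemraj/pegasus-large-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

chapter_text = "..."  # chapter to summarize; anything past 1024 tokens is truncated below
inputs = tokenizer(chapter_text, truncation=True, max_length=1024, return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```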

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1
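
For reference, a sketch of how the listed hyperparameters map onto transformers' Seq2SeqTrainingArguments; the output_dir and any fields not in the list above are assumptions, not taken from this card:

```python
# Hedged reconstruction of the hyperparameters above; output_dir is an
# assumption (it matches the card title), everything else mirrors the list.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="checkpoints",                  # assumed, not stated in the card
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=16,            # 16 x 16 -> total train batch size of 256
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.03,
    num_train_epochs=1,
)
```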

Training results

Framework versions

  • Transformers 4.16.1
  • Pytorch 1.10.0+cu111
  • Datasets 1.18.2
  • Tokenizers 0.10.3