---
base_model:
  - princeton-nlp/Llama-3-8B-ProLong-512k-Instruct
license: apache-2.0
language:
  - en
datasets:
  - chtmp223/CLIPPER
---

# ProLong-512k-8B-CLIPPER

ProLong-512k-8B-CLIPPER is a fine-tuned version of princeton-nlp/Llama-3-8B-ProLong-512k-Instruct, obtained via supervised fine-tuning on the chtmp223/CLIPPER dataset. Please check [our paper](https://arxiv.org/abs/2502.14854) for more details on the method.

## 📒 Model Details

### Model Description

### Model Sources

## 💻 Training Details

### Training Data

chtmp223/CLIPPER
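
A minimal sketch for inspecting the training data with the 🤗 `datasets` library; the `train` split name used here is an assumption, so check the dataset card for the actual splits and fields.

```python
from datasets import load_dataset

# Load CLIPPER from the Hugging Face Hub.
# The "train" split name is an assumption; see the dataset card for the actual splits.
clipper = load_dataset("chtmp223/CLIPPER", split="train")

print(clipper)     # number of examples and column names
print(clipper[0])  # one training example
```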

### Training Procedure

| Configuration                     | Value       |
|-----------------------------------|-------------|
| Hardware (Training and Inference) | 8xA100s     |
| Tracking                          | wandb       |
| batch size                        | 16          |
| gradient_checkpointing            | True        |
| learning_rate                     | 1.0e-6      |
| lr_scheduler_type                 | cosine      |
| max_length                        | 131072      |
| num_train_epochs                  | 1           |
| optim                             | adamw_torch |
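
For illustration only, the table above maps roughly onto the following 🤗 `TrainingArguments` sketch. The actual runs use the ProLong training scripts (see Software below), so the per-device batch size, bf16 setting, and output path here are assumptions, and the 131072-token sequence length is enforced by the data pipeline rather than by `TrainingArguments`.

```python
from transformers import TrainingArguments

# Rough sketch of the hyperparameters above; the real runs use the ProLong codebase.
training_args = TrainingArguments(
    output_dir="prolong-512k-8b-clipper",  # hypothetical output path
    per_device_train_batch_size=2,         # assumption: 2 per GPU x 8 A100s = global batch size 16
    gradient_checkpointing=True,
    learning_rate=1.0e-6,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    optim="adamw_torch",
    bf16=True,                             # assumption: bf16 mixed precision on A100s
    report_to="wandb",
)
# Note: the max_length of 131072 tokens is applied when tokenizing/packing the data, not here.
```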

### Software

Training code is adapted from https://github.com/princeton-nlp/ProLong.

## 🤗 Inference

Inference is done with vLLM on a single A100-80GB GPU.
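
A minimal vLLM sketch along those lines; the Hub model ID, context-length cap, and sampling settings below are assumptions to adjust for your setup.

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "chtmp223/ProLong-512k-8B-CLIPPER"  # assumed Hub ID for this model

# Cap the context length so weights + KV cache fit on a single A100-80GB (assumption).
llm = LLM(model=model_id, max_model_len=131072, gpu_memory_utilization=0.95)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a chat-formatted prompt with the model's chat template.
messages = [{"role": "user", "content": "List every named character in the book below.\n\n<book text>"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=512))
print(outputs[0].outputs[0].text)
```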

## 📜 Citation

@misc{pham2025clippercompressionenableslongcontext,
      title={CLIPPER: Compression enables long-context synthetic data generation}, 
      author={Chau Minh Pham and Yapei Chang and Mohit Iyyer},
      year={2025},
      eprint={2502.14854},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14854}, 
}