ProLong-512k-8B-CLIPPER

ProLong-512k-8B-CLIPPER is a fine-tuned version of princeton-nlp/Llama-3-8B-ProLong-512k-Instruct, obtained by supervised fine-tuning on the chtmp223/CLIPPER dataset. Please see our paper for more details on the method.
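
A minimal usage sketch with the Hugging Face transformers library is shown below. The dtype, device placement, prompt, and generation settings are illustrative assumptions, not fixed requirements.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chtmp223/ProLong-512k-8B-CLIPPER"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits comfortably on an 80GB GPU
    device_map="auto",
)

# Illustrative prompt; replace with your own long-context input.
messages = [{"role": "user", "content": "Summarize the following book: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```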

📒 Model Details

Model Description

Model Sources

💻 Training Details

Training Data

chtmp223/CLIPPER
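
The training data can be inspected with the datasets library; a minimal sketch (split and column names are whatever the dataset provides, none are assumed here):

```python
from datasets import load_dataset

dataset = load_dataset("chtmp223/CLIPPER")
print(dataset)  # inspect available splits and columns
```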

Training Procedure

Configuration                        Value
Hardware (training and inference)    8x A100
Tracking                             wandb
Batch size                           16
gradient_checkpointing               True
learning_rate                        1.0e-6
lr_scheduler_type                    cosine
max_length                           131072
num_train_epochs                     1
optim                                adamw_torch
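
For reference, the hyperparameters in the table above roughly correspond to the transformers TrainingArguments sketch below. This is an illustration only: the actual run uses the ProLong training codebase (see Software below), and the per-device batch split, precision, and output path are assumptions.

```python
from transformers import TrainingArguments

# Hyperparameters mirrored from the table above. max_length (131072) is handled
# during tokenization/packing rather than through TrainingArguments.
training_args = TrainingArguments(
    output_dir="prolong-512k-8b-clipper-sft",  # hypothetical output path
    per_device_train_batch_size=2,             # assumption: 8 GPUs x 2 = batch size 16
    gradient_checkpointing=True,
    learning_rate=1.0e-6,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    optim="adamw_torch",
    bf16=True,                                 # assumption; precision not listed above
    report_to="wandb",
)
```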

Software

Training code is adapted from https://github.com/princeton-nlp/ProLong.

🤗 Inference

Inference is performed with vLLM on a single A100-80GB GPU.
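
A minimal vLLM sketch is shown below; the context length cap, memory utilization, and sampling settings are illustrative assumptions for a single 80GB GPU.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "chtmp223/ProLong-512k-8B-CLIPPER"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# max_model_len is reduced from 512k so the KV cache fits on one 80GB GPU (assumption).
llm = LLM(model=model_id, max_model_len=131072, gpu_memory_utilization=0.9)
sampling_params = SamplingParams(temperature=0.0, max_tokens=512)

# Illustrative prompt; replace with the actual long-context input.
messages = [{"role": "user", "content": "..."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```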

📜 Citation

@misc{pham2025clippercompressionenableslongcontext,
      title={CLIPPER: Compression enables long-context synthetic data generation}, 
      author={Chau Minh Pham and Yapei Chang and Mohit Iyyer},
      year={2025},
      eprint={2502.14854},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14854}, 
}