ProLong-512k-8B-CLIPPER

ProLong-512k-8B-CLIPPER is a fine-tuned version of princeton-nlp/Llama-3-8B-ProLong-512k-Instruct, obtained by supervised fine-tuning on the chtmp223/CLIPPER dataset. Please see our paper for more details on the method.
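
A minimal usage sketch with the Hugging Face transformers library is shown below. The dtype, device placement, prompt, and generation settings are illustrative assumptions, not fixed requirements.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chtmp223/ProLong-512k-8B-CLIPPER"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits comfortably on an 80GB GPU
    device_map="auto",
)

# Illustrative prompt; replace with your own long-context input.
messages = [{"role": "user", "content": "Summarize the following book: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```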

📒 Model Details

Model Description

Model Sources

💻 Training Details

Training Data

chtmp223/CLIPPER
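
The training data can be inspected with the datasets library; a minimal sketch (split and column names are whatever the dataset provides, none are assumed here):

```python
from datasets import load_dataset

dataset = load_dataset("chtmp223/CLIPPER")
print(dataset)  # inspect available splits and columns
```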

Training Procedure

Configuration                        Value
Hardware (training and inference)    8x A100
Tracking                             wandb
Batch size                           16
gradient_checkpointing               True
learning_rate                        1.0e-6
lr_scheduler_type                    cosine
max_length                           131072
num_train_epochs                     1
optim                                adamw_torch
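
For reference, the hyperparameters in the table above roughly correspond to the transformers TrainingArguments sketch below. This is an illustration only: the actual run uses the ProLong training codebase (see Software below), and the per-device batch split, precision, and output path are assumptions.

```python
from transformers import TrainingArguments

# Hyperparameters mirrored from the table above. max_length (131072) is handled
# during tokenization/packing rather than through TrainingArguments.
training_args = TrainingArguments(
    output_dir="prolong-512k-8b-clipper-sft",  # hypothetical output path
    per_device_train_batch_size=2,             # assumption: 8 GPUs x 2 = batch size 16
    gradient_checkpointing=True,
    learning_rate=1.0e-6,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    optim="adamw_torch",
    bf16=True,                                 # assumption; precision not listed above
    report_to="wandb",
)
```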

Software

Training code is adapted from https://github.com/princeton-nlp/ProLong.

🤗 Inference

Inference is performed with vLLM on a single A100-80GB GPU.
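
A minimal vLLM sketch is shown below; the context length cap, memory utilization, and sampling settings are illustrative assumptions for a single 80GB GPU.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "chtmp223/ProLong-512k-8B-CLIPPER"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# max_model_len is reduced from 512k so the KV cache fits on one 80GB GPU (assumption).
llm = LLM(model=model_id, max_model_len=131072, gpu_memory_utilization=0.9)
sampling_params = SamplingParams(temperature=0.0, max_tokens=512)

# Illustrative prompt; replace with the actual long-context input.
messages = [{"role": "user", "content": "..."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```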

📜 Citation

@misc{pham2025clippercompressionenableslongcontext,
      title={CLIPPER: Compression enables long-context synthetic data generation}, 
      author={Chau Minh Pham and Yapei Chang and Mohit Iyyer},
      year={2025},
      eprint={2502.14854},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14854}, 
}