
Quantization made by Richard Erkhov.

Github

Discord

Request more models

Qwen2-0.5B-XPO - EXL2

Available sizes

Branch  Bits  Description
8_0     8.0   Maximum quality that ExLlamaV2 can produce, near unquantized performance.
6_5     6.5   Very similar to 8.0; a good trade-off of size vs. performance. Recommended.
5_0     5.0   Slightly lower quality than 6.5, but usable.
4_25    4.25  GPTQ-equivalent bits per weight, slightly higher quality.
3_5     3.5   Lower quality; only use if you have to.

Download instructions

With git:

git clone --single-branch --branch 6_5 https://huggingface.co/trl-lib_-_Qwen2-0.5B-XPO-exl2 Qwen2-0.5B-XPO-6_5

With the huggingface-hub CLI:

pip3 install huggingface-hub

To download a specific branch, use the --revision parameter. For example, to download the 6.5 bpw branch:

Linux:

huggingface-cli download trl-lib_-_Qwen2-0.5B-XPO-exl2 --revision 6_5 --local-dir Qwen2-0.5B-XPO-6_5 --local-dir-use-symlinks False

Windows (which sometimes does not handle _ in folder names well):

huggingface-cli download trl-lib_-_Qwen2-0.5B-XPO-exl2 --revision 6_5 --local-dir Qwen2-0.5B-XPO-6.5 --local-dir-use-symlinks False
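
Once a branch is downloaded, the weights can be loaded with ExLlamaV2 itself. The following is a minimal sketch using the exllamav2 Python API with the local directory name from the commands above; exact class and argument names can vary slightly between exllamav2 versions, so treat it as a starting point rather than a verified recipe.

from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Point the config at the locally downloaded branch (directory created by the commands above).
config = ExLlamaV2Config("Qwen2-0.5B-XPO-6_5")
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # load the quantized weights, splitting across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Hello, my name is", max_new_tokens=64))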

Original model description:

base_model: Qwen/Qwen2-0.5B-Instruct
library_name: transformers
model_name: Qwen2-0.5B-XPO
tags:
- generated_from_trainer
- trl
- xpo
licence: license

Model Card for Qwen2-0.5B-XPO

This model is a fine-tuned version of Qwen/Qwen2-0.5B-Instruct. It has been trained using TRL.

Quick start

from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="qgallouedec/Qwen2-0.5B-XPO", device="cuda")
# Pass the question as a chat-style message; return_full_text=False keeps only the generated reply.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])

Training procedure

Visualize in Weights & Biases

This model was trained with XPO, a method introduced in Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF.
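
For reference, the sketch below shows how an XPO run of this kind is typically set up with TRL's XPOTrainer, following the example in the TRL documentation. The judge (PairRMJudge) and the prompt dataset (trl-lib/ultrafeedback-prompt) are illustrative assumptions; the exact data and judge used to train this checkpoint are not documented in this card.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import PairRMJudge, XPOConfig, XPOTrainer

# Start from the same base model this checkpoint was fine-tuned from.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# Assumed judge and prompt dataset, taken from the TRL documentation example.
judge = PairRMJudge()
train_dataset = load_dataset("trl-lib/ultrafeedback-prompt", split="train")

training_args = XPOConfig(output_dir="Qwen2-0.5B-XPO", logging_steps=10)
trainer = XPOTrainer(
    model=model,
    judge=judge,
    args=training_args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()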

Framework versions

  • TRL: 0.12.0.dev0
  • Transformers: 4.46.0.dev0
  • Pytorch: 2.4.1
  • Datasets: 3.0.1
  • Tokenizers: 0.20.0

Citations

Cite XPO as:

@article{xie2024exploratory,
    title        = {{Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF}},
    author       = {Tengyang Xie and Dylan J. Foster and Akshay Krishnamurthy and Corby Rosset and Ahmed Awadallah and Alexander Rakhlin},
    year         = 2024,
    eprint       = {arXiv:2405.21046}
}

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}