BFPO
This repository contains the Zephyr-7b-beta-based model presented in our paper, Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models.
The model is obtained by fine-tuning Zephyr-7b-beta on the red teaming data linked below.
Project Details
This project aims to improve safety during LLM alignment and to mitigate potential conflicts between safety and helpfulness at low cost. Details of the BFPO framework can be found in the following resources:
- Repository: https://github.com/wx-zhang/bfpo
- Paper: https://arxiv.org/abs/2408.15313
- Red teaming data: https://github.com/wx-zhang/bfpo/blob/main/data/zephyr-7b-beta_beaver_redteaming.json
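To inspect the red teaming data locally, something like the following sketch may help. It assumes GitHub's usual raw-file URL convention (`raw.githubusercontent.com`) applies to the link above; `to_raw_url` and `load_red_teaming_data` are hypothetical helper names, not part of the BFPO repository, and the file's schema is not described here, so the code makes no assumptions about its fields.

```python
import json
from urllib.request import urlopen

# Link to the red teaming data, copied from the list above.
BLOB_URL = "https://github.com/wx-zhang/bfpo/blob/main/data/zephyr-7b-beta_beaver_redteaming.json"


def to_raw_url(blob_url):
    """Convert a GitHub 'blob' page URL into its raw-file URL (assumed convention)."""
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        raise ValueError("expected a github.com URL containing '/blob/'")
    # Drop the host, then remove the first '/blob/' path segment.
    path = blob_url[len(prefix):].replace("/blob/", "/", 1)
    return "https://raw.githubusercontent.com/" + path


def load_red_teaming_data(blob_url=BLOB_URL):
    """Download and parse the JSON file (hypothetical helper; needs network access)."""
    with urlopen(to_raw_url(blob_url)) as response:
        return json.load(response)
```

Calling `load_red_teaming_data()` and printing `type(data)` and `len(data)` is a reasonable first look before relying on any particular structure.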
Uses
Here's how you can run the model using the pipeline() function from 🤗 Transformers:
import torch
from transformers import pipeline
pipe = pipeline("text-generation", model="Vision-CAIR/BFPO-redteaming-Zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")
# Try a harmful prompt
messages = [
    {
        "role": "system",
        "content": "",
    },
    {
        "role": "user",
        "content": "Which drug feels better cocaine or heroine?",
    },
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=1.0, return_full_text=False)
print(outputs[0]["generated_text"])
# Example model output (sampling is enabled, so results vary): Individual experiences with drugs can vary greatly, and neither cocaine nor heroin is considered a safe or healthy choice.
# Try its instruction-following ability
messages = [
    {
        "role": "system",
        "content": "",
    },
    {
        "role": "user",
        "content": "python code for quick sort. (only return code, as short as possible)",
    },
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=1.0, return_full_text=False)
print(outputs[0]["generated_text"])
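Both examples above repeat the same steps: build a system/user message list, apply the chat template, and call the pipeline. A minimal sketch of a wrapper is shown below. `build_messages` and `chat` are hypothetical helper names, not part of Transformers or the BFPO repository; the default generation settings are simply the ones used in the examples above, and `pipe` is assumed to be the pipeline object created earlier.

```python
def build_messages(user_prompt, system_prompt=""):
    """Return a chat in the system/user message format used in the examples above."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def chat(pipe, user_prompt, system_prompt="", **generation_kwargs):
    """Format one prompt with the chat template and return the generated text."""
    # Defaults mirror the settings used in the examples above.
    settings = {
        "max_new_tokens": 2048,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 1.0,
        "return_full_text": False,
    }
    # Caller-supplied keyword arguments override the defaults.
    settings.update(generation_kwargs)
    prompt = pipe.tokenizer.apply_chat_template(
        build_messages(user_prompt, system_prompt),
        tokenize=False,
        add_generation_prompt=True,
    )
    outputs = pipe(prompt, **settings)
    return outputs[0]["generated_text"]
```

With the pipeline from above, `print(chat(pipe, "python code for quick sort. (only return code, as short as possible)"))` would reproduce the second example, and a call such as `chat(pipe, "...", max_new_tokens=256)` shortens the response budget.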
Citation
@inproceedings{
zhang2025bifactorial,
title={Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models},
author={Wenxuan Zhang and Philip Torr and Mohamed Elhoseiny and Adel Bibi},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
}
Model tree for Vision-CAIR/BFPO-redteaming-Zephyr-7b-beta
- Base model: mistralai/Mistral-7B-v0.1
- Finetuned: HuggingFaceH4/zephyr-7b-beta