BGE-reranking

BGE-Reranker-Large

This is an int8-quantized version of bge-reranker-large, converted with CTranslate2. It should be at least 3 times faster than the original Hugging Face Transformers version while also being smaller, with minimal performance loss.
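For reference, a conversion like this can be reproduced with CTranslate2's Transformers converter. The snippet below is a sketch; the exact options used to produce this checkpoint are an assumption.

from ctranslate2.converters import TransformersConverter

# Hypothetical re-creation of this checkpoint: export the HF encoder
# weights to the CTranslate2 format with int8 quantization.
converter = TransformersConverter("BAAI/bge-reranker-large")
converter.convert("ct2fast-bge-reranker", quantization="int8")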

Model Details

Unlike the embedding model bge-large-en-v1.5, the reranker takes a question and a document as input and directly outputs a similarity score rather than an embedding: you obtain a relevance score by feeding a query/passage pair to the model. Because the reranker is optimized with a cross-entropy loss, the relevance score is not bounded to any specific range. This repository is a heavily optimized build of that model using the CTranslate2 library, suitable for production environments.
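If you need scores in a fixed range, a common option (not something the model requires) is to pass the raw logits through a sigmoid to map them into (0, 1). A minimal sketch, reusing the example logits from the usage section below:

import torch

# Raw reranker logits are unbounded; a sigmoid maps them to (0, 1)
# when you want range-bounded relevance scores.
logits = torch.tensor([1.0474, -9.4694])
scores = torch.sigmoid(logits)
print(scores)  # approximately tensor([7.4e-01, 7.7e-05])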

Model Sources

The original model is BAAI's bge-reranker-large. Please visit the original bge-reranker repository for more details.

Usage

Simply pip install ctranslate2 transformers torch huggingface_hub and then:

import ctranslate2
import numpy as np
import torch
import transformers
from huggingface_hub import snapshot_download

device = "cuda" if torch.cuda.is_available() else "cpu"

# CTranslate2 loads models from a local directory, so download the
# converted weights from the Hub first.
model_dir = snapshot_download("hooman650/ct2fast-bge-reranker")

# the CTranslate2 encoder does the heavy lifting
encoder = ctranslate2.Encoder(model_dir, device=device)

# the tokenizer and classification head come from the original HF model
model_name = "BAAI/bge-reranker-large"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
classifier = transformers.AutoModelForSequenceClassification.from_pretrained(model_name).classifier

classifier.eval()
classifier.to(device)

pairs = [
    ["I like Ctranslate2", "Ctranslate2 makes mid range models faster"],
    ["I like Ctranslate2", "Using naive transformers might not be suitable for deployment"],
]

with torch.no_grad():
    # no return_tensors: forward_batch expects plain lists of token IDs
    tokens = tokenizer(pairs, padding=True, truncation=True, max_length=512).input_ids
    output = encoder.forward_batch(tokens)
    # wrap the CTranslate2 output as a torch tensor; on GPU this is
    # zero-copy via the CUDA array interface, on CPU go through numpy
    if device == "cuda":
        hidden_state = torch.as_tensor(output.last_hidden_state, device=device)
    else:
        hidden_state = torch.as_tensor(np.asarray(output.last_hidden_state))
    # the XLM-R classification head reads the [CLS] position internally
    logits = classifier(hidden_state).squeeze()

print(logits)

# tensor([ 1.0474, -9.4694], device='cuda:0')
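To actually rerank with these scores, sort the candidate pairs by logit, highest first. A minimal sketch continuing from the snippet above:

# order passages by relevance to the query, highest logit first
ranked = sorted(zip(pairs, logits.tolist()), key=lambda item: item[1], reverse=True)
for (query, passage), score in ranked:
    print(f"{score:+.4f}  {passage}")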

Hardware

Supports both GPU and CPU.
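On CPU, a couple of CTranslate2 constructor options are worth tuning. A sketch, with an illustrative thread count:

import ctranslate2
from huggingface_hub import snapshot_download

# example CPU configuration: keep int8 computation and pin the thread count
model_dir = snapshot_download("hooman650/ct2fast-bge-reranker")
encoder = ctranslate2.Encoder(
    model_dir,
    device="cpu",
    compute_type="int8",
    intra_threads=4,  # illustrative; match your core count
)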
