CrossEncoder based on bansalaman18/bert-uncased_L-4_H-512_A-8

This is a Cross Encoder model finetuned from bansalaman18/bert-uncased_L-4_H-512_A-8 on the msmarco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("bansalaman18/reranker-bert-uncased_L-4_H-512_A-8-msmarco-bce")
# Get scores for pairs of texts
pairs = [
    ['who is wale the rapper', "Wale (rapper)'s wiki: Olubowale Victor Akintimehin (born September 21, 1984), better known by his stage name Wale (/ˈwɔːleɪ/ WAW-lay), is an American rapper from Washington, D.C. He rose to prominence in 2006, when his song Dig Dug (Shake It) became popular in his hometown. Wale became locally recognized and continued recording music for the regional audience."],
    ['what is platinum used for', 'The periodic table is a chart that shows how elements are related to one another. Indium is a transition metal that is also part of the platinum family. The metals in the platinum family are also known as the noble metals. They have this name because they do not react well with other elements and compounds. They appear to be too superior to react with most other substances. In fact, iridium is the most corrosion-resistant metal known.'],
    ['where is the gonzaga university located', 'Where We Are. The Boise State University Department of Public Safety substation is located in Capitol Village at 2245 University Drive. Our office is open 24 hours a day, 7 days a week, so we are always available. Physical Address: 2245 University Drive, Boise, Idaho 83706.'],
    ['most common protein in the human body', 'Protein is the second category of food that a human body can get energy from. Most people know protein as animal meat-a hamburger, a chicken leg. These are all proteins. You can also extract protein from certain plants.Soy protein isolate is a well known protein that comes from soybeans.uman Body Four Energy Sources. The human body can only metabolize four types of energy sources. These four categories are carbohydrates (sugars and starches), fats (includes oils), proteins (animal and vegetable), and alcohol.'],
    ['where is azilda ontario', 'Azilda railway station is a Via Rail flag stop station located in Azilda, Ontario, in the city of Greater Sudbury community of Rayside-Balfour. It is on the Canadian Pacific Railway transcontinental main line, and is served by the regional rail Sudbury – White River train. Map 12 (PDF) (Map). 1 : 1,600,000.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'who is wale the rapper',
    [
        "Wale (rapper)'s wiki: Olubowale Victor Akintimehin (born September 21, 1984), better known by his stage name Wale (/ˈwɔːleɪ/ WAW-lay), is an American rapper from Washington, D.C. He rose to prominence in 2006, when his song Dig Dug (Shake It) became popular in his hometown. Wale became locally recognized and continued recording music for the regional audience.",
        'The periodic table is a chart that shows how elements are related to one another. Indium is a transition metal that is also part of the platinum family. The metals in the platinum family are also known as the noble metals. They have this name because they do not react well with other elements and compounds. They appear to be too superior to react with most other substances. In fact, iridium is the most corrosion-resistant metal known.',
        'Where We Are. The Boise State University Department of Public Safety substation is located in Capitol Village at 2245 University Drive. Our office is open 24 hours a day, 7 days a week, so we are always available. Physical Address: 2245 University Drive, Boise, Idaho 83706.',
        'Protein is the second category of food that a human body can get energy from. Most people know protein as animal meat-a hamburger, a chicken leg. These are all proteins. You can also extract protein from certain plants.Soy protein isolate is a well known protein that comes from soybeans.uman Body Four Energy Sources. The human body can only metabolize four types of energy sources. These four categories are carbohydrates (sugars and starches), fats (includes oils), proteins (animal and vegetable), and alcohol.',
        'Azilda railway station is a Via Rail flag stop station located in Azilda, Ontario, in the city of Greater Sudbury community of Rayside-Balfour. It is on the Canadian Pacific Railway transcontinental main line, and is served by the regional rail Sudbury – White River train. Map 12 (PDF) (Map). 1 : 1,600,000.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
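
The ranking output above is a list of dictionaries with a corpus_id and a score. As a minimal sketch of how such a result could be mapped back to the passages that were ranked, the snippet below treats corpus_id as the index into the list of passages passed to rank. The passages, the scores, and the helper name attach_passages are hypothetical placeholders for illustration; the scores were not produced by this model.

```python
# Hypothetical stand-in for the list of passages passed to model.rank(...).
passages = ["passage A", "passage B", "passage C"]

# Hypothetical stand-in for the value returned by model.rank(...).
# The scores are made-up placeholders, not real model outputs.
ranks = [
    {"corpus_id": 2, "score": 0.61},
    {"corpus_id": 0, "score": 0.58},
    {"corpus_id": 1, "score": 0.40},
]


def attach_passages(ranks, passages, top_k=None):
    """Return (passage, score) tuples ordered from highest to lowest score."""
    ordered = sorted(ranks, key=lambda r: r["score"], reverse=True)
    if top_k is not None:
        ordered = ordered[:top_k]
    # corpus_id is assumed to index into the passage list given to rank().
    return [(passages[r["corpus_id"]], r["score"]) for r in ordered]


top = attach_passages(ranks, passages, top_k=2)
for passage, score in top:
    print(f"{score:.2f}\t{passage}")
```

Sorting again inside the helper is redundant if the input is already ordered by score, but it keeps the helper safe to use on unsorted score lists as well.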

Evaluation

Metrics

Cross Encoder Reranking

  • Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
  • Evaluated with CrossEncoderRerankingEvaluator with these parameters:
    {
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric    NanoMSMARCO_R100   NanoNFCorpus_R100  NanoNQ_R100
map       0.0664 (-0.4232)   0.3041 (+0.0431)   0.1094 (-0.3102)
mrr@10    0.0383 (-0.4392)   0.4851 (-0.0148)   0.0819 (-0.3448)
ndcg@10   0.0484 (-0.4921)   0.3186 (-0.0064)   0.1066 (-0.3940)
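
For readers unfamiliar with the metrics reported here, the following is a minimal sketch of the textbook definitions of MRR@10 and nDCG@10 for a single query with binary relevance labels. It is an illustration only: the function names are hypothetical, and the evaluator in the library may differ in details such as how the ideal ranking is built or how results are averaged over queries.

```python
import math


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance, k=10):
    """DCG of the top k divided by the DCG of the ideal ordering."""
    def dcg(rels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


# One query whose only relevant passage was ranked third out of ten.
ranked_relevance = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
mrr = mrr_at_k(ranked_relevance)    # 1/3
ndcg = ndcg_at_k(ranked_relevance)  # (1/log2(4)) / (1/log2(2)) = 0.5
```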

Cross Encoder Nano BEIR

  • Dataset: NanoBEIR_R100_mean
  • Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ],
        "rerank_k": 100,
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric Value
map 0.1600 (-0.2301)
mrr@10 0.2018 (-0.2662)
ndcg@10 0.1579 (-0.2975)
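
The NanoBEIR_R100_mean values appear to be the unweighted mean of the three per-dataset values in the Cross Encoder Reranking table. The short check below reproduces them from the numbers reported in this card; the variable names are illustrative.

```python
# Per-dataset values copied from the Cross Encoder Reranking table in this card.
per_dataset = {
    "map": {"NanoMSMARCO_R100": 0.0664, "NanoNFCorpus_R100": 0.3041, "NanoNQ_R100": 0.1094},
    "mrr@10": {"NanoMSMARCO_R100": 0.0383, "NanoNFCorpus_R100": 0.4851, "NanoNQ_R100": 0.0819},
    "ndcg@10": {"NanoMSMARCO_R100": 0.0484, "NanoNFCorpus_R100": 0.3186, "NanoNQ_R100": 0.1066},
}

# Unweighted mean over the three datasets, rounded to four decimals.
means = {
    metric: round(sum(values.values()) / len(values), 4)
    for metric, values in per_dataset.items()
}
print(means)
```

The rounded means agree with the reported 0.1600, 0.2018 and 0.1579.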

Training Details

Training Dataset

msmarco

  • Dataset: msmarco at 9e329ed
  • Size: 90,000 training samples
  • Columns: query, passage, and score
  • Approximate statistics based on the first 1000 samples:
    Column   Type    Min            Mean               Max
    query    string  10 characters  34.26 characters   168 characters
    passage  string  60 characters  343.34 characters  984 characters
    score    float   0.0            0.53               1.0
  • Samples:
    1. query: who is the actor that plays the tanned colonel for kfc?
       passage: James Rebhorn Actor, Scent of a Woman James Robert Rebhorn (September 1, 1948 - March 21, 2014) was an American actor who appeared in over 100 films, television series, and plays. At the time of his death, he had recurring roles in the current series White Collar and Homeland.
       score: 0.0
    2. query: asking for an increase in credit limit harm your credit score
       passage: If you request a credit line increase, you should ask the lender whether it will result in your credit report being pulled. If it does, this will show up as an inquiry on your report and generally remains on your credit report for two years.Too many inquiries can lower your credit score.f you request a credit line increase, you should ask the lender whether it will result in your credit report being pulled. If it does, this will show up as an inquiry on your report and generally remains on your credit report for two years.
       score: 1.0
    3. query: what is a sheep ked
       passage: Cysteine is required by sheep to produce wool: It is an essential amino acid that must be taken in from their feed. As a consequence, during drought conditions, sheep produce less wool; however, transgenic sheep that can make their own cysteine have been developed.
       score: 0.0
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": null
    }
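
The Identity activation in this configuration suggests that the loss is computed on the raw logit rather than on a probability. As a rough illustration, and assuming the loss behaves like standard binary cross-entropy with logits (as in torch.nn.BCEWithLogitsLoss, with no pos_weight), the per-sample value can be sketched in plain Python. The function names are hypothetical and this is not the library's implementation.

```python
import math


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy for one raw logit and a 0/1 label."""
    # Equivalent to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def naive_bce(logit, label):
    """Direct form, used here only to cross-check the stable form."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


# An uninformative logit of 0 gives ln(2), about 0.693, for either label.
chance_loss = bce_with_logits(0.0, 1.0)

# A confident, correct prediction gives a loss close to 0.
confident_loss = bce_with_logits(6.0, 1.0)
```

The value ln(2) ≈ 0.693 is a useful reference point when reading loss columns for a binary objective, since it corresponds to predicting a probability of 0.5 for every pair.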
    

Evaluation Dataset

msmarco

  • Dataset: msmarco at 9e329ed
  • Size: 10,000 evaluation samples
  • Columns: query, passage, and score
  • Approximate statistics based on the first 1000 samples:
    Column   Type    Min            Mean               Max
    query    string  11 characters  34.79 characters   118 characters
    passage  string  86 characters  353.31 characters  970 characters
    score    float   0.0            0.52               1.0
  • Samples:
    1. query: who is wale the rapper
       passage: Wale (rapper)'s wiki: Olubowale Victor Akintimehin (born September 21, 1984), better known by his stage name Wale (/ˈwɔːleɪ/ WAW-lay), is an American rapper from Washington, D.C. He rose to prominence in 2006, when his song Dig Dug (Shake It) became popular in his hometown. Wale became locally recognized and continued recording music for the regional audience.
       score: 1.0
    2. query: what is platinum used for
       passage: The periodic table is a chart that shows how elements are related to one another. Indium is a transition metal that is also part of the platinum family. The metals in the platinum family are also known as the noble metals. They have this name because they do not react well with other elements and compounds. They appear to be too superior to react with most other substances. In fact, iridium is the most corrosion-resistant metal known.
       score: 0.0
    3. query: where is the gonzaga university located
       passage: Where We Are. The Boise State University Department of Public Safety substation is located in Capitol Village at 2245 University Drive. Our office is open 24 hours a day, 7 days a week, so we are always available. Physical Address: 2245 University Drive, Boise, Idaho 83706.
       score: 0.0
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": null
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • half_precision_backend: cpu_amp
  • dataloader_num_workers: 4
  • load_best_model_at_end: True
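
To relate these settings to the step counts in the training logs, the sketch below works out the number of optimizer steps and a linear warmup-then-decay learning-rate curve. It assumes a single device, the gradient_accumulation_steps value of 1 listed in this card, and that warmup steps are the ceiling of warmup_ratio times the total steps. The function linear_lr is a hypothetical helper that mirrors the usual linear schedule, not a call into the training library.

```python
import math

# Values taken from this card.
train_samples = 90_000
batch_size = 16        # per_device_train_batch_size
epochs = 1             # num_train_epochs
warmup_ratio = 0.1
base_lr = 2e-5         # learning_rate

# Assumption: one device and no gradient accumulation.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
# Assumption: warmup steps are rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / (total_steps - warmup_steps))


# Fraction of the epoch completed at step 1000.
epoch_at_1000 = round(1000 / steps_per_epoch, 4)

print(steps_per_epoch, warmup_steps, epoch_at_1000)
```

Under these assumptions one epoch is 5625 steps, which is consistent with the Epoch column of the training logs (for example, step 1000 at epoch 0.1778).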

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: cpu_amp
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step   Training Loss  Validation Loss  NanoMSMARCO_R100_ndcg@10  NanoNFCorpus_R100_ndcg@10  NanoNQ_R100_ndcg@10  NanoBEIR_R100_mean_ndcg@10
-1      -1     -              -                0.0441 (-0.4963)          0.3050 (-0.0201)           0.0582 (-0.4424)     0.1357 (-0.3196)
0.0002  1      0.7036         -                -                         -                          -                    -
0.1778  1000   0.6957         0.6984           0.0411 (-0.4994)          0.2691 (-0.0560)           0.0130 (-0.4876)     0.1077 (-0.3476)
0.3556  2000   0.6901         0.7008           0.0452 (-0.4952)          0.3050 (-0.0200)           0.0937 (-0.4069)     0.1480 (-0.3074)
0.5333  3000   0.678          0.6776           0.0488 (-0.4916)          0.3064 (-0.0186)           0.1108 (-0.3899)     0.1553 (-0.3000)
0.7111  4000   0.6724         0.6617           0.0397 (-0.5007)          0.3169 (-0.0081)           0.1040 (-0.3966)     0.1536 (-0.3018)
0.8889  5000*  0.6706         0.6583           0.0484 (-0.4921)          0.3186 (-0.0064)           0.1066 (-0.3940)     0.1579 (-0.2975)
-1      -1     -              -                0.0484 (-0.4921)          0.3186 (-0.0064)           0.1066 (-0.3940)     0.1579 (-0.2975)
  • The row marked with * denotes the saved checkpoint (shown in bold in the original card; identified here as step 5000 because its metrics match the final row).

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 5.0.0
  • Transformers: 4.51.0
  • PyTorch: 2.6.0
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.4-dev.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 28.8M params (Safetensors, tensor type F32)