CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased using the sentence-transformers library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-MiniLM-L12-msmarco-scratch-pos_weight-4")
# Get scores for pairs of texts
pairs = [
    ['enrollment statistics at southern arkansas university', 'The University of Southern Malawi also known as the Malawi University of Science and Technology(MUST) [edit]. The Malawi University of Science and Technology was established on 17th December 2012 by the Malawi University of Science and Technology Act No. 31 of 2012 as the fourth Public University in Malawi.'],
    ['burgos is in what province spain', 'The province of Burgos is a province of northern Spain, in the northeastern part of the autonomous community of Castile and Leon. León it is bordered by the provinces Of, Palencia, Cantabria, Álava, Alava álava, La, Rioja, soria Segovia. And valladolid its capital is the City. of burgoshe province of Burgos is divided into 371 municipalities, being the Spanish province with the highest number, although many of them have fewer than 100 inhabitants.'],
    ['most important customer service skills', 'Customer Service Skill #1: Empathy. Empathy gets thrown around a lot in support training, and for good reason: it might be the single most important customer service skill to develop. To help your customers be happy and successful, it’s important to understand what happiness and success mean to them.'],
    ['what happens if we eat too many carbohydrates', 'What Happens If You Eat Too Many Carbs? We all know the feeling you get after eating a large bowl of pasta. Your stomach swells up and you feel like you just gained 10 pounds. Surprisingly carbohydrates are a very important fuel source for your body. Without them it would be hard to have any energy throughout the day. Even though there are risks to consuming no carbs at all, there are also risks to consuming too much! See the article below where we talk about what could happen if you eat too many carbs. You Will Gain Body Fat Sorry to say this but “yes” if you consume too many carbs than you will gain body fat. This isn’t all that bad though when it comes to building muscle that is. You need to be eating lots of calories throughout the day in order to spark muscle growth. Carbohydrates just happen to have a lot of calories in them.'],
    ['what county is wharton nj in', 'Sponsored Topics. Wharton is a Borough in Morris County, New Jersey, United States. As of the 2000 United States Census, the borough population was 6,298.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'enrollment statistics at southern arkansas university',
    [
        'The University of Southern Malawi also known as the Malawi University of Science and Technology(MUST) [edit]. The Malawi University of Science and Technology was established on 17th December 2012 by the Malawi University of Science and Technology Act No. 31 of 2012 as the fourth Public University in Malawi.',
        'The province of Burgos is a province of northern Spain, in the northeastern part of the autonomous community of Castile and Leon. León it is bordered by the provinces Of, Palencia, Cantabria, Álava, Alava álava, La, Rioja, soria Segovia. And valladolid its capital is the City. of burgoshe province of Burgos is divided into 371 municipalities, being the Spanish province with the highest number, although many of them have fewer than 100 inhabitants.',
        'Customer Service Skill #1: Empathy. Empathy gets thrown around a lot in support training, and for good reason: it might be the single most important customer service skill to develop. To help your customers be happy and successful, it’s important to understand what happiness and success mean to them.',
        'What Happens If You Eat Too Many Carbs? We all know the feeling you get after eating a large bowl of pasta. Your stomach swells up and you feel like you just gained 10 pounds. Surprisingly carbohydrates are a very important fuel source for your body. Without them it would be hard to have any energy throughout the day. Even though there are risks to consuming no carbs at all, there are also risks to consuming too much! See the article below where we talk about what could happen if you eat too many carbs. You Will Gain Body Fat Sorry to say this but “yes” if you consume too many carbs than you will gain body fat. This isn’t all that bad though when it comes to building muscle that is. You need to be eating lots of calories throughout the day in order to spark muscle growth. Carbohydrates just happen to have a lot of calories in them.',
        'Sponsored Topics. Wharton is a Borough in Morris County, New Jersey, United States. As of the 2000 United States Census, the borough population was 6,298.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
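
The list-of-dicts shape shown in the last comment can also be built by hand from raw scores. The helper below is a minimal sketch of that step, not part of the library: `rank_from_scores` is a hypothetical name, and the scores are made-up stand-ins for the output of `model.predict(pairs)`.

```python
def rank_from_scores(scores, documents, top_k=None):
    """Sort documents by descending score (hypothetical helper, not a library API)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    # Mirror the 'corpus_id' / 'score' keys shown above and attach the text.
    return [
        {"corpus_id": i, "score": float(scores[i]), "text": documents[i]}
        for i in order
    ]


# Made-up scores standing in for model.predict(pairs); not real model output.
example_scores = [0.02, 0.91, 0.40]
example_docs = ["doc a", "doc b", "doc c"]

ranked = rank_from_scores(example_scores, example_docs, top_k=2)
print([r["corpus_id"] for r in ranked])  # [1, 2]
```

Keeping the original index as `corpus_id` makes it easy to map a reranked hit back to the candidate list that came out of a first-stage retriever.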

Evaluation

Metrics

Cross Encoder Reranking

Metric    train-eval   NanoMSMARCO        NanoNFCorpus       NanoNQ
map       0.6582       0.6058 (+0.1162)   0.3384 (+0.0680)   0.6984 (+0.2778)
mrr@10    0.6556       0.5982 (+0.1207)   0.5367 (+0.0368)   0.7111 (+0.2844)
ndcg@10   0.7121       0.6699 (+0.1294)   0.3760 (+0.0510)   0.7469 (+0.2462)
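
For readers unfamiliar with the three metric names, the sketch below computes them for a single query from a ranked list of binary relevance labels. It uses the textbook definitions and assumes every relevant document appears in the ranked list; the evaluator that produced the numbers above may differ in details, and the function names are illustrative only.

```python
import math


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item within the top k."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def dcg_at_k(relevance, k=10):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance, k=10):
    """DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


def average_precision(relevance):
    """Mean of precision values taken at each relevant position."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# One query whose relevant documents were ranked 2nd and 4th.
relevance = [0, 1, 0, 1]
print(mrr_at_k(relevance), average_precision(relevance), round(ndcg_at_k(relevance), 4))
```

MAP, MRR@10 and nDCG@10 are then the averages of these per-query values over all queries in a dataset.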

Cross Encoder Nano BEIR

Metric Value
map 0.5476 (+0.1540)
mrr@10 0.6153 (+0.1473)
ndcg@10 0.5976 (+0.1422)
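
The NanoBEIR values appear to be the unweighted mean of the three per-dataset columns in the Cross Encoder Reranking table. The short check below, using only numbers copied from this card, reproduces them to within rounding of the four-decimal inputs.

```python
# Per-dataset values copied from the Cross Encoder Reranking table.
per_dataset = {
    "NanoMSMARCO": {"map": 0.6058, "mrr@10": 0.5982, "ndcg@10": 0.6699},
    "NanoNFCorpus": {"map": 0.3384, "mrr@10": 0.5367, "ndcg@10": 0.3760},
    "NanoNQ": {"map": 0.6984, "mrr@10": 0.7111, "ndcg@10": 0.7469},
}
# Values reported under Cross Encoder Nano BEIR.
reported = {"map": 0.5476, "mrr@10": 0.6153, "ndcg@10": 0.5976}

means = {
    metric: sum(row[metric] for row in per_dataset.values()) / len(per_dataset)
    for metric in reported
}

for metric, value in means.items():
    print(f"{metric}: mean={value:.4f} reported={reported[metric]:.4f}")
```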

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,000,000 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 10 characters, mean 33.9 characters, max 104 characters
    • sentence_1 (string): min 64 characters, mean 343.08 characters, max 991 characters
    • label (int): 0: ~81.00%, 1: ~19.00%
  • Samples:
    • sentence_0: enrollment statistics at southern arkansas university
      sentence_1: The University of Southern Malawi also known as the Malawi University of Science and Technology(MUST) [edit]. The Malawi University of Science and Technology was established on 17th December 2012 by the Malawi University of Science and Technology Act No. 31 of 2012 as the fourth Public University in Malawi.
      label: 0
    • sentence_0: burgos is in what province spain
      sentence_1: The province of Burgos is a province of northern Spain, in the northeastern part of the autonomous community of Castile and Leon. León it is bordered by the provinces Of, Palencia, Cantabria, Álava, Alava álava, La, Rioja, soria Segovia. And valladolid its capital is the City. of burgoshe province of Burgos is divided into 371 municipalities, being the Spanish province with the highest number, although many of them have fewer than 100 inhabitants.
      label: 1
    • sentence_0: most important customer service skills
      sentence_1: Customer Service Skill #1: Empathy. Empathy gets thrown around a lot in support training, and for good reason: it might be the single most important customer service skill to develop. To help your customers be happy and successful, it’s important to understand what happiness and success mean to them.
      label: 1
  • Loss: FitMixinLoss
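
The card names only FitMixinLoss, so the actual training objective is not stated here. The `pos_weight-4` suffix in the model ID at the end of this card, together with the roughly 81/19 label split, hints at a binary cross-entropy loss whose positive class is up-weighted by 4. The following is a sketch of that assumed objective in plain Python, not a confirmed description of how this model was trained.

```python
import math


def weighted_bce_with_logits(logit, label, pos_weight=4.0):
    """Binary cross-entropy on a raw logit, with positives scaled by pos_weight.

    pos_weight=4.0 is an assumption inferred from the model name.
    """
    # Numerically stable log(sigmoid(logit)).
    if logit >= 0:
        log_sig = -math.log1p(math.exp(-logit))
    else:
        log_sig = logit - math.log1p(math.exp(logit))
    # log(1 - sigmoid(x)) equals log(sigmoid(x)) - x.
    log_one_minus_sig = log_sig - logit
    return -(pos_weight * label * log_sig + (1 - label) * log_one_minus_sig)


# At logit 0 the model is undecided; a missed positive costs 4x a missed negative.
loss_pos = weighted_bce_with_logits(0.0, 1)
loss_neg = weighted_bce_with_logits(0.0, 0)
print(round(loss_pos / loss_neg, 6))
```

With about four negatives per positive in the sample statistics, a weight near 4 would roughly balance the total contribution of the two classes.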

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 1
  • fp16: True
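
These settings fix the length of the run. As a quick arithmetic check, assuming a single GPU and gradient_accumulation_steps of 1 (both as listed elsewhere in this card), 2,000,000 samples at batch size 64 give the 31,250 steps at which the training log ends.

```python
import math

num_samples = 2_000_000   # training set size from this card
batch_size = 64           # per_device_train_batch_size
num_devices = 1           # 1 x RTX 3090
grad_accum = 1            # gradient_accumulation_steps
num_epochs = 1            # num_train_epochs

steps_per_epoch = math.ceil(num_samples / (batch_size * num_devices * grad_accum))
total_steps = steps_per_epoch * num_epochs

print(total_steps)                       # 31250
print(round(500 / steps_per_epoch, 3))   # 0.016, the epoch value logged at step 500
```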

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
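
With lr_scheduler_type set to linear, warmup_steps 0 and learning_rate 5e-05, the learning rate should fall in a straight line from 5e-05 to 0 over the run. The function below is a sketch of that schedule; the 31,250 total steps is taken from the last row of the training log, and `linear_lr` is an illustrative name rather than a library function.

```python
def linear_lr(step, base_lr=5e-5, total_steps=31_250, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 15_625, 31_250):
    print(step, linear_lr(step))
```

Halfway through the epoch the rate is half the initial value, and it reaches zero at the final step.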

Training Logs

Epoch Step Training Loss train-eval_ndcg@10 NanoMSMARCO_ndcg@10 NanoNFCorpus_ndcg@10 NanoNQ_ndcg@10 NanoBEIR_mean_ndcg@10
-1 -1 - 0.0488 0.0971 (-0.4433) 0.2449 (-0.0802) 0.0508 (-0.4498) 0.1310 (-0.3244)
0.016 500 1.1004 - - - - -
0.032 1000 0.7746 - - - - -
0.048 1500 0.543 - - - - -
0.064 2000 0.4508 - - - - -
0.08 2500 0.4112 - - - - -
0.096 3000 0.3949 - - - - -
0.112 3500 0.3793 - - - - -
0.128 4000 0.3584 - - - - -
0.144 4500 0.3725 - - - - -
0.16 5000 0.358 0.6634 0.6343 (+0.0939) 0.3986 (+0.0735) 0.7085 (+0.2078) 0.5805 (+0.1251)
0.176 5500 0.3442 - - - - -
0.192 6000 0.3355 - - - - -
0.208 6500 0.3423 - - - - -
0.224 7000 0.3253 - - - - -
0.24 7500 0.3256 - - - - -
0.256 8000 0.3231 - - - - -
0.272 8500 0.3218 - - - - -
0.288 9000 0.3119 - - - - -
0.304 9500 0.3056 - - - - -
0.32 10000 0.3125 0.6861 0.6423 (+0.1019) 0.4197 (+0.0947) 0.7333 (+0.2327) 0.5985 (+0.1431)
0.336 10500 0.3 - - - - -
0.352 11000 0.305 - - - - -
0.368 11500 0.3088 - - - - -
0.384 12000 0.2963 - - - - -
0.4 12500 0.3068 - - - - -
0.416 13000 0.299 - - - - -
0.432 13500 0.2962 - - - - -
0.448 14000 0.2942 - - - - -
0.464 14500 0.2969 - - - - -
0.48 15000 0.2956 0.6964 0.6397 (+0.0993) 0.3773 (+0.0523) 0.7140 (+0.2134) 0.5770 (+0.1216)
0.496 15500 0.2928 - - - - -
0.512 16000 0.2829 - - - - -
0.528 16500 0.2794 - - - - -
0.544 17000 0.2818 - - - - -
0.56 17500 0.2843 - - - - -
0.576 18000 0.2858 - - - - -
0.592 18500 0.2801 - - - - -
0.608 19000 0.2902 - - - - -
0.624 19500 0.2768 - - - - -
0.64 20000 0.2768 0.6963 0.6456 (+0.1052) 0.3820 (+0.0570) 0.7230 (+0.2224) 0.5835 (+0.1282)
0.656 20500 0.2744 - - - - -
0.672 21000 0.2753 - - - - -
0.688 21500 0.2632 - - - - -
0.704 22000 0.2818 - - - - -
0.72 22500 0.2668 - - - - -
0.736 23000 0.2673 - - - - -
0.752 23500 0.2663 - - - - -
0.768 24000 0.2612 - - - - -
0.784 24500 0.2655 - - - - -
0.8 25000 0.2592 0.7070 0.6614 (+0.1210) 0.3803 (+0.0552) 0.7482 (+0.2476) 0.5966 (+0.1412)
0.816 25500 0.2661 - - - - -
0.832 26000 0.2568 - - - - -
0.848 26500 0.2651 - - - - -
0.864 27000 0.2577 - - - - -
0.88 27500 0.2579 - - - - -
0.896 28000 0.2552 - - - - -
0.912 28500 0.2531 - - - - -
0.928 29000 0.255 - - - - -
0.944 29500 0.2565 - - - - -
0.96 30000 0.2534 0.7150 0.6647 (+0.1243) 0.3745 (+0.0495) 0.7479 (+0.2472) 0.5957 (+0.1403)
0.976 30500 0.2508 - - - - -
0.992 31000 0.2459 - - - - -
1.0 31250 - 0.7121 0.6699 (+0.1294) 0.3760 (+0.0510) 0.7469 (+0.2462) 0.5976 (+0.1422)
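
Since load_best_model_at_end is False, the metrics reported under Evaluation correspond to the final row of this log rather than the best one. The snippet below, using values copied from the evaluation rows above, shows which steps scored highest on each summary metric.

```python
# (step, train-eval ndcg@10, NanoBEIR mean ndcg@10), copied from the log above.
eval_log = [
    (5000, 0.6634, 0.5805),
    (10000, 0.6861, 0.5985),
    (15000, 0.6964, 0.5770),
    (20000, 0.6963, 0.5835),
    (25000, 0.7070, 0.5966),
    (30000, 0.7150, 0.5957),
    (31250, 0.7121, 0.5976),
]

best_train_eval = max(eval_log, key=lambda row: row[1])
best_nanobeir = max(eval_log, key=lambda row: row[2])
final = eval_log[-1]

print("best train-eval step:", best_train_eval[0])
print("best NanoBEIR step:", best_nanobeir[0])
print("final NanoBEIR gap to best:", round(best_nanobeir[2] - final[2], 4))
```

The differences between the later checkpoints are small, so the final checkpoint gives up little on either metric.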

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.457 kWh
  • Carbon Emitted: 0.178 kg of CO2
  • Hours Used: 1.209 hours
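
Two derived figures follow directly from these three numbers: the implied carbon intensity of the electricity and the average power draw over the run. This is simple arithmetic on the reported values, not additional output from CodeCarbon.

```python
energy_kwh = 0.457   # Energy Consumed
carbon_kg = 0.178    # Carbon Emitted
hours = 1.209        # Hours Used

carbon_intensity = carbon_kg / energy_kwh   # kg CO2 per kWh
average_power_kw = energy_kwh / hours       # kW

print(f"{carbon_intensity:.3f} kg CO2/kWh")
print(f"{average_power_kw * 1000:.0f} W average")
```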

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.5.0.dev0
  • Transformers: 4.48.3
  • PyTorch: 2.5.0+cu121
  • Accelerate: 1.3.0
  • Datasets: 2.20.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 33.4M params (Safetensors, tensor type F32)

Model tree for tomaarsen/reranker-MiniLM-L12-msmarco-scratch-pos_weight-4
