---
language:
- en
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:942069
- loss:CoSENTLoss
base_model: microsoft/mpnet-base
widget:
- source_sentence: Three women in dress suits walk by a building.
  sentences:
  - Three women are traveling by foot.
  - Two kids are flying on a hovercraft.
  - A man jumps in the ocean.
- source_sentence: >-
    A man wearing sunglasses is sitting on the steps outside, reading a
    magazine.
  sentences:
  - Men are walking in different directions.
  - There is a man running.
  - The man is reading a spoon with the words "HELP ME" on it.
- source_sentence: >-
    A middle-aged man is sitting indian style outside holding a folded paper
    in his hands.
  sentences:
  - A man and woman are looking at produce.
  - A middle aged man is showing off his origami creation.
  - The boy is sitting
- source_sentence: >-
    Two men playing baseball with the one in the black and red jersey running
    toward base.
  sentences:
  - The person is cooking a hamburger.
  - The man in black and red is sitting in the bleachers.
  - A player fighting in a soccer game.
- source_sentence: Two men are in an electronics workshop, working on computers or equipment.
  sentences:
  - The men are experts when it comes to electronics.
  - A tall person sitting
  - A man is chasing an SUV that is going in the same direction as him.
datasets:
- sentence-transformers/all-nli
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer based on microsoft/mpnet-base
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: basemodel evaluator
      type: basemodel_evaluator
    metrics:
    - type: pearson_cosine
      value: 0.573397130206215
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.5954429501396288
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: finetunedmodel evaluator
      type: finetunedmodel_evaluator
    metrics:
    - type: pearson_cosine
      value: 0.5716395027559762
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6003777834660847
      name: Spearman Cosine
---
SentenceTransformer based on microsoft/mpnet-base
This is a sentence-transformers model finetuned from microsoft/mpnet-base on the all-nli dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: microsoft/mpnet-base
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: all-nli
- Language: en
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
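For reference, the same two-module stack can be assembled by hand with the models API. This is a minimal sketch of an equivalent architecture, not the script that produced this checkpoint; pooling_mode="mean" mirrors the pooling_mode_mean_tokens: True setting shown above.

```python
from sentence_transformers import SentenceTransformer, models

# MPNet backbone with the 512-token sequence limit listed above
word_embedding_model = models.Transformer("microsoft/mpnet-base", max_seq_length=512)

# Mean pooling over token embeddings (768-dimensional output)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="mean",
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```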
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("surajvbangera/mediclaim_embedding")
# Run inference
sentences = [
'Two men are in an electronics workshop, working on computers or equipment.',
'The men are experts when it comes to electronics.',
'A tall person sitting',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
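The matrix above scores every sentence against every other. To rank candidates against a single query instead, encode the query and candidates separately; this is a small sketch with made-up example sentences:

```python
# Encode one query and two candidates (illustrative sentences, not from the dataset)
query_embedding = model.encode(["A man is repairing a laptop."])
candidate_embeddings = model.encode([
    "Someone is fixing electronics.",
    "A dog runs across the field.",
])

# Cosine scores with shape [1, 2]; higher means more similar
scores = model.similarity(query_embedding, candidate_embeddings)
print(scores)
```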
Evaluation
Metrics
Semantic Similarity
- Datasets: basemodel_evaluator and finetunedmodel_evaluator
- Evaluated with EmbeddingSimilarityEvaluator
| Metric          | basemodel_evaluator | finetunedmodel_evaluator |
|-----------------|---------------------|--------------------------|
| pearson_cosine  | 0.5734              | 0.5716                   |
| spearman_cosine | 0.5954              | 0.6004                   |
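Both columns were produced with EmbeddingSimilarityEvaluator. As a hedged sketch of how such an evaluator is run, the snippet below reuses the three evaluation samples listed later in this card in place of the full evaluation split:

```python
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Tiny stand-in for the real evaluation split (three pairs from the samples below)
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["Two women are embracing while holding to go packages."] * 3,
    sentences2=[
        "The sisters are hugging goodbye while holding to go packages after just eating lunch.",
        "Two woman are holding packages.",
        "The men are fighting outside a deli.",
    ],
    scores=[0.5, 1.0, 0.0],
    name="finetunedmodel_evaluator",
)

# Returns a dict of metrics such as pearson_cosine and spearman_cosine
results = evaluator(model)
print(results)
```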
Training Details
Training Dataset
all-nli
- Dataset: all-nli at d482672
- Size: 942,069 training samples
- Columns: sentence1, sentence2, and score
- Approximate statistics based on the first 1000 samples:
|         | sentence1                                         | sentence2                                        | score                         |
|---------|---------------------------------------------------|--------------------------------------------------|-------------------------------|
| type    | string                                            | string                                           | float                         |
| details | min: 6 tokens, mean: 17.38 tokens, max: 52 tokens | min: 4 tokens, mean: 10.7 tokens, max: 31 tokens | min: 0.0, mean: 0.5, max: 1.0 |
- Samples:
| sentence1 | sentence2 | score |
|---|---|---|
| A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition. | 0.5 |
| A person on a horse jumps over a broken down airplane. | A person is at a diner, ordering an omelette. | 0.0 |
| A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | 1.0 |
- Loss: CoSENTLoss with these parameters: {"scale": 20.0, "similarity_fct": "pairwise_cos_sim"}
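A minimal sketch of pairing this dataset with CoSENTLoss, reusing the model loaded in the Usage section. The pair-score config name is an assumption about which all-nli subset exposes the sentence1/sentence2/score columns:

```python
from datasets import load_dataset
from sentence_transformers.losses import CoSENTLoss

# Assumed config: the all-nli subset with sentence1/sentence2/score columns
train_dataset = load_dataset("sentence-transformers/all-nli", "pair-score", split="train")

# scale=20.0 with the default pairwise cosine similarity, matching the parameters above
loss = CoSENTLoss(model, scale=20.0)
```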
Evaluation Dataset
all-nli
- Dataset: all-nli at d482672
- Size: 19,657 evaluation samples
- Columns: sentence1, sentence2, and score
- Approximate statistics based on the first 1000 samples:
|         | sentence1                                         | sentence2                                         | score                         |
|---------|---------------------------------------------------|---------------------------------------------------|-------------------------------|
| type    | string                                            | string                                            | float                         |
| details | min: 6 tokens, mean: 17.56 tokens, max: 45 tokens | min: 5 tokens, mean: 10.51 tokens, max: 25 tokens | min: 0.0, mean: 0.5, max: 1.0 |
- Samples:
| sentence1 | sentence2 | score |
|---|---|---|
| Two women are embracing while holding to go packages. | The sisters are hugging goodbye while holding to go packages after just eating lunch. | 0.5 |
| Two women are embracing while holding to go packages. | Two woman are holding packages. | 1.0 |
| Two women are embracing while holding to go packages. | The men are fighting outside a deli. | 0.0 |
- Loss: CoSENTLoss with these parameters: {"scale": 20.0, "similarity_fct": "pairwise_cos_sim"}
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 1
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
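These settings map onto SentenceTransformerTrainingArguments roughly as in the sketch below; output_dir is a placeholder, not the path used for this run:

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="models/mpnet-base-all-nli",  # placeholder path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate pairs within a batch
)
```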
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
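Putting the earlier sketches together, a training run with this configuration would look roughly like the following; the dev split name is an assumption:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformerTrainer

# Assumed split name for the evaluation data described above
eval_dataset = load_dataset("sentence-transformers/all-nli", "pair-score", split="dev")

trainer = SentenceTransformerTrainer(
    model=model,                  # from the Usage section
    args=args,                    # from the hyperparameter sketch above
    train_dataset=train_dataset,  # from the CoSENTLoss sketch above
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```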
Training Logs
| Epoch | Step | Training Loss | Validation Loss | basemodel_evaluator_spearman_cosine | finetunedmodel_evaluator_spearman_cosine |
|-------|------|---------------|-----------------|-------------------------------------|------------------------------------------|
| -1    | -1   | -             | -               | 0.0810                              | -                                        |
| 0.8   | 100  | 4.5047        | 3.9356          | 0.5954                              | -                                        |
| -1    | -1   | -             | -               | -                                   | 0.6004                                   |
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CoSENTLoss
@online{kexuefm-8847,
title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
author={Su Jianlin},
year={2022},
month={Jan},
url={https://kexue.fm/archives/8847},
}