---
language:
  - en
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:942069
  - loss:CoSENTLoss
base_model: microsoft/mpnet-base
widget:
  - source_sentence: Three women in dress suits walk by a building.
    sentences:
      - Three women are traveling by foot.
      - Two kids are flying on a hovercraft.
      - A man jumps in the ocean.
  - source_sentence: >-
      A man wearing sunglasses is sitting on the steps outside, reading a
      magazine.
    sentences:
      - Men are walking in different directions.
      - There is a man running.
      - The man is reading a spoon with the words "HELP ME" on it.
  - source_sentence: >-
      A middle-aged man is sitting indian style outside holding a folded paper
      in his hands.
    sentences:
      - A man and woman are looking at produce.
      - A middle aged man is showing off his origami creation.
      - The boy is sitting
  - source_sentence: >-
      Two men playing baseball with the one in the black and red jersey running
      toward base.
    sentences:
      - The person is cooking a hamburger.
      - The man in black and red is sitting in the bleachers.
      - A player fighting in a soccer game.
  - source_sentence: Two men are in an electronics workshop, working on computers or equipment.
    sentences:
      - The men are experts when it comes to electronics.
      - A tall person sitting
      - A man is chasing an SUV that is going in the same direction as him.
datasets:
  - sentence-transformers/all-nli
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
model-index:
  - name: SentenceTransformer based on microsoft/mpnet-base
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: basemodel evaluator
          type: basemodel_evaluator
        metrics:
          - type: pearson_cosine
            value: 0.573397130206215
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.5954429501396288
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: finetunedmodel evaluator
          type: finetunedmodel_evaluator
        metrics:
          - type: pearson_cosine
            value: 0.5716395027559762
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.6003777834660847
            name: Spearman Cosine
---

# SentenceTransformer based on microsoft/mpnet-base

This is a sentence-transformers model finetuned from microsoft/mpnet-base on the all-nli dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:** [all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
- **Language:** en

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
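
The two modules above (an MPNet encoder followed by mean pooling) can also be assembled by hand. This is a minimal sketch for illustration, assuming only the base checkpoint; the uploaded model already bundles these settings:

```python
from sentence_transformers import SentenceTransformer, models

# Module (0): the MPNet encoder, truncating inputs at 512 tokens
word_embedding_model = models.Transformer("microsoft/mpnet-base", max_seq_length=512)

# Module (1): mean pooling over token embeddings (768-dim output)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="mean",
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```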

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("surajvbangera/mediclaim_embedding")
# Run inference
sentences = [
    'Two men are in an electronics workshop, working on computers or equipment.',
    'The men are experts when it comes to electronics.',
    'A tall person sitting',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
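
Beyond pairwise similarity, the same embeddings support the semantic search use case mentioned above. A small sketch with `sentence_transformers.util`; the corpus and query strings here are made up for illustration:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("surajvbangera/mediclaim_embedding")

# Illustrative corpus and query; substitute your own texts
corpus = [
    "The men are experts when it comes to electronics.",
    "A man is chasing an SUV that is going in the same direction as him.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("Two men are working on computers.", convert_to_tensor=True)

# Returns, per query, a ranked list of {'corpus_id': ..., 'score': ...}
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
print(hits[0])
```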

## Evaluation

### Metrics

#### Semantic Similarity

| Metric          | basemodel_evaluator | finetunedmodel_evaluator |
|:----------------|:--------------------|:-------------------------|
| pearson_cosine  | 0.5734              | 0.5716                   |
| spearman_cosine | 0.5954              | 0.6004                   |
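
The metric names match sentence-transformers' `EmbeddingSimilarityEvaluator`. A hedged sketch of how to recompute the finetuned model's numbers, assuming the evaluation data is the all-nli `pair-score` dev split described under Evaluation Dataset below:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("surajvbangera/mediclaim_embedding")
eval_ds = load_dataset("sentence-transformers/all-nli", "pair-score", split="dev")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=eval_ds["sentence1"],
    sentences2=eval_ds["sentence2"],
    scores=eval_ds["score"],
    name="finetunedmodel_evaluator",
)
# Returns a dict of metrics, including pearson_cosine and spearman_cosine
print(evaluator(model))
```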

## Training Details

### Training Dataset

#### all-nli

- Dataset: [all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) at revision d482672
- Size: 942,069 training samples
- Columns: `sentence1`, `sentence2`, and `score`
- Approximate statistics based on the first 1000 samples:

  |         | sentence1                                             | sentence2                                            | score                             |
  |:--------|:------------------------------------------------------|:-----------------------------------------------------|:----------------------------------|
  | type    | string                                                 | string                                                | float                             |
  | details | min: 6 tokens<br>mean: 17.38 tokens<br>max: 52 tokens  | min: 4 tokens<br>mean: 10.7 tokens<br>max: 31 tokens  | min: 0.0<br>mean: 0.5<br>max: 1.0 |

- Samples:

  | sentence1                                              | sentence2                                         | score |
  |:-------------------------------------------------------|:---------------------------------------------------|:------|
  | A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition. | 0.5   |
  | A person on a horse jumps over a broken down airplane. | A person is at a diner, ordering an omelette.     | 0.0   |
  | A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse.                 | 1.0   |

- Loss: `CoSENTLoss` with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "pairwise_cos_sim"
  }
  ```
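
For context on these parameters: CoSENT optimizes a pairwise ranking objective over the batch. Restated from Su's post cited under Citation below (a reference formulation, not a transcription of this repository's code), with $\lambda$ = `scale` $= 20$ and each pair $i$ embedded as $(u_i, v_i)$:

$$\mathcal{L} = \log\Bigl(1 + \sum_{s_i > s_j} e^{\lambda\,(\cos(u_j, v_j) - \cos(u_i, v_i))}\Bigr)$$

where the sum runs over all pairs whose gold score $s_i$ exceeds $s_j$, pushing higher-scored pairs toward higher cosine similarity.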
    

### Evaluation Dataset

#### all-nli

- Dataset: [all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) at revision d482672
- Size: 19,657 evaluation samples
- Columns: `sentence1`, `sentence2`, and `score`
- Approximate statistics based on the first 1000 samples:

  |         | sentence1                                             | sentence2                                             | score                             |
  |:--------|:------------------------------------------------------|:-------------------------------------------------------|:----------------------------------|
  | type    | string                                                 | string                                                  | float                             |
  | details | min: 6 tokens<br>mean: 17.56 tokens<br>max: 45 tokens  | min: 5 tokens<br>mean: 10.51 tokens<br>max: 25 tokens   | min: 0.0<br>mean: 0.5<br>max: 1.0 |

- Samples:

  | sentence1                                             | sentence2                                                                              | score |
  |:-------------------------------------------------------|:-----------------------------------------------------------------------------------------|:------|
  | Two women are embracing while holding to go packages. | The sisters are hugging goodbye while holding to go packages after just eating lunch.   | 0.5   |
  | Two women are embracing while holding to go packages. | Two woman are holding packages.                                                          | 1.0   |
  | Two women are embracing while holding to go packages. | The men are fighting outside a deli.                                                     | 0.0   |

- Loss: `CoSENTLoss` with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "pairwise_cos_sim"
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates
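
Taken together, these settings correspond roughly to the training script below. This is a hedged sketch (the output directory and split names are assumptions), not the exact script used:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("microsoft/mpnet-base")
dataset = load_dataset("sentence-transformers/all-nli", "pair-score")
loss = CoSENTLoss(model, scale=20.0)  # pairwise_cos_sim is the default similarity_fct

args = SentenceTransformerTrainingArguments(
    output_dir="mediclaim_embedding",  # assumed output path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["dev"],
    loss=loss,
)
trainer.train()
```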

#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs

| Epoch | Step | Training Loss | Validation Loss | basemodel_evaluator_spearman_cosine | finetunedmodel_evaluator_spearman_cosine |
|:-----:|:----:|:-------------:|:---------------:|:-----------------------------------:|:----------------------------------------:|
| -1    | -1   | -             | -               | 0.0810                              | -                                        |
| 0.8   | 100  | 4.5047        | 3.9356          | 0.5954                              | -                                        |
| -1    | -1   | -             | -               | -                                   | 0.6004                                   |

### Framework Versions

- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CoSENTLoss

```bibtex
@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
```