---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:5822
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: nomic-ai/nomic-embed-text-v2-moe
widget:
- source_sentence: "the Polaris Solicitations as currently drafted do not comply with\
    \ Section 3306(c)(3).  In its request \nto apply Section 3306(c)(3) to the Polaris\
    \ Solicitations, GSA stated that \n \n \n  \nSupplement 2, AR at 2907–08.  Because\
    \ GSA adopted an overly broad understanding of Section \n3306(c)(3)’s scope, GSA\
    \ stated the Solicitations will include a “full range of order types,”"
  sentences:
  - What did Al-Hamim confirm about the citations?
  - What understanding did GSA adopt regarding Section 3306(c)(3)'s scope?
  - What was the reason for denying the agency's motion without prejudice?
- source_sentence: "objective (as position, profit, or a prize); [or to] be in a state\
    \ of rivalry.”  Compete, Merriam-\nWebster’s Collegiate Dictionary (11th ed. 2003);\
    \ see Competing, Merriam-Webster Dictionary, \nhttps://www.merriam-webster.com/dictionary/competing\
    \ (last visited Mar. 7, 2023) (defining \n“competing” as being “in a state of\
    \ rivalry or competition (as for position, profit, or a prize)”)."
  sentences:
  - Who claims that Congress has done much of the work to reconcile FACA § 10(b) and
    the FOIA exemptions?
  - When was the online dictionary last visited according to the document?
  - What action will the Court take regarding Count Nine in No. 11-444?
- source_sentence: "a witness for the State, Mr. Zimmerman testified that he was shot\
    \ in the back while sitting \nin the driver’s seat of his vehicle.  Over objection,\
    \ during Mr. Zimmerman’s direct \nexamination, the circuit court admitted into\
    \ evidence a video, retrieved by a detective, that \nhad been recorded by a camera\
    \ mounted on the exterior wall of a residence near the site of"
  sentences:
  - What must a complaint do to defeat a Rule 12(b)(6) motion?
  - What was the position of Mr. Zimmerman when he was shot?
  - What does Rule 11 impose on any party who signs a pleading, motion, or other paper?
- source_sentence: "than if they had submitted a new request on the same subject,”\
    \ Fifth Lutz Decl. ¶ 9, implicitly \nconfirms that the Assignment of Rights Policy\
    \ tends to prejudice requesters.  To the extent an \n“assignee would be placed\
    \ in a better position to litigate the assigned request than if they had \nsubmitted\
    \ a new request on the same subject,” id., then a FOIA requester “submit[ing]\
    \ a new"
  sentences:
  - What does the Fifth Lutz Declaration paragraph 9 imply about the Assignment of
    Rights Policy?
  - When did Illinois Supreme Court Rule 663 become effective?
  - What would happen if the Solicitations were amended to comply with the regulations
    according to the plaintiffs?
- source_sentence: "against six federal agencies pursuant to the Freedom of Information\
    \ Act (“FOIA”), 5 U.S.C. \n§ 552, claiming that the defendant agencies have violated\
    \ the FOIA in numerous ways.1  NSC’s \nclaims run the gamut, including challenges\
    \ to: the withholding of specific information; the \nadequacy of the agencies’\
    \ search efforts; the refusal to process FOIA requests; the refusal to"
  sentences:
  - Which case was quoted in Entertainment Ltd. v. U.S. Dep’t of Interior regarding
    the retroactivity of statutes?
  - How many federal agencies is the action against?
  - Who questioned Mr. Zimmerman after the bench conference?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: ModernBERT Embed base Legal Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.5533230293663061
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.6105100463678517
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.7125193199381762
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8083462132921174
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5533230293663061
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.5275631117980423
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.4126738794435858
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.2502318392581144
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.1984801648634724
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5175167439464194
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.6554611025244719
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.7895414734672848
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.6787324741180409
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.610266553813694
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6544139401960045
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.5502318392581144
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5996908809891809
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.7001545595054096
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.7897990726429676
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5502318392581144
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.5218959299330241
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.4046367851622875
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.24296754250386396
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.19886656362699637
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5137815558990211
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.643353941267388
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.7695775373518804
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.6665384668011486
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.6033776158582955
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6473311395712609
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.5239567233384853
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5703245749613601
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.6754250386398764
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.768160741885626
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5239567233384853
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.4951056156620299
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.3888717156105101
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.23910355486862445
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.18830499742400822
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.4858320453374549
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.6172076249356002
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.750772797527048
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.6435527388538038
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.5769025539118272
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6222193004139938
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.46213292117465227
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5208655332302936
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.6089644513137558
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.6862442040185471
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.46213292117465227
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.4456465739309634
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.3536321483771252
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.21298299845440496
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.1656362699639361
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.4363730036063884
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.5607934054611026
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.6692426584234931
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.5742333897429361
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.5144243271754859
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.5623047162890543
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.3276661514683153
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.38639876352395675
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.47913446676970634
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.5641421947449768
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.3276661514683153
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3219989696032972
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.2676970633693972
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.16924265842349304
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.1172076249356002
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.321483771251932
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.43379701184956204
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.5401854714064915
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.4411753101398826
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.38149088589583163
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.43191750141987145
      name: Cosine Map@100
---

# ModernBERT Embed base Legal Matryoshka

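This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [nomic-ai/nomic-embed-text-v2-moe](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe) on the `json` dataset, 5,822 question-passage pairs drawn from English legal text. Despite the "ModernBERT" in the model name, the declared base model is nomic-embed-text-v2-moe, whose architecture is `NomicBertModel`. The model maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with `MatryoshkaLoss`, its embeddings were also evaluated when truncated to 512, 256, 128, and 64 dimensions, with retrieval quality dropping as the dimension shrinks (see Evaluation).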

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [nomic-ai/nomic-embed-text-v2-moe](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe) <!-- at revision f6a8873b415144a69ffc529ec1e234d1e00ee765 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - json
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NomicBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
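
The pooling and normalization modules above amount to two steps applied to the transformer's per-token vectors: average the vectors of non-padding tokens (mean pooling), then scale the result to unit length. A minimal pure-Python sketch of those two steps, using toy 4-dimensional token vectors in place of the real 768-dimensional ones; the function names are illustrative and not part of the library:

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length, as the Normalize() module does."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Toy input: three real tokens and one padding token, 4 dimensions each.
tokens = [
    [1.0, 0.0, 2.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 1.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding, ignored because its mask entry is 0
]
mask = [1, 1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final embedding has unit length, the cosine similarity of two embeddings equals their dot product.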

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

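# Download from the 🤗 Hub.
# trust_remote_code=True is an assumption: the NomicBertModel architecture is
# distributed as custom modeling code, so loading may fail without it.
model = SentenceTransformer("tsss1/modernbert-embed-base-legal-matryoshka-2", trust_remote_code=True)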
# Run inference
sentences = [
    'against six federal agencies pursuant to the Freedom of Information Act (“FOIA”), 5 U.S.C. \n§ 552, claiming that the defendant agencies have violated the FOIA in numerous ways.1  NSC’s \nclaims run the gamut, including challenges to: the withholding of specific information; the \nadequacy of the agencies’ search efforts; the refusal to process FOIA requests; the refusal to',
    'How many federal agencies is the action against?',
    'Which case was quoted in Entertainment Ltd. v. U.S. Dep’t of Interior regarding the retroactivity of statutes?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
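
Since the model was trained with `MatryoshkaLoss` at 768, 512, 256, 128, and 64 dimensions, the leading components of each embedding are meant to be usable on their own. In Sentence Transformers this is typically done by passing `truncate_dim` when constructing `SentenceTransformer`. The sketch below shows what truncation amounts to, using hypothetical 8-dimensional vectors rather than real model output: keep the first `dim` components, re-normalize, and compare.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional "embeddings" (hypothetical values, not model output).
passage = [0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
query = [0.5, 0.6, 0.3, 0.4, 0.1, 0.3, 0.2, 0.0]

# Compare the same pair at full size and at two truncated sizes.
for dim in (8, 4, 2):
    p = truncate_and_normalize(passage, dim)
    q = truncate_and_normalize(query, dim)
    print(dim, round(cosine(p, q), 4))
```

Re-normalizing after truncation matters only if you score with a dot product; cosine similarity normalizes on its own. Smaller dimensions reduce storage and search cost, and the Evaluation table shows how much retrieval quality each size gives up.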

## Evaluation

### Metrics

#### Information Retrieval

* Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | dim_768    | dim_512    | dim_256    | dim_128    | dim_64     |
|:--------------------|:-----------|:-----------|:-----------|:-----------|:-----------|
| cosine_accuracy@1   | 0.5533     | 0.5502     | 0.5240     | 0.4621     | 0.3277     |
| cosine_accuracy@3   | 0.6105     | 0.5997     | 0.5703     | 0.5209     | 0.3864     |
| cosine_accuracy@5   | 0.7125     | 0.7002     | 0.6754     | 0.6090     | 0.4791     |
| cosine_accuracy@10  | 0.8083     | 0.7898     | 0.7682     | 0.6862     | 0.5641     |
| cosine_precision@1  | 0.5533     | 0.5502     | 0.5240     | 0.4621     | 0.3277     |
| cosine_precision@3  | 0.5276     | 0.5219     | 0.4951     | 0.4456     | 0.3220     |
| cosine_precision@5  | 0.4127     | 0.4046     | 0.3889     | 0.3536     | 0.2677     |
| cosine_precision@10 | 0.2502     | 0.2430     | 0.2391     | 0.2130     | 0.1692     |
| cosine_recall@1     | 0.1985     | 0.1989     | 0.1883     | 0.1656     | 0.1172     |
| cosine_recall@3     | 0.5175     | 0.5138     | 0.4858     | 0.4364     | 0.3215     |
| cosine_recall@5     | 0.6555     | 0.6434     | 0.6172     | 0.5608     | 0.4338     |
| cosine_recall@10    | 0.7895     | 0.7696     | 0.7508     | 0.6692     | 0.5402     |
| **cosine_ndcg@10**  | **0.6787** | **0.6665** | **0.6436** | **0.5742** | **0.4412** |
| cosine_mrr@10       | 0.6103     | 0.6034     | 0.5769     | 0.5144     | 0.3815     |
| cosine_map@100      | 0.6544     | 0.6473     | 0.6222     | 0.5623     | 0.4319     |
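
For reference, the per-query quantities behind this table can be sketched as follows. This is a simplified illustration with binary relevance and hypothetical document IDs, not the evaluator's own code; the evaluator averages such values over all queries. It also shows why recall@k can sit well below accuracy@k: when a query has several relevant passages, a single hit in the top k counts fully toward accuracy but only fractionally toward recall.

```python
import math

def accuracy_at_k(ranked, relevant, k):
    """1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top-k results."""
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)

def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant result within the top k."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking over DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal

# Hypothetical query with three relevant passages, ranked at positions 1, 4 and 6.
ranked = ["d1", "d7", "d9", "d2", "d8", "d3", "d5"]
relevant = {"d1", "d2", "d3"}

print(accuracy_at_k(ranked, relevant, 3))   # the top 3 contains d1
print(precision_at_k(ranked, relevant, 3))  # 1 of the 3 results is relevant
print(recall_at_k(ranked, relevant, 3))     # 1 of the 3 relevant passages is found
print(mrr_at_k(ranked, relevant, 10))
print(ndcg_at_k(ranked, relevant, 10))
```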

## Training Details

### Training Dataset

#### json

* Dataset: json
* Size: 5,822 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 1000 samples:
  |         | positive                                                                            | anchor                                                                            |
  |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string                                                                              | string                                                                            |
  | details | <ul><li>min: 29 tokens</li><li>mean: 94.33 tokens</li><li>max: 156 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 18.25 tokens</li><li>max: 35 tokens</li></ul> |
* Samples:
  | positive                                                                                                                                                                                                                                                                                                                                                                                                    | anchor                                                                                                  |
  |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------|
  | <code>aspect” of “substantial independent authority.”  Dong v. Smithsonian Inst., 125 F.3d 877, 881 <br>                                                 <br>4  See CREW v. Office of Admin., 566 F.3d 219, 220 (D.C. Cir. 2009); Armstrong v. Exec. Office <br>of the President, 90 F.3d 553, 558 (D.C. Cir. 1996); Sweetland v. Walters, 60 F.3d 852, 854</code>                                          | <code>What court circuit is mentioned in connection with the case Sweetland v. Walters?</code>          |
  | <code>the entire list of remaining PQPs shifts up one position.   <br>Once GSA has verified, through the evaluation and validation process, the point totals <br>claimed by the 100/80/70 highest-scoring offerors, GSA will cease evaluations and award IDIQ <br>contracts to the successful, verified bidders.  AR at 1114, 2154, 2645.  If, after the evaluation</code>                                  | <code>What is the GSA responsible for verifying?</code>                                                 |
  | <code>Department components], to assist with the processing of [FOIA or Privacy Act] requests for <br>purposes of administrative expediency and efficiency.”  Third Walter Decl. ¶ 3.  Indeed, the <br>State Department’s declarant explains that these five State Department components, including <br>DS, “conduct their own FOIA/Privacy Act reviews and respond directly to requesters,” despite</code> | <code>What is the identified purpose for assisting with processing FOIA or Privacy Act requests?</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
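
To make the configuration above concrete, here is a rough pure-Python sketch of how the two losses combine: the base loss treats every other positive in the batch as a negative for a given anchor, and the Matryoshka wrapper sums that loss over each truncated prefix of the embeddings using the listed weights. This is a simplified illustration, not the library's implementation; the toy 4-dimensional vectors, the `dims=[4, 2]` choice, and the similarity scale of 20 are assumptions made for the example.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives loss: each anchor should score its own positive highest.

    Cross-entropy over scaled cosine similarities, averaged over the batch.
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in positives]
        top = max(scores)
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_sum_exp - scores[i]  # negative log-softmax of the true pair
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss computed on each truncated prefix."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        loss += weight * mnr_loss([a[:dim] for a in anchors],
                                  [p[:dim] for p in positives])
    return loss

# Toy batch of 3 (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.1, 0.2, 0.0], [0.1, 1.0, 0.0, 0.2], [-0.9, -1.0, 0.1, 0.1]]
positives = [[0.9, 0.0, 0.1, 0.1], [0.0, 0.8, 0.1, 0.0], [-1.0, -0.8, 0.0, 0.2]]

aligned = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
shuffled = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print(aligned, shuffled)
```

Pairing each anchor with the wrong positive (the `shuffled` case) yields a larger loss than the correct pairing, which is the signal that pushes every prefix length toward useful rankings.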

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 2
- `gradient_accumulation_steps`: 4
- `learning_rate`: 2e-05
- `num_train_epochs`: 2
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: False
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
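
As a rough illustration of what `learning_rate`, `lr_scheduler_type: cosine`, and `warmup_ratio` imply together, the sketch below computes a linear-warmup, cosine-decay schedule over the 364 optimizer steps reported in the Training Logs. Rounding the warmup length up to 37 steps is an assumption about how the trainer rounds, and the function name is illustrative.

```python
import math

def lr_at_step(step, total_steps=364, base_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup from 0 to base_lr, then cosine decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Learning rate at the start, during warmup, at the peak, mid-run, and at the end.
for step in (0, 10, 37, 182, 364):
    print(step, lr_at_step(step))
```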

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 2
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 4
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 2
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: False
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch   | Step    | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:-------:|:-------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| 0.0549  | 10      | 2.6704        | -                      | -                      | -                      | -                      | -                     |
| 0.1099  | 20      | 1.7246        | -                      | -                      | -                      | -                      | -                     |
| 0.1648  | 30      | 1.3634        | -                      | -                      | -                      | -                      | -                     |
| 0.2198  | 40      | 1.0962        | -                      | -                      | -                      | -                      | -                     |
| 0.2747  | 50      | 0.8985        | -                      | -                      | -                      | -                      | -                     |
| 0.3297  | 60      | 0.8667        | -                      | -                      | -                      | -                      | -                     |
| 0.3846  | 70      | 0.7371        | -                      | -                      | -                      | -                      | -                     |
| 0.4396  | 80      | 1.0380        | -                      | -                      | -                      | -                      | -                     |
| 0.4945  | 90      | 0.7330        | -                      | -                      | -                      | -                      | -                     |
| 0.5495  | 100     | 0.9032        | -                      | -                      | -                      | -                      | -                     |
| 0.6044  | 110     | 0.7283        | -                      | -                      | -                      | -                      | -                     |
| 0.6593  | 120     | 0.6085        | -                      | -                      | -                      | -                      | -                     |
| 0.7143  | 130     | 0.5774        | -                      | -                      | -                      | -                      | -                     |
| 0.7692  | 140     | 0.6164        | -                      | -                      | -                      | -                      | -                     |
| 0.8242  | 150     | 0.8098        | -                      | -                      | -                      | -                      | -                     |
| 0.8791  | 160     | 0.6534        | -                      | -                      | -                      | -                      | -                     |
| 0.9341  | 170     | 0.6035        | -                      | -                      | -                      | -                      | -                     |
| 0.9890  | 180     | 0.5209        | -                      | -                      | -                      | -                      | -                     |
| 1.0     | 182     | -             | 0.6911                 | 0.6719                 | 0.6341                 | 0.5600                 | 0.4203                |
| 1.0440  | 190     | 0.3718        | -                      | -                      | -                      | -                      | -                     |
| 1.0989  | 200     | 0.2309        | -                      | -                      | -                      | -                      | -                     |
| 1.1538  | 210     | 0.2128        | -                      | -                      | -                      | -                      | -                     |
| 1.2088  | 220     | 0.1380        | -                      | -                      | -                      | -                      | -                     |
| 1.2637  | 230     | 0.1129        | -                      | -                      | -                      | -                      | -                     |
| 1.3187  | 240     | 0.0889        | -                      | -                      | -                      | -                      | -                     |
| 1.3736  | 250     | 0.0607        | -                      | -                      | -                      | -                      | -                     |
| 1.4286  | 260     | 0.1156        | -                      | -                      | -                      | -                      | -                     |
| 1.4835  | 270     | 0.0826        | -                      | -                      | -                      | -                      | -                     |
| 1.5385  | 280     | 0.0980        | -                      | -                      | -                      | -                      | -                     |
| 1.5934  | 290     | 0.0891        | -                      | -                      | -                      | -                      | -                     |
| 1.6484  | 300     | 0.0451        | -                      | -                      | -                      | -                      | -                     |
| 1.7033  | 310     | 0.0581        | -                      | -                      | -                      | -                      | -                     |
| 1.7582  | 320     | 0.0722        | -                      | -                      | -                      | -                      | -                     |
| 1.8132  | 330     | 0.0785        | -                      | -                      | -                      | -                      | -                     |
| 1.8681  | 340     | 0.1407        | -                      | -                      | -                      | -                      | -                     |
| 1.9231  | 350     | 0.1022        | -                      | -                      | -                      | -                      | -                     |
| 1.9780  | 360     | 0.0771        | -                      | -                      | -                      | -                      | -                     |
| **2.0** | **364** | **-**         | **0.6787**             | **0.6665**             | **0.6436**             | **0.5742**             | **0.4412**            |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.47.0
- PyTorch: 2.3.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.3.1
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
