---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:5822
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: nomic-ai/nomic-embed-text-v2-moe
widget:
- source_sentence: "the Polaris Solicitations as currently drafted do not comply with Section 3306(c)(3). In its request \nto apply Section 3306(c)(3) to the Polaris Solicitations, GSA stated that \n \n \n \nSupplement 2, AR at 2907–08. Because GSA adopted an overly broad understanding of Section \n3306(c)(3)’s scope, GSA stated the Solicitations will include a “full range of order types,”"
  sentences:
  - What did Al-Hamim confirm about the citations?
  - What understanding did GSA adopt regarding Section 3306(c)(3)'s scope?
  - What was the reason for denying the agency's motion without prejudice?
- source_sentence: "objective (as position, profit, or a prize); [or to] be in a state of rivalry.” Compete, Merriam-\nWebster’s Collegiate Dictionary (11th ed. 2003); see Competing, Merriam-Webster Dictionary, \nhttps://www.merriam-webster.com/dictionary/competing (last visited Mar. 7, 2023) (defining \n“competing” as being “in a state of rivalry or competition (as for position, profit, or a prize)”)."
  sentences:
  - Who claims that Congress has done much of the work to reconcile FACA § 10(b) and the FOIA exemptions?
  - When was the online dictionary last visited according to the document?
  - What action will the Court take regarding Count Nine in No. 11-444?
- source_sentence: "a witness for the State, Mr. Zimmerman testified that he was shot in the back while sitting \nin the driver’s seat of his vehicle. Over objection, during Mr. Zimmerman’s direct \nexamination, the circuit court admitted into evidence a video, retrieved by a detective, that \nhad been recorded by a camera mounted on the exterior wall of a residence near the site of"
  sentences:
  - What must a complaint do to defeat a Rule 12(b)(6) motion?
  - What was the position of Mr. Zimmerman when he was shot?
  - What does Rule 11 impose on any party who signs a pleading, motion, or other paper?
- source_sentence: "than if they had submitted a new request on the same subject,” Fifth Lutz Decl. ¶ 9, implicitly \nconfirms that the Assignment of Rights Policy tends to prejudice requesters. To the extent an \n“assignee would be placed in a better position to litigate the assigned request than if they had \nsubmitted a new request on the same subject,” id., then a FOIA requester “submit[ing] a new"
  sentences:
  - What does the Fifth Lutz Declaration paragraph 9 imply about the Assignment of Rights Policy?
  - When did Illinois Supreme Court Rule 663 become effective?
  - What would happen if the Solicitations were amended to comply with the regulations according to the plaintiffs?
- source_sentence: "against six federal agencies pursuant to the Freedom of Information Act (“FOIA”), 5 U.S.C. \n§ 552, claiming that the defendant agencies have violated the FOIA in numerous ways.1 NSC’s \nclaims run the gamut, including challenges to: the withholding of specific information; the \nadequacy of the agencies’ search efforts; the refusal to process FOIA requests; the refusal to"
  sentences:
  - Which case was quoted in Entertainment Ltd. v. U.S. Dep’t of Interior regarding the retroactivity of statutes?
  - How many federal agencies is the action against?
  - Who questioned Mr. Zimmerman after the bench conference?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: ModernBERT Embed base Legal Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.5533230293663061
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.6105100463678517
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.7125193199381762
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8083462132921174
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5533230293663061
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.5275631117980423
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.4126738794435858
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.2502318392581144
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.1984801648634724
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5175167439464194
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.6554611025244719
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.7895414734672848
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.6787324741180409
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.610266553813694
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6544139401960045
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.5502318392581144
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5996908809891809
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.7001545595054096
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.7897990726429676
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5502318392581144
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.5218959299330241
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.4046367851622875
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.24296754250386396
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.19886656362699637
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5137815558990211
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.643353941267388
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.7695775373518804
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.6665384668011486
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.6033776158582955
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6473311395712609
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.5239567233384853
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5703245749613601
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.6754250386398764
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.768160741885626
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5239567233384853
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.4951056156620299
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.3888717156105101
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.23910355486862445
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.18830499742400822
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.4858320453374549
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.6172076249356002
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.750772797527048
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.6435527388538038
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.5769025539118272
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6222193004139938
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.46213292117465227
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5208655332302936
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.6089644513137558
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.6862442040185471
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.46213292117465227
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.4456465739309634
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.3536321483771252
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.21298299845440496
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.1656362699639361
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.4363730036063884
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.5607934054611026
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.6692426584234931
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.5742333897429361
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.5144243271754859
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.5623047162890543
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.3276661514683153
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.38639876352395675
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.47913446676970634
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.5641421947449768
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.3276661514683153
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3219989696032972
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.2676970633693972
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.16924265842349304
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.1172076249356002
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.321483771251932
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.43379701184956204
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.5401854714064915
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.4411753101398826
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.38149088589583163
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.43191750141987145
      name: Cosine Map@100
---

# ModernBERT Embed base Legal Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [nomic-ai/nomic-embed-text-v2-moe](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe) on the `json` dataset, which contains 5,822 passage/question pairs drawn from legal documents. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Despite "ModernBERT" in its name, the underlying encoder is nomic-embed-text-v2-moe (`NomicBertModel`), not ModernBERT.
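Because the model was trained with MatryoshkaLoss at 768, 512, 256, 128 and 64 dimensions, the leading dimensions of each embedding can be used on their own as a smaller embedding, at some cost in retrieval quality (the Evaluation section reports the per-size scores). The sketch below shows the mechanics with NumPy: keep the first `dim` values, rescale each vector to unit length, and rank documents by cosine similarity. It is illustrative only: random vectors stand in for real `model.encode(...)` outputs, and `truncate_and_normalize` and `rank_documents` are hypothetical helpers, not part of sentence-transformers. Sentence Transformers can also apply the truncation itself via the `truncate_dim` argument of `SentenceTransformer(...)`.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for an all-zero row.
    return truncated / np.clip(norms, 1e-12, None)


def rank_documents(query_emb: np.ndarray, doc_embs: np.ndarray, dim: int) -> np.ndarray:
    """Return document indices sorted by cosine similarity to the query, best first."""
    q = truncate_and_normalize(query_emb[None, :], dim)
    d = truncate_and_normalize(doc_embs, dim)
    # Both sides have unit length, so the dot product is the cosine similarity.
    scores = d @ q[0]
    return np.argsort(-scores)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for model.encode(...) outputs: 5 documents, 768 dims each.
    docs = rng.normal(size=(5, 768))
    # A query that is a slightly perturbed copy of document 3.
    query = docs[3] + 0.1 * rng.normal(size=768)
    for dim in (768, 256, 64):
        print(dim, rank_documents(query, docs, dim)[:3])
```

Truncated vectors are no longer unit length, which is why the sketch renormalizes before taking dot products; cosine similarity computed directly on the truncated vectors gives the same ranking.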
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [nomic-ai/nomic-embed-text-v2-moe](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - json
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NomicBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub.
# The NomicBertModel backbone relies on custom modeling code, so trust_remote_code=True is needed.
model = SentenceTransformer("tsss1/modernbert-embed-base-legal-matryoshka-2", trust_remote_code=True)
# Run inference
sentences = [
    'against six federal agencies pursuant to the Freedom of Information Act (“FOIA”), 5 U.S.C. \n§ 552, claiming that the defendant agencies have violated the FOIA in numerous ways.1 NSC’s \nclaims run the gamut, including challenges to: the withholding of specific information; the \nadequacy of the agencies’ search efforts; the refusal to process FOIA requests; the refusal to',
    'How many federal agencies is the action against?',
    'Which case was quoted in Entertainment Ltd. v. U.S. Dep’t of Interior regarding the retroactivity of statutes?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```

## Evaluation

### Metrics

#### Information Retrieval

* Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64` (the same evaluation data, scored with embeddings truncated to their first 768, 512, 256, 128 and 64 dimensions respectively)
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | dim_768    | dim_512    | dim_256    | dim_128    | dim_64     |
|:--------------------|:-----------|:-----------|:-----------|:-----------|:-----------|
| cosine_accuracy@1   | 0.5533     | 0.5502     | 0.524      | 0.4621     | 0.3277     |
| cosine_accuracy@3   | 0.6105     | 0.5997     | 0.5703     | 0.5209     | 0.3864     |
| cosine_accuracy@5   | 0.7125     | 0.7002     | 0.6754     | 0.609      | 0.4791     |
| cosine_accuracy@10  | 0.8083     | 0.7898     | 0.7682     | 0.6862     | 0.5641     |
| cosine_precision@1  | 0.5533     | 0.5502     | 0.524      | 0.4621     | 0.3277     |
| cosine_precision@3  | 0.5276     | 0.5219     | 0.4951     | 0.4456     | 0.322      |
| cosine_precision@5  | 0.4127     | 0.4046     | 0.3889     | 0.3536     | 0.2677     |
| cosine_precision@10 | 0.2502     | 0.243      | 0.2391     | 0.213      | 0.1692     |
| cosine_recall@1     | 0.1985     | 0.1989     | 0.1883     | 0.1656     | 0.1172     |
| cosine_recall@3     | 0.5175     | 0.5138     | 0.4858     | 0.4364     | 0.3215     |
| cosine_recall@5     | 0.6555     | 0.6434     | 0.6172     | 0.5608     | 0.4338     |
| cosine_recall@10    | 0.7895     | 0.7696     | 0.7508     | 0.6692     | 0.5402     |
| **cosine_ndcg@10**  | **0.6787** | **0.6665** | **0.6436** | **0.5742** | **0.4412** |
| cosine_mrr@10       | 0.6103     | 0.6034     | 0.5769     | 0.5144     | 0.3815     |
| cosine_map@100      | 0.6544     | 0.6473     | 0.6222     | 0.5623     | 0.4319     |

## Training Details

### Training Dataset

#### json

* Dataset: json
* Size: 5,822 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 1000 samples:
  |      | positive | anchor |
  |:-----|:---------|:-------|
  | type | string   | string |
* Samples:
  | positive | anchor |
  |:---------|:-------|
  | aspect” of “substantial independent authority.” Dong v. Smithsonian Inst., 125 F.3d 877, 881<br><br>4 See CREW v. Office of Admin., 566 F.3d 219, 220 (D.C. Cir. 2009); Armstrong v. Exec. Office<br>of the President, 90 F.3d 553, 558 (D.C. Cir. 1996); Sweetland v. Walters, 60 F.3d 852, 854 | What court circuit is mentioned in connection with the case Sweetland v. Walters? |
  | the entire list of remaining PQPs shifts up one position.<br>Once GSA has verified, through the evaluation and validation process, the point totals<br>claimed by the 100/80/70 highest-scoring offerors, GSA will cease evaluations and award IDIQ<br>contracts to the successful, verified bidders. AR at 1114, 2154, 2645. If, after the evaluation | What is the GSA responsible for verifying? |
  | Department components], to assist with the processing of [FOIA or Privacy Act] requests for<br>purposes of administrative expediency and efficiency.” Third Walter Decl. ¶ 3. Indeed, the<br>State Department’s declarant explains that these five State Department components, including<br>DS, “conduct their own FOIA/Privacy Act reviews and respond directly to requesters,” despite | What is the identified purpose for assisting with processing FOIA or Privacy Act requests? |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 2
- `gradient_accumulation_steps`: 4
- `learning_rate`: 2e-05
- `num_train_epochs`: 2
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: False
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 2
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 4
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 2
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: False
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
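The training log reports 182 optimizer steps per epoch and 364 in total. A minimal arithmetic sketch of where those numbers can come from, assuming the Hugging Face Trainer's usual scheduling (micro-batches per device divided by `gradient_accumulation_steps`, warmup rounded up from `warmup_ratio`) and assuming **two** data-parallel devices. The device count is not stated in this card; two is inferred only because it reproduces the logged figures, whereas a single device would give 364 steps per epoch. The function names are hypothetical.

```python
import math


def optimizer_steps_per_epoch(n_samples: int, per_device_batch: int,
                              grad_accum: int, n_devices: int) -> int:
    # The batch sampler keeps the last partial batch (dataloader_drop_last=False).
    total_batches = math.ceil(n_samples / per_device_batch)
    # Batches are split evenly across data-parallel processes (even_batches=True).
    batches_per_device = math.ceil(total_batches / n_devices)
    # Scheduled update steps per epoch: micro-batches // accumulation steps.
    return max(batches_per_device // grad_accum, 1)


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    # warmup_ratio is converted to a whole number of steps by rounding up.
    return math.ceil(total_steps * warmup_ratio)


# n_devices=2 is an assumption inferred from the logged step counts.
steps = optimizer_steps_per_epoch(5822, per_device_batch=4, grad_accum=4, n_devices=2)
total = steps * 2  # num_train_epochs = 2
print(steps, total, warmup_steps(total, 0.1))  # prints: 182 364 37
```

Under that assumption the effective optimizer batch is 4 × 4 × 2 = 32 pairs. MultipleNegativesRankingLoss, however, draws its in-batch negatives from each per-device micro-batch, so every anchor is contrasted against only 3 other passages; gradient accumulation does not enlarge that pool. A larger per-device batch or `CachedMultipleNegativesRankingLoss` are common ways to get more negatives per anchor.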
### Training Logs
| Epoch   | Step    | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:-------:|:-------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| 0.0549  | 10      | 2.6704        | -                      | -                      | -                      | -                      | -                     |
| 0.1099  | 20      | 1.7246        | -                      | -                      | -                      | -                      | -                     |
| 0.1648  | 30      | 1.3634        | -                      | -                      | -                      | -                      | -                     |
| 0.2198  | 40      | 1.0962        | -                      | -                      | -                      | -                      | -                     |
| 0.2747  | 50      | 0.8985        | -                      | -                      | -                      | -                      | -                     |
| 0.3297  | 60      | 0.8667        | -                      | -                      | -                      | -                      | -                     |
| 0.3846  | 70      | 0.7371        | -                      | -                      | -                      | -                      | -                     |
| 0.4396  | 80      | 1.038         | -                      | -                      | -                      | -                      | -                     |
| 0.4945  | 90      | 0.733         | -                      | -                      | -                      | -                      | -                     |
| 0.5495  | 100     | 0.9032        | -                      | -                      | -                      | -                      | -                     |
| 0.6044  | 110     | 0.7283        | -                      | -                      | -                      | -                      | -                     |
| 0.6593  | 120     | 0.6085        | -                      | -                      | -                      | -                      | -                     |
| 0.7143  | 130     | 0.5774        | -                      | -                      | -                      | -                      | -                     |
| 0.7692  | 140     | 0.6164        | -                      | -                      | -                      | -                      | -                     |
| 0.8242  | 150     | 0.8098        | -                      | -                      | -                      | -                      | -                     |
| 0.8791  | 160     | 0.6534        | -                      | -                      | -                      | -                      | -                     |
| 0.9341  | 170     | 0.6035        | -                      | -                      | -                      | -                      | -                     |
| 0.9890  | 180     | 0.5209        | -                      | -                      | -                      | -                      | -                     |
| 1.0     | 182     | -             | 0.6911                 | 0.6719                 | 0.6341                 | 0.5600                 | 0.4203                |
| 1.0440  | 190     | 0.3718        | -                      | -                      | -                      | -                      | -                     |
| 1.0989  | 200     | 0.2309        | -                      | -                      | -                      | -                      | -                     |
| 1.1538  | 210     | 0.2128        | -                      | -                      | -                      | -                      | -                     |
| 1.2088  | 220     | 0.138         | -                      | -                      | -                      | -                      | -                     |
| 1.2637  | 230     | 0.1129        | -                      | -                      | -                      | -                      | -                     |
| 1.3187  | 240     | 0.0889        | -                      | -                      | -                      | -                      | -                     |
| 1.3736  | 250     | 0.0607        | -                      | -                      | -                      | -                      | -                     |
| 1.4286  | 260     | 0.1156        | -                      | -                      | -                      | -                      | -                     |
| 1.4835  | 270     | 0.0826        | -                      | -                      | -                      | -                      | -                     |
| 1.5385  | 280     | 0.098         | -                      | -                      | -                      | -                      | -                     |
| 1.5934  | 290     | 0.0891        | -                      | -                      | -                      | -                      | -                     |
| 1.6484  | 300     | 0.0451        | -                      | -                      | -                      | -                      | -                     |
| 1.7033  | 310     | 0.0581        | -                      | -                      | -                      | -                      | -                     |
| 1.7582  | 320     | 0.0722        | -                      | -                      | -                      | -                      | -                     |
| 1.8132  | 330     | 0.0785        | -                      | -                      | -                      | -                      | -                     |
| 1.8681  | 340     | 0.1407        | -                      | -                      | -                      | -                      | -                     |
| 1.9231  | 350     | 0.1022        | -                      | -                      | -                      | -                      | -                     |
| 1.9780  | 360     | 0.0771        | -                      | -                      | -                      | -                      | -                     |
| **2.0** | **364** | **-**         | **0.6787**             | **0.6665**             | **0.6436**             | **0.5742**             | **0.4412**            |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.47.0
- PyTorch: 2.3.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.3.1
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```