SentenceTransformer based on sentence-transformers/all-distilroberta-v1

This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-distilroberta-v1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
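
The module stack above implies a 512-token maximum input length and 768-dimensional, L2-normalized embeddings. These can be checked directly on the loaded model (a minimal sketch that only reads properties of the model):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs1024-nneg3-ml-feb22")
print(model.max_seq_length)                      # 512, from the Transformer module
print(model.get_sentence_embedding_dimension())  # 768, the Pooling output dimension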

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs1024-nneg3-ml-feb22")
# Run inference
sentences = [
    'Mathlib.Analysis.Convex.StoneSeparation#0',
    'Nat.le_of_lt',
    'AddCommMonoid.nat_isScalarTower',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
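
The training data pairs Lean proof-state identifiers with Mathlib premise names, so a natural application is ranking candidate premises against a state. A minimal retrieval sketch (the candidate pool and the top-k value are illustrative, not part of the original card):

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hanwenzhu/all-distilroberta-v1-lr2e-4-bs1024-nneg3-ml-feb22")

# One proof-state identifier as the query and a small illustrative pool of premise names.
query = "Mathlib.Analysis.Convex.StoneSeparation#0"
premises = ["Nat.le_of_lt", "AddCommMonoid.nat_isScalarTower", "Set.union_empty"]

query_emb = model.encode([query])
premise_embs = model.encode(premises)

# Embeddings are L2-normalized by the Normalize module, so cosine similarity ranks candidates.
scores = model.similarity(query_emb, premise_embs)[0]
top = torch.topk(scores, k=2)
for score, idx in zip(top.values, top.indices):
    print(f"{premises[idx]}: {score:.4f}")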

Training Details

Training Dataset

Unnamed Dataset

  • Size: 5,702,228 training samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    • state_name: string; min: 12 tokens, mean: 17.86 tokens, max: 23 tokens
    • premise_name: string; min: 3 tokens, mean: 10.93 tokens, max: 36 tokens
  • Samples:
    state_name premise_name
    Mathlib.Topology.EMetricSpace.BoundedVariation#253 Set.union_empty
    Mathlib.Topology.EMetricSpace.BoundedVariation#253 le_refl
    Mathlib.Topology.EMetricSpace.BoundedVariation#253 le_of_le_of_eq
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
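
loss.MaskedCachedMultipleNegativesRankingLoss is a project-specific loss; judging from its name and the citation below, it combines the gradient-caching trick of Gao et al. (2021) with masking of unwanted in-batch pairs. Its core objective is the standard multiple-negatives ranking loss with in-batch negatives, sketched here with the caching and masking omitted (an illustrative reconstruction, not the project's implementation):

import torch
import torch.nn.functional as F

def multiple_negatives_ranking_loss(state_embs: torch.Tensor,
                                    premise_embs: torch.Tensor,
                                    scale: float = 20.0) -> torch.Tensor:
    """The i-th premise is the positive for the i-th state; every other
    premise in the batch serves as an in-batch negative."""
    state_embs = F.normalize(state_embs, dim=-1)
    premise_embs = F.normalize(premise_embs, dim=-1)
    scores = scale * state_embs @ premise_embs.T          # scaled cosine similarities
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)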

Evaluation Dataset

Unnamed Dataset

  • Size: 2,334 evaluation samples
  • Columns: state_name and premise_name
  • Approximate statistics based on the first 1000 samples:
    • state_name: string; min: 11 tokens, mean: 16.59 tokens, max: 24 tokens
    • premise_name: string; min: 3 tokens, mean: 11.75 tokens, max: 32 tokens
  • Samples:
    state_name premise_name
    Mathlib.Algebra.Algebra.Operations#96 Submodule.le_pow_toAddSubmonoid
    Mathlib.Algebra.Algebra.Operations#96 AddSubmonoid.pow_subset_pow
    Mathlib.Algebra.Algebra.Operations#96 trans
  • Loss: loss.MaskedCachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 1024
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0002
  • num_train_epochs: 1.0
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.03
  • bf16: True
  • dataloader_num_workers: 4
  • batch_sampler: no_duplicates
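
These values map onto the Sentence Transformers v3 trainer roughly as follows (a configuration sketch; the output directory is hypothetical and all other options keep their defaults):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs/all-distilroberta-v1-premise-retrieval",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=1024,
    per_device_eval_batch_size=64,
    learning_rate=2e-4,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    dataloader_num_workers=4,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)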

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 1024
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0002
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1.0
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.03
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
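
For context, wiring the model, the (state_name, premise_name) dataset, and the loss into the trainer looks roughly like this (a sketch using the library's CachedMultipleNegativesRankingLoss as a stand-in for the project's masked variant; the tiny inline dataset and paths are only illustrative):

from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments)
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

# Hypothetical two-column dataset mirroring the (state_name, premise_name) pairs above.
train_dataset = Dataset.from_dict({
    "state_name": ["Mathlib.Topology.EMetricSpace.BoundedVariation#253"],
    "premise_name": ["Set.union_empty"],
})

model = SentenceTransformer("sentence-transformers/all-distilroberta-v1")
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0)  # stand-in for the masked variant

args = SentenceTransformerTrainingArguments(
    output_dir="outputs/premise-retrieval",  # hypothetical path
    num_train_epochs=1.0,
)

trainer = SentenceTransformerTrainer(model=model, args=args,
                                     train_dataset=train_dataset, loss=loss)
trainer.train()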

Training Logs

Epoch   Step   Training Loss   Validation Loss
0.0018 10 6.4945 -
0.0036 20 5.9953 -
0.0054 30 5.6726 -
0.0072 40 5.489 -
0.0090 50 5.3066 -
0.0101 56 - 1.4491
0.0108 60 5.1176 -
0.0126 70 5.0673 -
0.0144 80 5.0028 -
0.0162 90 4.957 -
0.0180 100 4.8704 -
0.0198 110 4.8362 -
0.0201 112 - 1.2623
0.0215 120 4.7757 -
0.0233 130 4.6646 -
0.0251 140 4.6617 -
0.0269 150 4.6957 -
0.0287 160 4.5359 -
0.0302 168 - 1.2172
0.0305 170 4.5352 -
0.0323 180 4.4969 -
0.0341 190 4.484 -
0.0359 200 4.4936 -
0.0377 210 4.3855 -
0.0395 220 4.3338 -
0.0402 224 - 1.2096
0.0413 230 4.3023 -
0.0431 240 4.3158 -
0.0449 250 4.291 -
0.0467 260 4.2303 -
0.0485 270 4.2196 -
0.0503 280 4.237 1.1234
0.0521 290 4.2183 -
0.0539 300 4.1804 -
0.0557 310 4.1496 -
0.0575 320 4.1086 -
0.0593 330 4.0588 -
0.0603 336 - 0.9823
0.0611 340 4.0566 -
0.0628 350 4.0886 -
0.0646 360 4.126 -
0.0664 370 3.9956 -
0.0682 380 4.0245 -
0.0700 390 4.0398 -
0.0704 392 - 0.9728
0.0718 400 3.9756 -
0.0736 410 4.0221 -
0.0754 420 3.977 -
0.0772 430 3.8922 -
0.0790 440 3.9496 -
0.0804 448 - 0.9045
0.0808 450 3.8841 -
0.0826 460 3.8596 -
0.0844 470 3.8682 -
0.0862 480 3.8671 -
0.0880 490 3.829 -
0.0898 500 3.7833 -
0.0905 504 - 0.8283
0.0916 510 3.7498 -
0.0934 520 3.8393 -
0.0952 530 3.7889 -
0.0970 540 3.798 -
0.0988 550 3.7653 -
0.1006 560 3.7703 0.8599
0.1024 570 3.7121 -
0.1041 580 3.7443 -
0.1059 590 3.7417 -
0.1077 600 3.6768 -
0.1095 610 3.6305 -
0.1106 616 - 0.8878
0.1113 620 3.6516 -
0.1131 630 3.6454 -
0.1149 640 3.6808 -
0.1167 650 3.662 -
0.1185 660 3.6466 -
0.1203 670 3.5734 -
0.1207 672 - 0.8875
0.1221 680 3.5605 -
0.1239 690 3.6263 -
0.1257 700 3.6224 -
0.1275 710 3.5387 -
0.1293 720 3.5355 -
0.1307 728 - 0.8349
0.1311 730 3.5723 -
0.1329 740 3.5018 -
0.1347 750 3.4456 -
0.1365 760 3.4415 -
0.1383 770 3.4535 -
0.1401 780 3.4423 -
0.1408 784 - 0.8113
0.1419 790 3.5116 -
0.1437 800 3.4681 -
0.1454 810 3.4181 -
0.1472 820 3.4289 -
0.1490 830 3.4553 -
0.1508 840 3.4506 0.8186
0.1526 850 3.4006 -
0.1544 860 3.4412 -
0.1562 870 3.3971 -
0.1580 880 3.3829 -
0.1598 890 3.4066 -
0.1609 896 - 0.7989
0.1616 900 3.4174 -
0.1634 910 3.3869 -
0.1652 920 3.3616 -
0.1670 930 3.3639 -
0.1688 940 3.3353 -
0.1706 950 3.3401 -
0.1709 952 - 0.7703
0.1724 960 3.3322 -
0.1742 970 3.3129 -
0.1760 980 3.3336 -
0.1778 990 3.2899 -
0.1796 1000 3.3012 -
0.1810 1008 - 0.7533
0.1814 1010 3.2885 -
0.1832 1020 3.2861 -
0.1850 1030 3.2935 -
0.1867 1040 3.3401 -
0.1885 1050 3.3192 -
0.1903 1060 3.306 -
0.1911 1064 - 0.7385
0.1921 1070 3.2599 -
0.1939 1080 3.1642 -
0.1957 1090 3.2544 -
0.1975 1100 3.1976 -
0.1993 1110 3.1664 -
0.2011 1120 3.1119 0.7099
0.2029 1130 3.1349 -
0.2047 1140 3.2138 -
0.2065 1150 3.2007 -
0.2083 1160 3.1433 -
0.2101 1170 3.1061 -
0.2112 1176 - 0.7260
0.2119 1180 3.1275 -
0.2137 1190 3.1019 -
0.2155 1200 3.1205 -
0.2173 1210 3.0568 -
0.2191 1220 3.1019 -
0.2209 1230 3.1172 -
0.2212 1232 - 0.7232
0.2227 1240 3.0902 -
0.2245 1250 3.0309 -
0.2263 1260 3.0369 -
0.2280 1270 3.0152 -
0.2298 1280 3.0631 -
0.2313 1288 - 0.6834
0.2316 1290 3.0995 -
0.2334 1300 3.0935 -
0.2352 1310 3.0539 -
0.2370 1320 3.0385 -
0.2388 1330 3.0614 -
0.2406 1340 3.0869 -
0.2413 1344 - 0.7055
0.2424 1350 3.0854 -
0.2442 1360 3.0363 -
0.2460 1370 3.0643 -
0.2478 1380 3.0698 -
0.2496 1390 3.0005 -
0.2514 1400 2.9856 0.6682
0.2532 1410 3.0242 -
0.2550 1420 3.0012 -
0.2568 1430 3.0131 -
0.2586 1440 3.0069 -
0.2604 1450 2.9781 -
0.2614 1456 - 0.6871
0.2622 1460 2.9552 -
0.2640 1470 2.9734 -
0.2658 1480 2.9974 -
0.2676 1490 2.9739 -
0.2693 1500 2.9154 -
0.2711 1510 2.9461 -
0.2715 1512 - 0.6957
0.2729 1520 2.8891 -
0.2747 1530 2.9345 -
0.2765 1540 2.9421 -
0.2783 1550 2.9024 -
0.2801 1560 2.9436 -
0.2816 1568 - 0.6855
0.2819 1570 2.9584 -
0.2837 1580 2.9022 -
0.2855 1590 2.8767 -
0.2873 1600 2.9197 -
0.2891 1610 2.8995 -
0.2909 1620 2.8613 -
0.2916 1624 - 0.6869
0.2927 1630 2.8522 -
0.2945 1640 2.8988 -
0.2963 1650 2.8307 -
0.2981 1660 2.8281 -
0.2999 1670 2.835 -
0.3017 1680 2.8305 0.6352
0.3035 1690 2.8139 -
0.3053 1700 2.8655 -
0.3071 1710 2.8651 -
0.3089 1720 2.8026 -
0.3106 1730 2.7712 -
0.3117 1736 - 0.6213
0.3124 1740 2.8073 -
0.3142 1750 2.7572 -
0.3160 1760 2.7446 -
0.3178 1770 2.7955 -
0.3196 1780 2.7745 -
0.3214 1790 2.7254 -
0.3218 1792 - 0.6358
0.3232 1800 2.7719 -
0.3250 1810 2.7386 -
0.3268 1820 2.705 -
0.3286 1830 2.7102 -
0.3304 1840 2.7694 -
0.3318 1848 - 0.6394
0.3322 1850 2.7433 -
0.3340 1860 2.6986 -
0.3358 1870 2.7005 -
0.3376 1880 2.6814 -
0.3394 1890 2.6811 -
0.3412 1900 2.7303 -
0.3419 1904 - 0.6303
0.3430 1910 2.7674 -
0.3448 1920 2.7573 -
0.3466 1930 2.7488 -
0.3484 1940 2.7408 -
0.3502 1950 2.6989 -
0.3519 1960 2.7066 0.6180
0.3537 1970 2.707 -
0.3555 1980 2.6932 -
0.3573 1990 2.7165 -
0.3591 2000 2.6938 -
0.3609 2010 2.7207 -
0.3620 2016 - 0.5906
0.3627 2020 2.7456 -
0.3645 2030 2.714 -
0.3663 2040 2.6607 -
0.3681 2050 2.6659 -
0.3699 2060 2.6621 -
0.3717 2070 2.6872 -
0.3721 2072 - 0.5879
0.3735 2080 2.6439 -
0.3753 2090 2.6849 -
0.3771 2100 2.6518 -
0.3789 2110 2.5955 -
0.3807 2120 2.6138 -
0.3821 2128 - 0.5945
0.3825 2130 2.5803 -
0.3843 2140 2.6437 -
0.3861 2150 2.6264 -
0.3879 2160 2.5644 -
0.3897 2170 2.5971 -
0.3915 2180 2.52 -
0.3922 2184 - 0.5953
0.3932 2190 2.5523 -
0.3950 2200 2.599 -
0.3968 2210 2.5832 -
0.3986 2220 2.6254 -
0.4004 2230 2.5838 -
0.4022 2240 2.5737 0.5751
0.4040 2250 2.5663 -
0.4058 2260 2.6058 -
0.4076 2270 2.5968 -
0.4094 2280 2.5784 -
0.4112 2290 2.5363 -
0.4123 2296 - 0.5810
0.4130 2300 2.5149 -
0.4148 2310 2.558 -
0.4166 2320 2.5614 -
0.4184 2330 2.5482 -
0.4202 2340 2.5458 -
0.4220 2350 2.5281 -
0.4223 2352 - 0.5673
0.4238 2360 2.5617 -
0.4256 2370 2.5337 -
0.4274 2380 2.5321 -
0.4292 2390 2.5506 -
0.4310 2400 2.5214 -
0.4324 2408 - 0.5650
0.4328 2410 2.5245 -
0.4345 2420 2.5047 -
0.4363 2430 2.5719 -
0.4381 2440 2.512 -
0.4399 2450 2.5076 -
0.4417 2460 2.4517 -
0.4424 2464 - 0.5772
0.4435 2470 2.4911 -
0.4453 2480 2.5638 -
0.4471 2490 2.5349 -
0.4489 2500 2.4961 -
0.4507 2510 2.5169 -
0.4525 2520 2.489 0.5655
0.4543 2530 2.475 -
0.4561 2540 2.4378 -
0.4579 2550 2.4252 -
0.4597 2560 2.4448 -
0.4615 2570 2.4596 -
0.4626 2576 - 0.5544
0.4633 2580 2.4811 -
0.4651 2590 2.4459 -
0.4669 2600 2.4261 -
0.4687 2610 2.4214 -
0.4705 2620 2.4528 -
0.4723 2630 2.4374 -
0.4726 2632 - 0.5336
0.4741 2640 2.4585 -
0.4758 2650 2.4529 -
0.4776 2660 2.4205 -
0.4794 2670 2.441 -
0.4812 2680 2.4654 -
0.4827 2688 - 0.5314
0.4830 2690 2.4535 -
0.4848 2700 2.5085 -
0.4866 2710 2.4725 -
0.4884 2720 2.4655 -
0.4902 2730 2.4137 -
0.4920 2740 2.4172 -
0.4927 2744 - 0.5352
0.4938 2750 2.434 -
0.4956 2760 2.4489 -
0.4974 2770 2.4448 -
0.4992 2780 2.3979 -
0.5010 2790 2.4251 -
0.5028 2800 2.3996 0.5313
0.5046 2810 2.4467 -
0.5064 2820 2.4338 -
0.5082 2830 2.4386 -
0.5100 2840 2.3813 -
0.5118 2850 2.4149 -
0.5128 2856 - 0.5261
0.5136 2860 2.3822 -
0.5154 2870 2.407 -
0.5171 2880 2.3406 -
0.5189 2890 2.3845 -
0.5207 2900 2.3176 -
0.5225 2910 2.3554 -
0.5229 2912 - 0.5172
0.5243 2920 2.3905 -
0.5261 2930 2.3994 -
0.5279 2940 2.4004 -
0.5297 2950 2.3499 -
0.5315 2960 2.3758 -
0.5330 2968 - 0.5340
0.5333 2970 2.3644 -
0.5351 2980 2.3288 -
0.5369 2990 2.3504 -
0.5387 3000 2.2991 -
0.5405 3010 2.3471 -
0.5423 3020 2.3408 -
0.5430 3024 - 0.5077
0.5441 3030 2.3881 -
0.5459 3040 2.3398 -
0.5477 3050 2.2963 -
0.5495 3060 2.3344 -
0.5513 3070 2.3268 -
0.5531 3080 2.3197 0.5025
0.5549 3090 2.3667 -
0.5567 3100 2.3655 -
0.5584 3110 2.3295 -
0.5602 3120 2.3238 -
0.5620 3130 2.3336 -
0.5631 3136 - 0.4885
0.5638 3140 2.3408 -
0.5656 3150 2.3371 -
0.5674 3160 2.3419 -
0.5692 3170 2.2884 -
0.5710 3180 2.2972 -
0.5728 3190 2.2571 -
0.5732 3192 - 0.4772
0.5746 3200 2.2741 -
0.5764 3210 2.3012 -
0.5782 3220 2.3374 -
0.5800 3230 2.2804 -
0.5818 3240 2.2674 -
0.5832 3248 - 0.5104
0.5836 3250 2.277 -
0.5854 3260 2.288 -
0.5872 3270 2.2677 -
0.5890 3280 2.2935 -
0.5908 3290 2.2697 -
0.5926 3300 2.2595 -
0.5933 3304 - 0.4893
0.5944 3310 2.2754 -
0.5962 3320 2.2544 -
0.5980 3330 2.2816 -
0.5997 3340 2.2192 -
0.6015 3350 2.2841 -
0.6033 3360 2.2807 0.4862
0.6051 3370 2.2228 -
0.6069 3380 2.2437 -
0.6087 3390 2.2494 -
0.6105 3400 2.2715 -
0.6123 3410 2.2578 -
0.6134 3416 - 0.4820
0.6141 3420 2.2393 -
0.6159 3430 2.272 -
0.6177 3440 2.24 -
0.6195 3450 2.2612 -
0.6213 3460 2.2369 -
0.6231 3470 2.251 -
0.6235 3472 - 0.4637
0.6249 3480 2.1808 -
0.6267 3490 2.2178 -
0.6285 3500 2.2261 -
0.6303 3510 2.1946 -
0.6321 3520 2.167 -
0.6335 3528 - 0.4657
0.6339 3530 2.1794 -
0.6357 3540 2.1646 -
0.6375 3550 2.2539 -
0.6393 3560 2.2163 -
0.6410 3570 2.2402 -
0.6428 3580 2.1637 -
0.6436 3584 - 0.4676
0.6446 3590 2.1718 -
0.6464 3600 2.1778 -
0.6482 3610 2.2156 -
0.6500 3620 2.2267 -
0.6518 3630 2.2506 -
0.6536 3640 2.1913 0.4698
0.6554 3650 2.2207 -
0.6572 3660 2.1914 -
0.6590 3670 2.2358 -
0.6608 3680 2.213 -
0.6626 3690 2.2178 -
0.6637 3696 - 0.4671
0.6644 3700 2.2003 -
0.6662 3710 2.1846 -
0.6680 3720 2.2418 -
0.6698 3730 2.1752 -
0.6716 3740 2.2026 -
0.6734 3750 2.2094 -
0.6737 3752 - 0.4506
0.6752 3760 2.198 -
0.6770 3770 2.1714 -
0.6788 3780 2.2162 -
0.6806 3790 2.1964 -
0.6823 3800 2.1827 -
0.6838 3808 - 0.4698
0.6841 3810 2.1884 -
0.6859 3820 2.1562 -
0.6877 3830 2.1502 -
0.6895 3840 2.1936 -
0.6913 3850 2.1785 -
0.6931 3860 2.1587 -
0.6938 3864 - 0.4495
0.6949 3870 2.196 -
0.6967 3880 2.1883 -
0.6985 3890 2.1452 -
0.7003 3900 2.1749 -
0.7021 3910 2.219 -
0.7039 3920 2.1916 0.4399
0.7057 3930 2.1197 -
0.7075 3940 2.1504 -
0.7093 3950 2.1144 -
0.7111 3960 2.1299 -
0.7129 3970 2.1704 -
0.7140 3976 - 0.4442
0.7147 3980 2.1874 -
0.7165 3990 2.1853 -
0.7183 4000 2.1954 -
0.7201 4010 2.1971 -
0.7219 4020 2.1675 -
0.7236 4030 2.1777 -
0.7240 4032 - 0.4404
0.7254 4040 2.1521 -
0.7272 4050 2.1615 -
0.7290 4060 2.1736 -
0.7308 4070 2.1394 -
0.7326 4080 2.1352 -
0.7341 4088 - 0.4352
0.7344 4090 2.1618 -
0.7362 4100 2.1351 -
0.7380 4110 2.1216 -
0.7398 4120 2.0994 -
0.7416 4130 2.1209 -
0.7434 4140 2.1436 -
0.7441 4144 - 0.4337
0.7452 4150 2.1139 -
0.7470 4160 2.119 -
0.7488 4170 2.1159 -
0.7506 4180 2.1019 -
0.7524 4190 2.1614 -
0.7542 4200 2.1301 0.4413
0.7560 4210 2.1316 -
0.7578 4220 2.1273 -
0.7596 4230 2.0352 -
0.7614 4240 2.0996 -
0.7632 4250 2.1295 -
0.7642 4256 - 0.4348
0.7649 4260 2.0968 -
0.7667 4270 2.0778 -
0.7685 4280 2.1248 -
0.7703 4290 2.0838 -
0.7721 4300 2.0912 -
0.7739 4310 2.0775 -
0.7743 4312 - 0.4441
0.7757 4320 2.1257 -
0.7775 4330 2.1134 -
0.7793 4340 2.0975 -
0.7811 4350 2.1004 -
0.7829 4360 2.1172 -
0.7843 4368 - 0.4406
0.7847 4370 2.0906 -
0.7865 4380 2.0822 -
0.7883 4390 2.0881 -
0.7901 4400 2.1305 -
0.7919 4410 2.1207 -
0.7937 4420 2.0894 -
0.7944 4424 - 0.4353
0.7955 4430 2.1046 -
0.7973 4440 2.1255 -
0.7991 4450 2.1023 -
0.8009 4460 2.0824 -
0.8027 4470 2.0778 -
0.8045 4480 2.1155 0.4315
0.8062 4490 2.0992 -
0.8080 4500 2.0829 -
0.8098 4510 2.1144 -
0.8116 4520 2.0977 -
0.8134 4530 2.1148 -
0.8145 4536 - 0.4289
0.8152 4540 2.1267 -
0.8170 4550 2.106 -
0.8188 4560 2.0573 -
0.8206 4570 2.0376 -
0.8224 4580 2.1084 -
0.8242 4590 2.0774 -
0.8246 4592 - 0.4270
0.8260 4600 2.1035 -
0.8278 4610 2.1295 -
0.8296 4620 2.1035 -
0.8314 4630 2.118 -
0.8332 4640 2.0951 -
0.8346 4648 - 0.4235
0.8350 4650 2.092 -
0.8368 4660 2.1229 -
0.8386 4670 2.1432 -
0.8404 4680 2.1285 -
0.8422 4690 2.1056 -
0.8440 4700 2.0699 -
0.8447 4704 - 0.4192
0.8458 4710 2.0441 -
0.8475 4720 2.0788 -
0.8493 4730 2.0375 -
0.8511 4740 2.0502 -
0.8529 4750 2.1166 -
0.8547 4760 2.0791 0.4196
0.8565 4770 2.0894 -
0.8583 4780 2.1094 -
0.8601 4790 2.0677 -
0.8619 4800 2.0168 -
0.8637 4810 1.972 -
0.8648 4816 - 0.4219
0.8655 4820 2.0104 -
0.8673 4830 1.7668 -
0.8691 4840 1.6258 -
0.8709 4850 1.779 -
0.8727 4860 1.7525 -
0.8745 4870 1.9056 -
0.8748 4872 - 0.4488
0.8763 4880 1.8157 -
0.8781 4890 1.893 -
0.8799 4900 1.9266 -
0.8817 4910 1.8851 -
0.8835 4920 1.9342 -
0.8849 4928 - 0.5160
0.8853 4930 1.8377 -
0.8871 4940 1.873 -
0.8888 4950 1.8302 -
0.8906 4960 1.9123 -
0.8924 4970 1.8605 -
0.8942 4980 1.878 -
0.8950 4984 - 0.5648
0.8960 4990 1.8416 -
0.8978 5000 1.9061 -
0.8996 5010 1.8084 -
0.9014 5020 1.8982 -
0.9032 5030 1.9167 -
0.9050 5040 1.8795 0.6082
0.9068 5050 1.9449 -
0.9086 5060 1.956 -
0.9104 5070 1.8469 -
0.9122 5080 1.8858 -
0.9140 5090 1.8 -
0.9151 5096 - 0.6452
0.9158 5100 1.7873 -
0.9176 5110 1.7998 -
0.9194 5120 1.9032 -
0.9212 5130 1.8753 -
0.9230 5140 1.8959 -
0.9248 5150 1.7677 -
0.9251 5152 - 0.6645
0.9266 5160 1.8726 -
0.9284 5170 1.8311 -
0.9301 5180 1.8198 -
0.9319 5190 1.8422 -
0.9337 5200 1.8419 -
0.9352 5208 - 0.6856
0.9355 5210 1.7987 -
0.9373 5220 1.8164 -
0.9391 5230 1.7429 -
0.9409 5240 1.8444 -
0.9427 5250 1.8373 -
0.9445 5260 1.7414 -
0.9452 5264 - 0.7004
0.9463 5270 1.8996 -
0.9481 5280 1.821 -
0.9499 5290 1.8124 -
0.9517 5300 1.7433 -
0.9535 5310 1.8208 -
0.9553 5320 1.826 0.7103
0.9571 5330 1.8108 -
0.9589 5340 1.8068 -
0.9607 5350 1.8513 -
0.9625 5360 1.8312 -
0.9643 5370 1.8248 -
0.9653 5376 - 0.7145
0.9661 5380 1.8556 -
0.9679 5390 1.8554 -
0.9697 5400 1.7885 -
0.9714 5410 1.7767 -
0.9732 5420 1.8356 -
0.9750 5430 1.7998 -
0.9754 5432 - 0.7178
0.9768 5440 1.8958 -
0.9786 5450 1.8307 -
0.9804 5460 1.7892 -
0.9822 5470 1.823 -
0.9840 5480 1.8135 -
0.9855 5488 - 0.7201
0.9858 5490 1.7887 -
0.9876 5500 1.8096 -
0.9894 5510 1.8686 -
0.9912 5520 1.8398 -
0.9930 5530 1.9189 -
0.9948 5540 1.689 -
0.9955 5544 - 0.7204
0.9966 5550 1.8621 -
0.9984 5560 1.8037 -

Framework Versions

  • Python: 3.11.8
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.1
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.20.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MaskedCachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}