SentenceTransformer based on CocoRoF/ModernBERT-SimCSE-multitask_v03-retry

This is a sentence-transformers model finetuned from CocoRoF/ModernBERT-SimCSE-multitask_v03-retry on the misc_sts_pairs_v2_kor_kosimcse dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
)
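
The module stack above can be checked directly after loading the model; a minimal inspection sketch (the printed values are expectations based on the configuration shown above):

from sentence_transformers import SentenceTransformer

# Load the published checkpoint and inspect the module stack.
model = SentenceTransformer("CocoRoF/ModernBERT-SimCSE-multitask_v03-distill")

# A SentenceTransformer is a torch.nn.Sequential, so printing it lists the
# Transformer -> Pooling -> Dense modules in order, as shown above.
print(model)

# The Dense head projects the 768-dim mean-pooled ModernBERT output to 1024 dims.
print(model.get_sentence_embedding_dimension())  # expected: 1024
print(model.max_seq_length)                      # expected: 2048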

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("CocoRoF/ModernBERT-SimCSE-multitask_v03-distill")
# Run inference
sentences = [
    '버스가 바쁜 길을 따라 운전한다.',
    '녹색 버스가 도로를 따라 내려간다.',
    '그 여자는 데이트하러 가는 중이다.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
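
Since the model is also intended for semantic search, here is a small follow-on sketch continuing from the snippet above; the query sentence is illustrative and not taken from the model card:

# Encode an illustrative query (Korean for "A bus is driving along the road.")
query_embedding = model.encode(["버스가 도로를 따라 달리고 있다."])

# similarity() uses the model's default similarity function (cosine unless configured otherwise)
scores = model.similarity(query_embedding, embeddings)
print(scores)  # shape [1, 3]; the two bus sentences should score higher than the third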

Evaluation

Metrics

Semantic Similarity

Metric              Value
pearson_cosine      0.8221
spearman_cosine     0.8282
pearson_euclidean   0.7929
spearman_euclidean  0.7980
pearson_manhattan   0.7937
spearman_manhattan  0.7997
pearson_dot         0.7011
spearman_dot        0.6845
pearson_max         0.8221
spearman_max        0.8282
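
These scores come from an STS-style dev evaluator. A minimal sketch of how such metrics are typically computed with sentence-transformers' EmbeddingSimilarityEvaluator, using two example pairs from the evaluation dataset below as placeholder data (the real dev split has 1,500 pairs):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("CocoRoF/ModernBERT-SimCSE-multitask_v03-distill")

# Placeholder pairs taken from the evaluation samples listed later in this card.
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["안전모를 가진 한 남자가 춤을 추고 있다.", "어린아이가 말을 타고 있다."],
    sentences2=["안전모를 쓴 한 남자가 춤을 추고 있다.", "아이가 말을 타고 있다."],
    scores=[1.0, 0.95],
    name="sts_dev",
)
results = evaluator(model)
print(results)  # dict of Pearson/Spearman scores for cosine, Euclidean, Manhattan, and dot similarity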

Training Details

Training Dataset

misc_sts_pairs_v2_kor_kosimcse

  • Dataset: misc_sts_pairs_v2_kor_kosimcse at e747415
  • Size: 449,904 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 6 tokens, mean 18.3 tokens, max 69 tokens
    • sentence2 (string): min 6 tokens, mean 18.69 tokens, max 66 tokens
    • score (float): min 0.11, mean 0.77, max 1.0
  • Samples (sentence1 | sentence2 | score):
    • 주홍글씨는 언제 출판되었습니까? | 《주홍글씨》는 몇 년에 출판되었습니까? | 0.8638778924942017
    • 폴란드에서 빨간색과 흰색은 무엇을 의미합니까? | 폴란드 국기의 색상은 무엇입니까? | 0.6773715019226074
    • 노르만인들은 방어를 위해 모트와 베일리 성을 어떻게 사용했는가? | 11세기에는 어떻게 모트와 베일리 성을 만들었습니까? | 0.7460665702819824
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    
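A minimal sketch of this loss configuration, assuming the standard sentence-transformers API; CosineSimilarityLoss compares the cosine similarity of the two sentence embeddings against the gold score using the MSE loss function listed above:

import torch
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CosineSimilarityLoss

# Start from the base checkpoint named in the title of this card.
model = SentenceTransformer("CocoRoF/ModernBERT-SimCSE-multitask_v03-retry")

# MSE between cosine(sentence1, sentence2) and the gold similarity score.
loss = CosineSimilarityLoss(model, loss_fct=torch.nn.MSELoss())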

Evaluation Dataset

Unnamed Dataset

  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 7 tokens, mean 20.38 tokens, max 52 tokens
    • sentence2 (string): min 6 tokens, mean 20.52 tokens, max 54 tokens
    • score (float): min 0.0, mean 0.42, max 1.0
  • Samples (sentence1 | sentence2 | score):
    • 안전모를 가진 한 남자가 춤을 추고 있다. | 안전모를 쓴 한 남자가 춤을 추고 있다. | 1.0
    • 어린아이가 말을 타고 있다. | 아이가 말을 타고 있다. | 0.95
    • 한 남자가 뱀에게 쥐를 먹이고 있다. | 남자가 뱀에게 쥐를 먹이고 있다. | 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • overwrite_output_dir: True
  • eval_strategy: steps
  • gradient_accumulation_steps: 16
  • learning_rate: 8e-05
  • num_train_epochs: 10.0
  • warmup_ratio: 0.2
  • push_to_hub: True
  • hub_model_id: CocoRoF/ModernBERT-SimCSE-multitask_v03-distill
  • hub_strategy: checkpoint
  • batch_sampler: no_duplicates
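
A sketch of how these non-default values map onto SentenceTransformerTrainingArguments; the output_dir is a placeholder, all other values are taken from the list above:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output/modernbert-simcse-multitask",  # placeholder path
    overwrite_output_dir=True,
    eval_strategy="steps",
    gradient_accumulation_steps=16,
    learning_rate=8e-05,
    num_train_epochs=10.0,
    warmup_ratio=0.2,
    push_to_hub=True,
    hub_model_id="CocoRoF/ModernBERT-SimCSE-multitask_v03-distill",
    hub_strategy="checkpoint",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
# These arguments would then be passed to a SentenceTransformerTrainer together with
# the model, the train/eval datasets, and the CosineSimilarityLoss defined earlier.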

All Hyperparameters

  • overwrite_output_dir: True
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 8e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10.0
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: CocoRoF/ModernBERT-SimCSE-multitask_v03-distill
  • hub_strategy: checkpoint
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss sts_dev_spearman_max
0.0228 10 0.3524 - -
0.0455 20 0.3496 - -
0.0683 30 0.3515 - -
0.0911 40 0.348 - -
0.1138 50 0.3409 - -
0.1366 60 0.347 - -
0.1593 70 0.3377 - -
0.1821 80 0.3317 - -
0.2049 90 0.3279 - -
0.2276 100 0.3264 - -
0.2504 110 0.3116 - -
0.2732 120 0.3055 - -
0.2959 130 0.3042 - -
0.3187 140 0.2928 - -
0.3414 150 0.2835 - -
0.3642 160 0.2665 - -
0.3870 170 0.2665 - -
0.4097 180 0.2486 - -
0.4325 190 0.2387 - -
0.4553 200 0.2283 - -
0.4780 210 0.2237 - -
0.5008 220 0.2204 - -
0.5235 230 0.205 - -
0.5463 240 0.2002 - -
0.5691 250 0.1904 0.0330 0.7921
0.5918 260 0.1834 - -
0.6146 270 0.1776 - -
0.6374 280 0.1665 - -
0.6601 290 0.1625 - -
0.6829 300 0.1585 - -
0.7056 310 0.1522 - -
0.7284 320 0.1552 - -
0.7512 330 0.1448 - -
0.7739 340 0.1428 - -
0.7967 350 0.1401 - -
0.8195 360 0.1399 - -
0.8422 370 0.1389 - -
0.8650 380 0.1372 - -
0.8878 390 0.1338 - -
0.9105 400 0.1361 - -
0.9333 410 0.1389 - -
0.9560 420 0.1328 - -
0.9788 430 0.1375 - -
1.0 440 0.1266 - -
1.0228 450 0.1269 - -
1.0455 460 0.1262 - -
1.0683 470 0.127 - -
1.0911 480 0.1306 - -
1.1138 490 0.1266 - -
1.1366 500 0.1247 0.0405 0.7995
1.1593 510 0.1258 - -
1.1821 520 0.1277 - -
1.2049 530 0.13 - -
1.2276 540 0.1291 - -
1.2504 550 0.1287 - -
1.2732 560 0.1233 - -
1.2959 570 0.1242 - -
1.3187 580 0.1242 - -
1.3414 590 0.1227 - -
1.3642 600 0.1201 - -
1.3870 610 0.1247 - -
1.4097 620 0.1249 - -
1.4325 630 0.1213 - -
1.4553 640 0.1217 - -
1.4780 650 0.1204 - -
1.5008 660 0.1191 - -
1.5235 670 0.1163 - -
1.5463 680 0.1171 - -
1.5691 690 0.1208 - -
1.5918 700 0.1194 - -
1.6146 710 0.1173 - -
1.6374 720 0.1177 - -
1.6601 730 0.1148 - -
1.6829 740 0.1134 - -
1.7056 750 0.1167 0.0422 0.8092
1.7284 760 0.1145 - -
1.7512 770 0.114 - -
1.7739 780 0.1136 - -
1.7967 790 0.1123 - -
1.8195 800 0.1115 - -
1.8422 810 0.1127 - -
1.8650 820 0.1137 - -
1.8878 830 0.1137 - -
1.9105 840 0.1123 - -
1.9333 850 0.1115 - -
1.9560 860 0.1105 - -
1.9788 870 0.1133 - -
2.0 880 0.1049 - -
2.0228 890 0.1091 - -
2.0455 900 0.111 - -
2.0683 910 0.1101 - -
2.0911 920 0.1078 - -
2.1138 930 0.1097 - -
2.1366 940 0.108 - -
2.1593 950 0.1077 - -
2.1821 960 0.1087 - -
2.2049 970 0.1058 - -
2.2276 980 0.1071 - -
2.2504 990 0.1058 - -
2.2732 1000 0.1104 0.0434 0.8156
2.2959 1010 0.1036 - -
2.3187 1020 0.1068 - -
2.3414 1030 0.1033 - -
2.3642 1040 0.1058 - -
2.3870 1050 0.105 - -
2.4097 1060 0.1052 - -
2.4325 1070 0.1013 - -
2.4553 1080 0.1037 - -
2.4780 1090 0.1031 - -
2.5008 1100 0.1057 - -
2.5235 1110 0.1051 - -
2.5463 1120 0.1019 - -
2.5691 1130 0.1018 - -
2.5918 1140 0.1007 - -
2.6146 1150 0.1035 - -
2.6374 1160 0.1032 - -
2.6601 1170 0.1036 - -
2.6829 1180 0.0971 - -
2.7056 1190 0.1015 - -
2.7284 1200 0.104 - -
2.7512 1210 0.1007 - -
2.7739 1220 0.102 - -
2.7967 1230 0.0994 - -
2.8195 1240 0.0972 - -
2.8422 1250 0.0969 0.0437 0.8185
2.8650 1260 0.0968 - -
2.8878 1270 0.1003 - -
2.9105 1280 0.1036 - -
2.9333 1290 0.0969 - -
2.9560 1300 0.0965 - -
2.9788 1310 0.0974 - -
3.0 1320 0.0905 - -
3.0228 1330 0.1006 - -
3.0455 1340 0.0952 - -
3.0683 1350 0.0971 - -
3.0911 1360 0.0943 - -
3.1138 1370 0.0996 - -
3.1366 1380 0.0971 - -
3.1593 1390 0.097 - -
3.1821 1400 0.0937 - -
3.2049 1410 0.0955 - -
3.2276 1420 0.0963 - -
3.2504 1430 0.0938 - -
3.2732 1440 0.0986 - -
3.2959 1450 0.0949 - -
3.3187 1460 0.0932 - -
3.3414 1470 0.096 - -
3.3642 1480 0.0919 - -
3.3870 1490 0.093 - -
3.4097 1500 0.0925 0.0438 0.8201
3.4325 1510 0.0935 - -
3.4553 1520 0.0928 - -
3.4780 1530 0.0914 - -
3.5008 1540 0.0912 - -
3.5235 1550 0.091 - -
3.5463 1560 0.0906 - -
3.5691 1570 0.0936 - -
3.5918 1580 0.0943 - -
3.6146 1590 0.0925 - -
3.6374 1600 0.0908 - -
3.6601 1610 0.0933 - -
3.6829 1620 0.0917 - -
3.7056 1630 0.0887 - -
3.7284 1640 0.0903 - -
3.7512 1650 0.0934 - -
3.7739 1660 0.0906 - -
3.7967 1670 0.0886 - -
3.8195 1680 0.0915 - -
3.8422 1690 0.0924 - -
3.8650 1700 0.094 - -
3.8878 1710 0.0899 - -
3.9105 1720 0.0881 - -
3.9333 1730 0.0884 - -
3.9560 1740 0.0894 - -
3.9788 1750 0.0892 0.0441 0.8215
4.0 1760 0.0812 - -
4.0228 1770 0.0878 - -
4.0455 1780 0.0869 - -
4.0683 1790 0.09 - -
4.0911 1800 0.0875 - -
4.1138 1810 0.086 - -
4.1366 1820 0.0888 - -
4.1593 1830 0.086 - -
4.1821 1840 0.0869 - -
4.2049 1850 0.0885 - -
4.2276 1860 0.0891 - -
4.2504 1870 0.0853 - -
4.2732 1880 0.0849 - -
4.2959 1890 0.0856 - -
4.3187 1900 0.0863 - -
4.3414 1910 0.0849 - -
4.3642 1920 0.0855 - -
4.3870 1930 0.0841 - -
4.4097 1940 0.0893 - -
4.4325 1950 0.0847 - -
4.4553 1960 0.0866 - -
4.4780 1970 0.0866 - -
4.5008 1980 0.0844 - -
4.5235 1990 0.0846 - -
4.5463 2000 0.0847 0.0435 0.8220
4.5691 2010 0.0831 - -
4.5918 2020 0.0843 - -
4.6146 2030 0.086 - -
4.6374 2040 0.0851 - -
4.6601 2050 0.0844 - -
4.6829 2060 0.0843 - -
4.7056 2070 0.0854 - -
4.7284 2080 0.0851 - -
4.7512 2090 0.0822 - -
4.7739 2100 0.0859 - -
4.7967 2110 0.0844 - -
4.8195 2120 0.0853 - -
4.8422 2130 0.0815 - -
4.8650 2140 0.0833 - -
4.8878 2150 0.0817 - -
4.9105 2160 0.0873 - -
4.9333 2170 0.0813 - -
4.9560 2180 0.0829 - -
4.9788 2190 0.0812 - -
5.0 2200 0.0776 - -
5.0228 2210 0.083 - -
5.0455 2220 0.0821 - -
5.0683 2230 0.0806 - -
5.0911 2240 0.0809 - -
5.1138 2250 0.0814 0.0431 0.8225
5.1366 2260 0.0808 - -
5.1593 2270 0.0791 - -
5.1821 2280 0.0811 - -
5.2049 2290 0.0805 - -
5.2276 2300 0.0817 - -
5.2504 2310 0.0772 - -
5.2732 2320 0.0799 - -
5.2959 2330 0.0829 - -
5.3187 2340 0.077 - -
5.3414 2350 0.0801 - -
5.3642 2360 0.0812 - -
5.3870 2370 0.0788 - -
5.4097 2380 0.0776 - -
5.4325 2390 0.0785 - -
5.4553 2400 0.0771 - -
5.4780 2410 0.0788 - -
5.5008 2420 0.0796 - -
5.5235 2430 0.0793 - -
5.5463 2440 0.0813 - -
5.5691 2450 0.0757 - -
5.5918 2460 0.079 - -
5.6146 2470 0.0797 - -
5.6374 2480 0.0794 - -
5.6601 2490 0.0808 - -
5.6829 2500 0.0796 0.0424 0.8230
5.7056 2510 0.0802 - -
5.7284 2520 0.0799 - -
5.7512 2530 0.0802 - -
5.7739 2540 0.0813 - -
5.7967 2550 0.0772 - -
5.8195 2560 0.0766 - -
5.8422 2570 0.0778 - -
5.8650 2580 0.076 - -
5.8878 2590 0.0787 - -
5.9105 2600 0.0794 - -
5.9333 2610 0.076 - -
5.9560 2620 0.0773 - -
5.9788 2630 0.0755 - -
6.0 2640 0.0725 - -
6.0228 2650 0.0738 - -
6.0455 2660 0.0762 - -
6.0683 2670 0.0761 - -
6.0911 2680 0.0771 - -
6.1138 2690 0.0765 - -
6.1366 2700 0.0755 - -
6.1593 2710 0.0771 - -
6.1821 2720 0.0748 - -
6.2049 2730 0.0768 - -
6.2276 2740 0.0766 - -
6.2504 2750 0.0766 0.0422 0.8239
6.2732 2760 0.076 - -
6.2959 2770 0.0753 - -
6.3187 2780 0.0735 - -
6.3414 2790 0.0751 - -
6.3642 2800 0.0738 - -
6.3870 2810 0.0749 - -
6.4097 2820 0.0753 - -
6.4325 2830 0.077 - -
6.4553 2840 0.0747 - -
6.4780 2850 0.0722 - -
6.5008 2860 0.0736 - -
6.5235 2870 0.073 - -
6.5463 2880 0.0774 - -
6.5691 2890 0.075 - -
6.5918 2900 0.0718 - -
6.6146 2910 0.0727 - -
6.6374 2920 0.0735 - -
6.6601 2930 0.0726 - -
6.6829 2940 0.075 - -
6.7056 2950 0.0728 - -
6.7284 2960 0.0713 - -
6.7512 2970 0.0722 - -
6.7739 2980 0.0753 - -
6.7967 2990 0.0733 - -
6.8195 3000 0.0727 0.0425 0.8243
6.8422 3010 0.0729 - -
6.8650 3020 0.073 - -
6.8878 3030 0.0739 - -
6.9105 3040 0.0717 - -
6.9333 3050 0.0719 - -
6.9560 3060 0.0712 - -
6.9788 3070 0.0712 - -
7.0 3080 0.0674 - -
7.0228 3090 0.0729 - -
7.0455 3100 0.0712 - -
7.0683 3110 0.0701 - -
7.0911 3120 0.0699 - -
7.1138 3130 0.0675 - -
7.1366 3140 0.0699 - -
7.1593 3150 0.0716 - -
7.1821 3160 0.0707 - -
7.2049 3170 0.0717 - -
7.2276 3180 0.0709 - -
7.2504 3190 0.071 - -
7.2732 3200 0.0722 - -
7.2959 3210 0.072 - -
7.3187 3220 0.0729 - -
7.3414 3230 0.0678 - -
7.3642 3240 0.0705 - -
7.3870 3250 0.0715 0.0426 0.8256
7.4097 3260 0.0703 - -
7.4325 3270 0.0699 - -
7.4553 3280 0.071 - -
7.4780 3290 0.0692 - -
7.5008 3300 0.0693 - -
7.5235 3310 0.0661 - -
7.5463 3320 0.0702 - -
7.5691 3330 0.0697 - -
7.5918 3340 0.072 - -
7.6146 3350 0.0693 - -
7.6374 3360 0.0691 - -
7.6601 3370 0.0702 - -
7.6829 3380 0.0672 - -
7.7056 3390 0.0698 - -
7.7284 3400 0.0687 - -
7.7512 3410 0.0654 - -
7.7739 3420 0.0687 - -
7.7967 3430 0.0679 - -
7.8195 3440 0.0713 - -
7.8422 3450 0.0676 - -
7.8650 3460 0.0708 - -
7.8878 3470 0.0666 - -
7.9105 3480 0.0675 - -
7.9333 3490 0.0693 - -
7.9560 3500 0.0688 0.0427 0.8260
7.9788 3510 0.068 - -
8.0 3520 0.063 - -
8.0228 3530 0.0659 - -
8.0455 3540 0.0639 - -
8.0683 3550 0.0678 - -
8.0911 3560 0.0689 - -
8.1138 3570 0.0687 - -
8.1366 3580 0.0672 - -
8.1593 3590 0.0659 - -
8.1821 3600 0.0658 - -
8.2049 3610 0.0664 - -
8.2276 3620 0.0659 - -
8.2504 3630 0.0664 - -
8.2732 3640 0.0652 - -
8.2959 3650 0.0683 - -
8.3187 3660 0.0641 - -
8.3414 3670 0.0672 - -
8.3642 3680 0.0655 - -
8.3870 3690 0.0661 - -
8.4097 3700 0.0638 - -
8.4325 3710 0.0675 - -
8.4553 3720 0.0648 - -
8.4780 3730 0.067 - -
8.5008 3740 0.0684 - -
8.5235 3750 0.0667 0.0420 0.8268
8.5463 3760 0.0645 - -
8.5691 3770 0.0652 - -
8.5918 3780 0.0633 - -
8.6146 3790 0.065 - -
8.6374 3800 0.064 - -
8.6601 3810 0.0677 - -
8.6829 3820 0.0661 - -
8.7056 3830 0.0653 - -
8.7284 3840 0.0625 - -
8.7512 3850 0.0651 - -
8.7739 3860 0.0656 - -
8.7967 3870 0.0636 - -
8.8195 3880 0.0655 - -
8.8422 3890 0.0647 - -
8.8650 3900 0.0638 - -
8.8878 3910 0.0636 - -
8.9105 3920 0.0666 - -
8.9333 3930 0.062 - -
8.9560 3940 0.065 - -
8.9788 3950 0.0643 - -
9.0 3960 0.0594 - -
9.0228 3970 0.0616 - -
9.0455 3980 0.0638 - -
9.0683 3990 0.0625 - -
9.0911 4000 0.0665 0.0414 0.8276
9.1138 4010 0.0624 - -
9.1366 4020 0.0621 - -
9.1593 4030 0.0648 - -
9.1821 4040 0.0622 - -
9.2049 4050 0.0635 - -
9.2276 4060 0.061 - -
9.2504 4070 0.0602 - -
9.2732 4080 0.0613 - -
9.2959 4090 0.0604 - -
9.3187 4100 0.0623 - -
9.3414 4110 0.0641 - -
9.3642 4120 0.0635 - -
9.3870 4130 0.0608 - -
9.4097 4140 0.0611 - -
9.4325 4150 0.0607 - -
9.4553 4160 0.0631 - -
9.4780 4170 0.0618 - -
9.5008 4180 0.0609 - -
9.5235 4190 0.0613 - -
9.5463 4200 0.0606 - -
9.5691 4210 0.0595 - -
9.5918 4220 0.0609 - -
9.6146 4230 0.061 - -
9.6374 4240 0.0616 - -
9.6601 4250 0.0613 0.0418 0.8282
9.6829 4260 0.0623 - -
9.7056 4270 0.0605 - -
9.7284 4280 0.0637 - -
9.7512 4290 0.0604 - -
9.7739 4300 0.0606 - -
9.7967 4310 0.0622 - -
9.8195 4320 0.0598 - -
9.8422 4330 0.0611 - -
9.8650 4340 0.0604 - -
9.8878 4350 0.0598 - -
9.9105 4360 0.0626 - -
9.9333 4370 0.0624 - -
9.9560 4380 0.0617 - -
9.9788 4390 0.0603 - -

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.3.0
  • Tokenizers: 0.21.0
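
To approximate this environment, the listed versions could be pinned explicitly (a suggested command, not part of the original card):

pip install sentence-transformers==3.4.1 transformers==4.48.3 torch==2.5.1 accelerate==1.3.0 datasets==3.3.0 tokenizers==0.21.0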

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}