tomaarsen (HF staff) committed · Commit 986beef · verified · 1 parent: 3da67e1

Update README.md

Files changed (1)
  1. README.md +435 -136
@@ -1,137 +1,436 @@
1
- ---
2
- tags:
3
- - sentence-transformers
4
- - cross-encoder
5
- - text-classification
6
- pipeline_tag: text-classification
7
- library_name: sentence-transformers
8
- ---
9
-
10
- # CrossEncoder
11
-
12
- This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model trained using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
13
-
14
- ## Model Details
15
-
16
- ### Model Description
17
- - **Model Type:** Cross Encoder
18
- <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
19
- - **Maximum Sequence Length:** 8192 tokens
20
- - **Number of Output Labels:** 1 label
21
- <!-- - **Training Dataset:** Unknown -->
22
- <!-- - **Language:** Unknown -->
23
- <!-- - **License:** Unknown -->
24
-
25
- ### Model Sources
26
-
27
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
28
- - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
29
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
30
- - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
31
-
32
- ## Usage
33
-
34
- ### Direct Usage (Sentence Transformers)
35
-
36
- First install the Sentence Transformers library:
37
-
38
- ```bash
39
- pip install -U sentence-transformers
40
- ```
41
-
42
- Then you can load this model and run inference.
43
- ```python
44
- from sentence_transformers import CrossEncoder
45
-
46
- # Download from the 🤗 Hub
47
- model = CrossEncoder("tomaarsen/reranker-ModernBERT-base-gooaq-bce")
48
- # Get scores for pairs of texts
49
- pairs = [
50
- ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
51
- ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
52
- ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
53
- ]
54
- scores = model.predict(pairs)
55
- print(scores.shape)
56
- # (3,)
57
-
58
- # Or rank different texts based on similarity to a single text
59
- ranks = model.rank(
60
- 'How many calories in an egg',
61
- [
62
- 'There are on average between 55 and 80 calories in an egg depending on its size.',
63
- 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
64
- 'Most of the calories in an egg come from the yellow yolk in the center.',
65
- ]
66
- )
67
- # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
68
- ```
69
-
70
- <!--
71
- ### Direct Usage (Transformers)
72
-
73
- <details><summary>Click to see the direct usage in Transformers</summary>
74
-
75
- </details>
76
- -->
77
-
78
- <!--
79
- ### Downstream Usage (Sentence Transformers)
80
-
81
- You can finetune this model on your own dataset.
82
-
83
- <details><summary>Click to expand</summary>
84
-
85
- </details>
86
- -->
87
-
88
- <!--
89
- ### Out-of-Scope Use
90
-
91
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
92
- -->
93
-
94
- <!--
95
- ## Bias, Risks and Limitations
96
-
97
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
98
- -->
99
-
100
- <!--
101
- ### Recommendations
102
-
103
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
104
- -->
105
-
106
- ## Training Details
107
-
108
- ### Framework Versions
109
- - Python: 3.11.6
110
- - Sentence Transformers: 3.5.0.dev0
111
- - Transformers: 4.48.3
112
- - PyTorch: 2.5.0+cu121
113
- - Accelerate: 1.3.0
114
- - Datasets: 2.20.0
115
- - Tokenizers: 0.21.0
116
-
117
- ## Citation
118
-
119
- ### BibTeX
120
-
121
- <!--
122
- ## Glossary
123
-
124
- *Clearly define terms in order to be accessible across audiences.*
125
- -->
126
-
127
- <!--
128
- ## Model Card Authors
129
-
130
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
131
- -->
132
-
133
- <!--
134
- ## Model Card Contact
135
-
136
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
137
  -->
 
1
+ ---
2
+ language:
3
+ - en
4
+ tags:
5
+ - sentence-transformers
6
+ - cross-encoder
7
+ - text-classification
8
+ - generated_from_trainer
9
+ - dataset_size:580740
10
+ - loss:BinaryCrossEntropyLoss
11
+ base_model: answerdotai/ModernBERT-base
12
+ datasets:
13
+ - sentence-transformers/gooaq
14
+ pipeline_tag: text-classification
15
+ library_name: sentence-transformers
16
+ metrics:
17
+ - map
18
+ - mrr@10
19
+ - ndcg@10
20
+ model-index:
21
+ - name: CrossEncoder based on answerdotai/ModernBERT-base
22
+ results: []
23
+ ---
24
+
25
+ # CrossEncoder based on answerdotai/ModernBERT-base
26
+
27
+ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
28
+
29
+ ## Model Details
30
+
31
+ ### Model Description
32
+ - **Model Type:** Cross Encoder
33
+ - **Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) <!-- at revision 8949b909ec900327062f0ebf497f51aef5e6f0c8 -->
34
+ - **Maximum Sequence Length:** 8192 tokens
35
+ - **Number of Output Labels:** 1 label
36
+ <!-- - **Training Dataset:** Unknown -->
37
+ - **Language:** en
38
+ <!-- - **License:** Unknown -->
39
+
40
+ ### Model Sources
41
+
42
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
43
+ - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
44
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
45
+ - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
46
+
47
+ ## Usage
48
+
49
+ ### Direct Usage (Sentence Transformers)
50
+
51
+ First install the Sentence Transformers library:
52
+
53
+ ```bash
54
+ pip install -U sentence-transformers
55
+ ```
56
+
57
+ Then you can load this model and run inference.
58
+ ```python
59
+ from sentence_transformers import CrossEncoder
60
+
61
+ # Download from the 🤗 Hub
62
+ model = CrossEncoder("sentence_transformers_model_id")
63
+ # Get scores for pairs of texts
64
+ pairs = [
65
+ ['should you take ibuprofen with high blood pressure?', "In general, people with high blood pressure should use acetaminophen or possibly aspirin for over-the-counter pain relief. Unless your health care provider has said it's OK, you should not use ibuprofen, ketoprofen, or naproxen sodium. If aspirin or acetaminophen doesn't help with your pain, call your doctor."],
66
+ ['how old do you have to be to work in sc?', 'The general minimum age of employment for South Carolina youth is 14, although the state allows younger children who are performers to work in show business. If their families are agricultural workers, children younger than age 14 may also participate in farm labor.'],
67
+ ['how to write a topic proposal for a research paper?', "['Write down the main topic of your paper. ... ', 'Write two or three short sentences under the main topic that explain why you chose that topic. ... ', 'Write a thesis sentence that states the angle and purpose of your research paper. ... ', 'List the items you will cover in the body of the paper that support your thesis statement.']"],
68
+ ['how much does aaf pay players?', 'These dates provided an opportunity for players cut at the NFL roster deadline, and each player signed a non-guaranteed three-year contract worth a total of $250,000 ($70,000 in 2019; $80,000 in 2020; $100,000 in 2021), with performance-based and fan-interaction incentives allowing for players to earn more.'],
69
+ ['is jove and zeus the same?', 'Jupiter, or Jove, in Roman mythology is the king of the gods and the god of sky and thunder, equivalent to Zeus in Greek traditions.'],
70
+ ]
71
+ scores = model.predict(pairs)
72
+ print(scores.shape)
73
+ # (5,)
74
+
75
+ # Or rank different texts based on similarity to a single text
76
+ ranks = model.rank(
77
+ 'should you take ibuprofen with high blood pressure?',
78
+ [
79
+ "In general, people with high blood pressure should use acetaminophen or possibly aspirin for over-the-counter pain relief. Unless your health care provider has said it's OK, you should not use ibuprofen, ketoprofen, or naproxen sodium. If aspirin or acetaminophen doesn't help with your pain, call your doctor.",
80
+ 'The general minimum age of employment for South Carolina youth is 14, although the state allows younger children who are performers to work in show business. If their families are agricultural workers, children younger than age 14 may also participate in farm labor.',
81
+ "['Write down the main topic of your paper. ... ', 'Write two or three short sentences under the main topic that explain why you chose that topic. ... ', 'Write a thesis sentence that states the angle and purpose of your research paper. ... ', 'List the items you will cover in the body of the paper that support your thesis statement.']",
82
+ 'These dates provided an opportunity for players cut at the NFL roster deadline, and each player signed a non-guaranteed three-year contract worth a total of $250,000 ($70,000 in 2019; $80,000 in 2020; $100,000 in 2021), with performance-based and fan-interaction incentives allowing for players to earn more.',
83
+ 'Jupiter, or Jove, in Roman mythology is the king of the gods and the god of sky and thunder, equivalent to Zeus in Greek traditions.',
84
+ ]
85
+ )
86
+ # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
87
+ ```
88
+
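The list-of-dicts shape in the `rank` comment above can be rebuilt from raw `predict` scores by sorting document indices by score, highest first. The following is a minimal sketch of that idea, not the library's implementation; the helper name `rank_from_scores` and the example scores are hypothetical.

```python
def rank_from_scores(scores):
    """Return documents ordered by score (highest first) as corpus_id/score dicts."""
    # Sort the document indices by their score in descending order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


# Hypothetical scores, standing in for model.predict(...) on three documents.
example_scores = [0.12, 0.91, 0.47]
ranked = rank_from_scores(example_scores)
print(ranked)
```

Each `corpus_id` is the position of the document in the input list, so it can be used to look the text back up.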
89
+ <!--
90
+ ### Direct Usage (Transformers)
91
+
92
+ <details><summary>Click to see the direct usage in Transformers</summary>
93
+
94
+ </details>
95
+ -->
96
+
97
+ <!--
98
+ ### Downstream Usage (Sentence Transformers)
99
+
100
+ You can finetune this model on your own dataset.
101
+
102
+ <details><summary>Click to expand</summary>
103
+
104
+ </details>
105
+ -->
106
+
107
+ <!--
108
+ ### Out-of-Scope Use
109
+
110
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
111
+ -->
112
+
113
+ ## Evaluation
114
+
115
+ ### Metrics
116
+
117
+ #### Cross Encoder Reranking
118
+
119
+ * Datasets: `gooaq-dev`, `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
120
+ * Evaluated with [<code>CERerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CERerankingEvaluator)
121
+
122
+ | Metric | gooaq-dev | NanoMSMARCO | NanoNFCorpus | NanoNQ |
123
+ |:------------|:---------------------|:---------------------|:---------------------|:---------------------|
124
+ | map | 0.7386 (+0.0063) | 0.5463 (+0.0567) | 0.3300 (+0.0595) | 0.6707 (+0.2500) |
125
+ | mrr@10 | 0.7360 (+0.0068) | 0.5401 (+0.0626) | 0.5409 (+0.0410) | 0.6737 (+0.2471) |
126
+ | **ndcg@10** | **0.7880 (+0.0064)** | **0.6203 (+0.0799)** | **0.3660 (+0.0410)** | **0.7246 (+0.2240)** |
127
+
128
+ #### Cross Encoder Nano BEIR
129
+
130
+ * Dataset: `NanoBEIR_mean`
131
+ * Evaluated with [<code>CENanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CENanoBEIREvaluator)
132
+
133
+ | Metric | Value |
134
+ |:------------|:---------------------|
135
+ | map | 0.5157 (+0.1221) |
136
+ | mrr@10 | 0.5849 (+0.1169) |
137
+ | **ndcg@10** | **0.5703 (+0.1149)** |
138
+
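The `NanoBEIR_mean` values are consistent with a plain arithmetic mean over the three Nano datasets in the reranking table. A quick check, using only the numbers reported above (the variable names are illustrative):

```python
# Per-dataset scores copied from the Cross Encoder Reranking table.
reported = {
    "map": {"NanoMSMARCO": 0.5463, "NanoNFCorpus": 0.3300, "NanoNQ": 0.6707},
    "mrr@10": {"NanoMSMARCO": 0.5401, "NanoNFCorpus": 0.5409, "NanoNQ": 0.6737},
    "ndcg@10": {"NanoMSMARCO": 0.6203, "NanoNFCorpus": 0.3660, "NanoNQ": 0.7246},
}

# Average each metric across the three datasets.
means = {
    metric: sum(per_dataset.values()) / len(per_dataset)
    for metric, per_dataset in reported.items()
}

for metric, value in means.items():
    print(f"{metric}: {value:.4f}")
```

Rounded to four decimals, the results match the `NanoBEIR_mean` table. `gooaq-dev` is not part of that average.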
139
+ <!--
140
+ ## Bias, Risks and Limitations
141
+
142
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
143
+ -->
144
+
145
+ <!--
146
+ ### Recommendations
147
+
148
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
149
+ -->
150
+
151
+ ## Training Details
152
+
153
+ ### Training Dataset
154
+
155
+ #### Unnamed Dataset
156
+
157
+ * Size: 580,740 training samples
158
+ * Columns: <code>query</code>, <code>response</code>, and <code>label</code>
159
+ * Approximate statistics based on the first 1000 samples:
160
+ | | query | response | label |
161
+ |:--------|:----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:-----------------------------|
162
+ | type | string | string | int |
163
+ | details | <ul><li>min: 17 characters</li><li>mean: 42.5 characters</li><li>max: 91 characters</li></ul> | <ul><li>min: 51 characters</li><li>mean: 253.83 characters</li><li>max: 385 characters</li></ul> | <ul><li>1: 100.00%</li></ul> |
164
+ * Samples:
165
+ | query | response | label |
166
+ |:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
167
+ | <code>what is the difference between a certificate and associate's degree?</code> | <code>Certificate degrees are extremely focused in their objective(s) and are related to a specific job or career niche. ... Certificates are often obtained as an add-on to an associate degree. Associate degree programs require two years of full-time classroom attendance in order to complete a degree.</code> | <code>1</code> |
168
+ | <code>what is the difference between 5star and inverter ac?</code> | <code>An inverter AC works on variable speed compressor whereas a 5-star rated non-inverter AC have single speed compressor. It changes its speed as per the heat load and number of people. The need of Stabilizer: A stabilizer is installed with the AC to maintain an optimum voltage range during the power fluctuations.</code> | <code>1</code> |
169
+ | <code>what is the difference between gas and electric cars?</code> | <code>A gas-powered car has a fuel tank, which supplies gasoline to the engine. The engine then turns a transmission, which turns the wheels. Move your mouse over the parts for a 3-D view. An electric car, on the other hand, has a set of batteries that provides electricity to an electric motor.</code> | <code>1</code> |
170
+ * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
171
+ ```json
172
+ {
173
+ "activation_fct": "torch.nn.modules.linear.Identity",
174
+ "pos_weight": 5
175
+ }
176
+ ```
177
+
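To illustrate what `pos_weight: 5` with an `Identity` activation implies, here is a minimal pure-Python sketch. It assumes the loss follows the standard positively weighted binary cross-entropy on raw logits, as in PyTorch's `BCEWithLogitsLoss`; the function name is hypothetical, and the sketch is not numerically hardened for very large logits.

```python
import math


def weighted_bce_with_logits(logit, label, pos_weight=5.0):
    """Binary cross-entropy on a raw logit, with the positive term scaled by pos_weight."""
    # log(sigmoid(x)) and log(1 - sigmoid(x)), written with log1p for readability.
    log_sigmoid = -math.log1p(math.exp(-logit))
    log_one_minus_sigmoid = -math.log1p(math.exp(logit))
    return -(pos_weight * label * log_sigmoid + (1 - label) * log_one_minus_sigmoid)


# At logit 0 the model is undecided; a missed positive costs five times a missed negative.
loss_positive = weighted_bce_with_logits(0.0, 1)
loss_negative = weighted_bce_with_logits(0.0, 0)
print(loss_positive, loss_negative)
```

Under this assumption, the weight makes errors on positive pairs count five times as much as errors on negative pairs.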
178
+ ### Evaluation Dataset
179
+
180
+ #### gooaq
181
+
182
+ * Dataset: [gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq) at [b089f72](https://huggingface.co/datasets/sentence-transformers/gooaq/tree/b089f728748a068b7bc5234e5bcf5b25e3c8279c)
183
+ * Size: 3,012,496 evaluation samples
184
+ * Columns: <code>query</code>, <code>response</code>, and <code>label</code>
185
+ * Approximate statistics based on the first 1000 samples:
186
+ | | query | response | label |
187
+ |:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:-----------------------------|
188
+ | type | string | string | int |
189
+ | details | <ul><li>min: 18 characters</li><li>mean: 43.05 characters</li><li>max: 88 characters</li></ul> | <ul><li>min: 51 characters</li><li>mean: 252.39 characters</li><li>max: 386 characters</li></ul> | <ul><li>1: 100.00%</li></ul> |
190
+ * Samples:
191
+ | query | response | label |
192
+ |:-----------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
193
+ | <code>should you take ibuprofen with high blood pressure?</code> | <code>In general, people with high blood pressure should use acetaminophen or possibly aspirin for over-the-counter pain relief. Unless your health care provider has said it's OK, you should not use ibuprofen, ketoprofen, or naproxen sodium. If aspirin or acetaminophen doesn't help with your pain, call your doctor.</code> | <code>1</code> |
194
+ | <code>how old do you have to be to work in sc?</code> | <code>The general minimum age of employment for South Carolina youth is 14, although the state allows younger children who are performers to work in show business. If their families are agricultural workers, children younger than age 14 may also participate in farm labor.</code> | <code>1</code> |
195
+ | <code>how to write a topic proposal for a research paper?</code> | <code>['Write down the main topic of your paper. ... ', 'Write two or three short sentences under the main topic that explain why you chose that topic. ... ', 'Write a thesis sentence that states the angle and purpose of your research paper. ... ', 'List the items you will cover in the body of the paper that support your thesis statement.']</code> | <code>1</code> |
196
+ * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
197
+ ```json
198
+ {
199
+ "activation_fct": "torch.nn.modules.linear.Identity",
200
+ "pos_weight": 5
201
+ }
202
+ ```
203
+
204
+ ### Training Hyperparameters
205
+ #### Non-Default Hyperparameters
206
+
207
+ - `eval_strategy`: steps
208
+ - `per_device_train_batch_size`: 64
209
+ - `per_device_eval_batch_size`: 64
210
+ - `learning_rate`: 2e-05
211
+ - `num_train_epochs`: 1
212
+ - `warmup_ratio`: 0.1
213
+ - `seed`: 12
214
+ - `bf16`: True
215
+ - `dataloader_num_workers`: 4
216
+ - `load_best_model_at_end`: True
217
+ - `batch_sampler`: no_duplicates
218
+
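The step counts in the training logs can be roughly reproduced from these settings. The sketch below assumes a single device, no gradient accumulation, and that every sample is used once per epoch (the `no_duplicates` batch sampler may shift the exact count slightly). The `linear_lr` helper is a hypothetical illustration of a linear warmup followed by linear decay, not the trainer's own code.

```python
import math

train_samples = 580_740   # size of the training dataset
batch_size = 64           # per_device_train_batch_size
base_lr = 2e-5            # learning_rate
warmup_ratio = 0.1        # warmup_ratio

# One epoch, keeping the final partial batch.
total_steps = math.ceil(train_samples / batch_size)
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


print(total_steps, warmup_steps, linear_lr(1000))
```

With these assumptions an epoch is 9075 steps, which agrees with the logs, where step 9000 corresponds to epoch 0.9917.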
219
+ #### All Hyperparameters
220
+ <details><summary>Click to expand</summary>
221
+
222
+ - `overwrite_output_dir`: False
223
+ - `do_predict`: False
224
+ - `eval_strategy`: steps
225
+ - `prediction_loss_only`: True
226
+ - `per_device_train_batch_size`: 64
227
+ - `per_device_eval_batch_size`: 64
228
+ - `per_gpu_train_batch_size`: None
229
+ - `per_gpu_eval_batch_size`: None
230
+ - `gradient_accumulation_steps`: 1
231
+ - `eval_accumulation_steps`: None
232
+ - `torch_empty_cache_steps`: None
233
+ - `learning_rate`: 2e-05
234
+ - `weight_decay`: 0.0
235
+ - `adam_beta1`: 0.9
236
+ - `adam_beta2`: 0.999
237
+ - `adam_epsilon`: 1e-08
238
+ - `max_grad_norm`: 1.0
239
+ - `num_train_epochs`: 1
240
+ - `max_steps`: -1
241
+ - `lr_scheduler_type`: linear
242
+ - `lr_scheduler_kwargs`: {}
243
+ - `warmup_ratio`: 0.1
244
+ - `warmup_steps`: 0
245
+ - `log_level`: passive
246
+ - `log_level_replica`: warning
247
+ - `log_on_each_node`: True
248
+ - `logging_nan_inf_filter`: True
249
+ - `save_safetensors`: True
250
+ - `save_on_each_node`: False
251
+ - `save_only_model`: False
252
+ - `restore_callback_states_from_checkpoint`: False
253
+ - `no_cuda`: False
254
+ - `use_cpu`: False
255
+ - `use_mps_device`: False
256
+ - `seed`: 12
257
+ - `data_seed`: None
258
+ - `jit_mode_eval`: False
259
+ - `use_ipex`: False
260
+ - `bf16`: True
261
+ - `fp16`: False
262
+ - `fp16_opt_level`: O1
263
+ - `half_precision_backend`: auto
264
+ - `bf16_full_eval`: False
265
+ - `fp16_full_eval`: False
266
+ - `tf32`: None
267
+ - `local_rank`: 0
268
+ - `ddp_backend`: None
269
+ - `tpu_num_cores`: None
270
+ - `tpu_metrics_debug`: False
271
+ - `debug`: []
272
+ - `dataloader_drop_last`: False
273
+ - `dataloader_num_workers`: 4
274
+ - `dataloader_prefetch_factor`: None
275
+ - `past_index`: -1
276
+ - `disable_tqdm`: False
277
+ - `remove_unused_columns`: True
278
+ - `label_names`: None
279
+ - `load_best_model_at_end`: True
280
+ - `ignore_data_skip`: False
281
+ - `fsdp`: []
282
+ - `fsdp_min_num_params`: 0
283
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
284
+ - `fsdp_transformer_layer_cls_to_wrap`: None
285
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
286
+ - `deepspeed`: None
287
+ - `label_smoothing_factor`: 0.0
288
+ - `optim`: adamw_torch
289
+ - `optim_args`: None
290
+ - `adafactor`: False
291
+ - `group_by_length`: False
292
+ - `length_column_name`: length
293
+ - `ddp_find_unused_parameters`: None
294
+ - `ddp_bucket_cap_mb`: None
295
+ - `ddp_broadcast_buffers`: False
296
+ - `dataloader_pin_memory`: True
297
+ - `dataloader_persistent_workers`: False
298
+ - `skip_memory_metrics`: True
299
+ - `use_legacy_prediction_loop`: False
300
+ - `push_to_hub`: False
301
+ - `resume_from_checkpoint`: None
302
+ - `hub_model_id`: None
303
+ - `hub_strategy`: every_save
304
+ - `hub_private_repo`: None
305
+ - `hub_always_push`: False
306
+ - `gradient_checkpointing`: False
307
+ - `gradient_checkpointing_kwargs`: None
308
+ - `include_inputs_for_metrics`: False
309
+ - `include_for_metrics`: []
310
+ - `eval_do_concat_batches`: True
311
+ - `fp16_backend`: auto
312
+ - `push_to_hub_model_id`: None
313
+ - `push_to_hub_organization`: None
314
+ - `mp_parameters`:
315
+ - `auto_find_batch_size`: False
316
+ - `full_determinism`: False
317
+ - `torchdynamo`: None
318
+ - `ray_scope`: last
319
+ - `ddp_timeout`: 1800
320
+ - `torch_compile`: False
321
+ - `torch_compile_backend`: None
322
+ - `torch_compile_mode`: None
323
+ - `dispatch_batches`: None
324
+ - `split_batches`: None
325
+ - `include_tokens_per_second`: False
326
+ - `include_num_input_tokens_seen`: False
327
+ - `neftune_noise_alpha`: None
328
+ - `optim_target_modules`: None
329
+ - `batch_eval_metrics`: False
330
+ - `eval_on_start`: False
331
+ - `use_liger_kernel`: False
332
+ - `eval_use_gather_object`: False
333
+ - `average_tokens_across_devices`: False
334
+ - `prompts`: None
335
+ - `batch_sampler`: no_duplicates
336
+ - `multi_dataset_batch_sampler`: proportional
337
+
338
+ </details>
339
+
340
+ ### Training Logs
341
+ | Epoch | Step | Training Loss | Validation Loss | gooaq-dev_ndcg@10 | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10 | NanoBEIR_mean_ndcg@10 |
342
+ |:----------:|:--------:|:-------------:|:---------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:---------------------:|
343
+ | -1 | -1 | - | - | 0.1879 (-0.5937) | 0.0748 (-0.4656) | 0.2012 (-0.1238) | 0.0414 (-0.4592) | 0.1058 (-0.3496) |
344
+ | 0.0001 | 1 | 1.1971 | - | - | - | - | - | - |
345
+ | 0.0220 | 200 | 1.1557 | - | - | - | - | - | - |
346
+ | 0.0441 | 400 | 0.9119 | - | - | - | - | - | - |
347
+ | 0.0661 | 600 | 0.5124 | - | - | - | - | - | - |
348
+ | 0.0882 | 800 | 0.4225 | - | - | - | - | - | - |
349
+ | 0.1102 | 1000 | 0.3876 | 1.3811 | 0.7192 (-0.0624) | 0.5171 (-0.0233) | 0.3438 (+0.0187) | 0.5647 (+0.0641) | 0.4752 (+0.0198) |
350
+ | 0.1322 | 1200 | 0.3563 | - | - | - | - | - | - |
351
+ | 0.1543 | 1400 | 0.3155 | - | - | - | - | - | - |
352
+ | 0.1763 | 1600 | 0.3181 | - | - | - | - | - | - |
353
+ | 0.1983 | 1800 | 0.289 | - | - | - | - | - | - |
354
+ | 0.2204 | 2000 | 0.283 | 0.6710 | 0.7528 (-0.0289) | 0.5559 (+0.0155) | 0.3445 (+0.0194) | 0.6592 (+0.1585) | 0.5198 (+0.0645) |
355
+ | 0.2424 | 2200 | 0.2745 | - | - | - | - | - | - |
356
+ | 0.2645 | 2400 | 0.2575 | - | - | - | - | - | - |
357
+ | 0.2865 | 2600 | 0.2762 | - | - | - | - | - | - |
358
+ | 0.3085 | 2800 | 0.2489 | - | - | - | - | - | - |
359
+ | 0.3306 | 3000 | 0.2259 | 0.7575 | 0.7696 (-0.0121) | 0.4982 (-0.0422) | 0.3555 (+0.0305) | 0.6483 (+0.1476) | 0.5007 (+0.0453) |
360
+ | 0.3526 | 3200 | 0.2576 | - | - | - | - | - | - |
361
+ | 0.3747 | 3400 | 0.2384 | - | - | - | - | - | - |
362
+ | 0.3967 | 3600 | 0.2431 | - | - | - | - | - | - |
363
+ | 0.4187 | 3800 | 0.206 | - | - | - | - | - | - |
364
+ | 0.4408 | 4000 | 0.2381 | 0.9594 | 0.7774 (-0.0042) | 0.5649 (+0.0245) | 0.3666 (+0.0416) | 0.6842 (+0.1836) | 0.5386 (+0.0832) |
365
+ | 0.4628 | 4200 | 0.2196 | - | - | - | - | - | - |
366
+ | 0.4848 | 4400 | 0.2153 | - | - | - | - | - | - |
367
+ | 0.5069 | 4600 | 0.217 | - | - | - | - | - | - |
368
+ | 0.5289 | 4800 | 0.1982 | - | - | - | - | - | - |
369
+ | 0.5510 | 5000 | 0.2172 | 0.6249 | 0.7864 (+0.0047) | 0.6029 (+0.0625) | 0.3833 (+0.0583) | 0.7029 (+0.2022) | 0.5630 (+0.1077) |
370
+ | 0.5730 | 5200 | 0.2145 | - | - | - | - | - | - |
371
+ | 0.5950 | 5400 | 0.213 | - | - | - | - | - | - |
372
+ | 0.6171 | 5600 | 0.2117 | - | - | - | - | - | - |
373
+ | 0.6391 | 5800 | 0.2102 | - | - | - | - | - | - |
374
+ | 0.6612 | 6000 | 0.2125 | 0.7420 | 0.7834 (+0.0017) | 0.5907 (+0.0503) | 0.3771 (+0.0521) | 0.7176 (+0.2169) | 0.5618 (+0.1064) |
375
+ | 0.6832 | 6200 | 0.1995 | - | - | - | - | - | - |
376
+ | 0.7052 | 6400 | 0.1978 | - | - | - | - | - | - |
377
+ | 0.7273 | 6600 | 0.1857 | - | - | - | - | - | - |
378
+ | 0.7493 | 6800 | 0.1811 | - | - | - | - | - | - |
379
+ | 0.7713 | 7000 | 0.2055 | 1.1528 | 0.7827 (+0.0011) | 0.6152 (+0.0748) | 0.3730 (+0.0480) | 0.7190 (+0.2184) | 0.5691 (+0.1137) |
380
+ | 0.7934 | 7200 | 0.1855 | - | - | - | - | - | - |
381
+ | 0.8154 | 7400 | 0.1829 | - | - | - | - | - | - |
382
+ | 0.8375 | 7600 | 0.1901 | - | - | - | - | - | - |
383
+ | 0.8595 | 7800 | 0.1862 | - | - | - | - | - | - |
384
+ | **0.8815** | **8000** | **0.1858** | **0.6424** | **0.7880 (+0.0064)** | **0.6203 (+0.0799)** | **0.3660 (+0.0410)** | **0.7246 (+0.2240)** | **0.5703 (+0.1149)** |
385
+ | 0.9036 | 8200 | 0.1545 | - | - | - | - | - | - |
386
+ | 0.9256 | 8400 | 0.1729 | - | - | - | - | - | - |
387
+ | 0.9477 | 8600 | 0.1657 | - | - | - | - | - | - |
388
+ | 0.9697 | 8800 | 0.1698 | - | - | - | - | - | - |
389
+ | 0.9917 | 9000 | 0.1658 | 0.6904 | 0.7898 (+0.0081) | 0.6011 (+0.0606) | 0.3612 (+0.0361) | 0.7165 (+0.2159) | 0.5596 (+0.1042) |
390
+ | -1 | -1 | - | - | 0.7880 (+0.0064) | 0.6203 (+0.0799) | 0.3660 (+0.0410) | 0.7246 (+0.2240) | 0.5703 (+0.1149) |
391
+
392
+ * The bold row denotes the saved checkpoint.
393
+
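The hyperparameters set `load_best_model_at_end` to `True`, but the list does not show which metric decides the best checkpoint. As an observation rather than a confirmed setting, the bold row is the evaluation step with the highest `NanoBEIR_mean_ndcg@10`, not the one with the lowest validation loss. The check below uses values copied from the table; the variable names are illustrative.

```python
# (step, validation loss, NanoBEIR_mean_ndcg@10) for each evaluation row in the training logs.
eval_rows = [
    (1000, 1.3811, 0.4752),
    (2000, 0.6710, 0.5198),
    (3000, 0.7575, 0.5007),
    (4000, 0.9594, 0.5386),
    (5000, 0.6249, 0.5630),
    (6000, 0.7420, 0.5618),
    (7000, 1.1528, 0.5691),
    (8000, 0.6424, 0.5703),
    (9000, 0.6904, 0.5596),
]

# Compare two plausible selection criteria.
best_by_ndcg = max(eval_rows, key=lambda row: row[2])[0]
best_by_loss = min(eval_rows, key=lambda row: row[1])[0]
print(best_by_ndcg, best_by_loss)
```

Selecting by mean NDCG@10 gives step 8000, matching the bold row, while selecting by validation loss would give step 5000.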
394
+ ### Framework Versions
395
+ - Python: 3.11.10
396
+ - Sentence Transformers: 3.5.0.dev0
397
+ - Transformers: 4.49.0.dev0
398
+ - PyTorch: 2.6.0.dev20241112+cu121
399
+ - Accelerate: 1.2.0
400
+ - Datasets: 3.2.0
401
+ - Tokenizers: 0.21.0
402
+
403
+ ## Citation
404
+
405
+ ### BibTeX
406
+
407
+ #### Sentence Transformers
408
+ ```bibtex
409
+ @inproceedings{reimers-2019-sentence-bert,
410
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
411
+ author = "Reimers, Nils and Gurevych, Iryna",
412
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
413
+ month = "11",
414
+ year = "2019",
415
+ publisher = "Association for Computational Linguistics",
416
+ url = "https://arxiv.org/abs/1908.10084",
417
+ }
418
+ ```
419
+
420
+ <!--
421
+ ## Glossary
422
+
423
+ *Clearly define terms in order to be accessible across audiences.*
424
+ -->
425
+
426
+ <!--
427
+ ## Model Card Authors
428
+
429
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
430
+ -->
431
+
432
+ <!--
433
+ ## Model Card Contact
434
+
435
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
436
  -->