rahulseetharaman committed (verified)
Commit ab1e882 · 1 parent: 3b1b232

Add new CrossEncoder model

README.md ADDED
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:78704
- loss:ListNetLoss
base_model: bansalaman18/bert-uncased_L-10_H-256_A-4
datasets:
- microsoft/ms_marco
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: CrossEncoder based on bansalaman18/bert-uncased_L-10_H-256_A-4
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoMSMARCO R100
      type: NanoMSMARCO_R100
    metrics:
    - type: map
      value: 0.0654
      name: Map
    - type: mrr@10
      value: 0.039
      name: Mrr@10
    - type: ndcg@10
      value: 0.0574
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNFCorpus R100
      type: NanoNFCorpus_R100
    metrics:
    - type: map
      value: 0.2752
      name: Map
    - type: mrr@10
      value: 0.3973
      name: Mrr@10
    - type: ndcg@10
      value: 0.2485
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNQ R100
      type: NanoNQ_R100
    metrics:
    - type: map
      value: 0.0653
      name: Map
    - type: mrr@10
      value: 0.0417
      name: Mrr@10
    - type: ndcg@10
      value: 0.0648
      name: Ndcg@10
  - task:
      type: cross-encoder-nano-beir
      name: Cross Encoder Nano BEIR
    dataset:
      name: NanoBEIR R100 mean
      type: NanoBEIR_R100_mean
    metrics:
    - type: map
      value: 0.1353
      name: Map
    - type: mrr@10
      value: 0.1593
      name: Mrr@10
    - type: ndcg@10
      value: 0.1235
      name: Ndcg@10
---

# CrossEncoder based on bansalaman18/bert-uncased_L-10_H-256_A-4

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [bansalaman18/bert-uncased_L-10_H-256_A-4](https://huggingface.co/bansalaman18/bert-uncased_L-10_H-256_A-4) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [bansalaman18/bert-uncased_L-10_H-256_A-4](https://huggingface.co/bansalaman18/bert-uncased_L-10_H-256_A-4) <!-- at revision 2c743a1678c7e2a9a2ba9cda4400b08cfa7054fc -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:

```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("rahulseetharaman/reranker-msmarco-v1.1-bert-uncased_L-10_H-256_A-4-listnet")
# Get scores for pairs of texts
pairs = [
    ['what does sertraline treat', 'Sertraline is used to treat depression, obsessive compulsive disorder (OCD), panic disorder, premenstrual dysphoric disorder (PMDD), posttraumatic stress disorder (PTSD), and social anxiety disorder (SAD). Sertraline belongs to a group of medicines known as selective serotonin reuptake inhibitors (SSRIs). '],
    ['what does sertraline treat', 'Sertraline is used for a number of conditions including: major depression, obsessive-compulsive disorder (OCD), body dysmorphic disorder (BDD), posttraumatic stress disorder (PTSD), premenstrual dysphoric disorder (PMDD), panic disorder and social phobia (social anxiety disorder). It was introduced to the market by Pfizer in 1991. Sertraline is primarily prescribed for major depressive disorder in adult outpatients as well as obsessive-compulsive disorder, panic disorder, and social anxiety disorder, in both adults and children.'],
    ['what does sertraline treat', 'Zoloft is the brand name of sertraline, an antidepressant used to treat major depressive disorders. Zoloft is in a class of antidepressants known as selective serotonin reuptake inhibitors (SSRIs). They work by controlling levels of serotonin (a neurotransmitter) in the brain. A: Zoloft (sertraline) is a medication that is used to treat depression or anxiety. This medication is in the family of drugs called SSRIs and works by bringing a balance to serotonin in the brain that is causing your condition.'],
    ['what does sertraline treat', 'A: Zoloft (sertraline) is a type of antidepressant known as a selective serotonin reuptake inhibitor (SSRI). It is commonly used to treat depression, social anxiety disorder, posttraumatic stress disorder (PTSD), panic disorder and obsessive-compulsive disorder (OCD). A: Zoloft (sertraline) is a medication that is used to treat depression or anxiety. This medication is in the family of drugs called SSRIs and works by bringing a balance to serotonin in the brain that is causing your condition.'],
    ['what does sertraline treat', 'A Stacy Wiegman, PharmD, Pharmacy, answered. Sertraline is an antidepressant that treats the symptoms of different psychological disorders, including obsessive-compulsive disorder (OCD), by increasing serotonin and balancing chemicals in the brain. Sertraline is classified as a selective serotonin reuptake inhibitor (SSRI). '],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'what does sertraline treat',
    [
        'Sertraline is used to treat depression, obsessive compulsive disorder (OCD), panic disorder, premenstrual dysphoric disorder (PMDD), posttraumatic stress disorder (PTSD), and social anxiety disorder (SAD). Sertraline belongs to a group of medicines known as selective serotonin reuptake inhibitors (SSRIs). ',
        'Sertraline is used for a number of conditions including: major depression, obsessive-compulsive disorder (OCD), body dysmorphic disorder (BDD), posttraumatic stress disorder (PTSD), premenstrual dysphoric disorder (PMDD), panic disorder and social phobia (social anxiety disorder). It was introduced to the market by Pfizer in 1991. Sertraline is primarily prescribed for major depressive disorder in adult outpatients as well as obsessive-compulsive disorder, panic disorder, and social anxiety disorder, in both adults and children.',
        'Zoloft is the brand name of sertraline, an antidepressant used to treat major depressive disorders. Zoloft is in a class of antidepressants known as selective serotonin reuptake inhibitors (SSRIs). They work by controlling levels of serotonin (a neurotransmitter) in the brain. A: Zoloft (sertraline) is a medication that is used to treat depression or anxiety. This medication is in the family of drugs called SSRIs and works by bringing a balance to serotonin in the brain that is causing your condition.',
        'A: Zoloft (sertraline) is a type of antidepressant known as a selective serotonin reuptake inhibitor (SSRI). It is commonly used to treat depression, social anxiety disorder, posttraumatic stress disorder (PTSD), panic disorder and obsessive-compulsive disorder (OCD). A: Zoloft (sertraline) is a medication that is used to treat depression or anxiety. This medication is in the family of drugs called SSRIs and works by bringing a balance to serotonin in the brain that is causing your condition.',
        'A Stacy Wiegman, PharmD, Pharmacy, answered. Sertraline is an antidepressant that treats the symptoms of different psychological disorders, including obsessive-compulsive disorder (OCD), by increasing serotonin and balancing chemicals in the brain. Sertraline is classified as a selective serotonin reuptake inhibitor (SSRI). ',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
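`rank` returns results already sorted by score; the same reranking can be done by hand by sorting `predict` scores. A minimal sketch of that pattern, using toy scores in place of a real `model.predict` call so it runs without downloading the model:

```python
# Rerank candidate passages by cross-encoder score (highest first).
# The scores below are placeholders standing in for model.predict(pairs).
docs = ["passage a", "passage b", "passage c"]
scores = [0.12, 0.87, 0.45]  # hypothetical predict() output

reranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
for doc, score in reranked:
    print(f"{score:.2f}  {doc}")
```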
<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.0654 (-0.4242)     | 0.2752 (+0.0142)     | 0.0653 (-0.3543)     |
| mrr@10      | 0.0390 (-0.4385)     | 0.3973 (-0.1026)     | 0.0417 (-0.3850)     |
| **ndcg@10** | **0.0574 (-0.4831)** | **0.2485 (-0.0765)** | **0.0648 (-0.4359)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.1353 (-0.2548)     |
| mrr@10      | 0.1593 (-0.3087)     |
| **ndcg@10** | **0.1235 (-0.3318)** |
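The `NanoBEIR_R100_mean` values are the unweighted averages of the three per-dataset results above. A quick arithmetic check for map (plain Python, not part of any evaluator API):

```python
# NanoBEIR_R100_mean averages the per-dataset scores reported above.
per_dataset_map = {"NanoMSMARCO": 0.0654, "NanoNFCorpus": 0.2752, "NanoNQ": 0.0653}
mean_map = sum(per_dataset_map.values()) / len(per_dataset_map)
print(round(mean_map, 4))  # 0.1353, matching the map row in the table
```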
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 78,704 training samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                           | docs                                                                                    | labels                                                                                  |
  |:--------|:------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------|
  | type    | string                                                                                          | list                                                                                    | list                                                                                    |
  | details | <ul><li>min: 10 characters</li><li>mean: 34.03 characters</li><li>max: 109 characters</li></ul> | <ul><li>min: 1 elements</li><li>mean: 5.89 elements</li><li>max: 10 elements</li></ul> | <ul><li>min: 1 elements</li><li>mean: 5.89 elements</li><li>max: 10 elements</li></ul> |
* Samples:
  | query                                           | docs                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | labels                            |
  |:------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------|
  | <code>what is vascular tissue in a plant</code> | <code>['Vascular tissue. vascular tissue. (botany) plant tissue that transports nutrients and water throughout a plant; as veins and arteries are to animals. The two kind of vascular tissue are xylem (used mainly for water) and phloem (used more for nutrients). ', 'The primary components of vascular tissue are the xylem and phloem. These two tissues transport fluid and nutrients internally. There are also two meristems associated with vascular tissue: the vascular cambium and the cork cambium. All the vascular tissues within a particular plant together constitute the vascular tissue system of that plant. The cells in vascular tissue are typically long and slender. Since the xylem and phloem function in the conduction of water, minerals, and nutrients throughout the plant, it is not surprising that their form should be similar to pipes. The vascular tissue in plants is arranged in long, discrete strands called vascular bundles. These bundles include both xylem and phloem, as well as supportin...</code> | <code>[1, 1, 0, 0, 0, ...]</code> |
  | <code>what is lithotripsy</code>                | <code>['Lithotripsy is the use of high-energy shock waves to fragment and disintegrate kidney stones. The shock wave, created by using a high-voltage spark or an electromagnetic impulse outside of the body, is focused on the stone. The shock wave shatters the stone, allowing the fragments to pass through the urinary system. Description. Lithotripsy uses the technique of focused shock waves to fragment a stone in the kidney or the ureter. The affected person is placed in a tub of water or in contact with a water-filled cushion.', 'Overview. Lithotripsy is a medical procedure used to treat kidney stones. It may also be used to treat stones in other organs, such as the gall bladder or the liver. Kidney stones are collections of solid minerals that sometimes form in the kidneys. Healthy kidneys do not have these stone-like formations', 'Lithotripsy is a procedure that uses shock waves to break up stones in the kidney, bladder, or ureter (tube that carries urine from your kidneys to your bladder)...</code> | <code>[1, 1, 0, 0, 0, ...]</code> |
  | <code>what is an eye</code>                     | <code>['A compound eye may consist of thousands of individual photoreceptor units or ommatidia (ommatidium, singular). The image perceived is a combination of inputs from the numerous ommatidia (individual eye units), which are located on a convex surface, thus pointing in slightly different directions. The eye of a red-tailed hawk. Visual acuity, or resolving power, is the ability to distinguish fine detail and is the property of cone cells. It is often measured in cycles per degree (CPD), which measures an angular resolution, or how much an eye can differentiate one object from another in terms of visual angles.', '1 The iris is the colored part of the eye (most often blue or brown). 2 It surrounds the pupil, the small opening that lets light enter the eyeball. 3 The choroid is a thin, pigmented layer lining the eyeball that nourishes the retina and the front of the eye with blood. Intraocular melanoma (melanoma of the eye). Intraocular melanoma is the most common type of cancer that dev...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
  ```json
  {
      "activation_fn": "torch.nn.modules.linear.Identity",
      "mini_batch_size": 16
  }
  ```
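With the identity activation, ListNetLoss softmaxes the raw model scores and the relevance labels into top-one probability distributions and takes their cross-entropy. A minimal self-contained sketch of that computation (not the library implementation):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def listnet_loss(scores, labels):
    """Cross-entropy between the label- and score-induced top-one distributions."""
    p_true = softmax(labels)   # target permutation probabilities
    p_pred = softmax(scores)   # model's permutation probabilities
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))

# One query with three candidate passages; only the first is relevant.
labels = [1.0, 0.0, 0.0]
good = listnet_loss([5.0, -2.0, -3.0], labels)  # scores agree with labels
bad = listnet_loss([-3.0, 5.0, -2.0], labels)   # relevant passage ranked last
print(good < bad)  # a correct ranking yields the lower loss
```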
261
+
262
+ ### Evaluation Dataset
263
+
264
+ #### ms_marco
265
+
266
+ * Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
267
+ * Size: 1,000 evaluation samples
268
+ * Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
269
+ * Approximate statistics based on the first 1000 samples:
270
+ | | query | docs | labels |
271
+ |:--------|:-----------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------|
272
+ | type | string | list | list |
273
+ | details | <ul><li>min: 8 characters</li><li>mean: 34.34 characters</li><li>max: 110 characters</li></ul> | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul> | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul> |
274
+ * Samples:
275
+ | query | docs | labels |
276
+ |:-----------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------|
277
+ | <code>what does sertraline treat</code> | <code>['Sertraline is used to treat depression, obsessive compulsive disorder (OCD), panic disorder, premenstrual dysphoric disorder (PMDD), posttraumatic stress disorder (PTSD), and social anxiety disorder (SAD). Sertraline belongs to a group of medicines known as selective serotonin reuptake inhibitors (SSRIs). ', 'Sertraline is used for a number of conditions including: major depression, obsessive-compulsive disorder (OCD), body dysmorphic disorder (BDD), posttraumatic stress disorder (PTSD), premenstrual dysphoric disorder (PMDD), panic disorder and social phobia (social anxiety disorder). It was introduced to the market by Pfizer in 1991. Sertraline is primarily prescribed for major depressive disorder in adult outpatients as well as obsessive-compulsive disorder, panic disorder, and social anxiety disorder, in both adults and children.', 'Zoloft is the brand name of sertraline, an antidepressant used to treat major depressive disorders. Zoloft is in a class of antidepressants known as ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
278
+ | <code>can i take just hand luggage and a handbag on thomas cook</code> | <code>["With regard to hand luggage, you can take one bag each max 5kg per person and the bag must not be above a certain size. Your handbag will be counted as your hand luggage bag. The official rules for Thomas Cook Airlines are..... I have checked the paper work and it says we are flying Thomas Cook Airlines All passengers receive a complementary hand baggage allowance of 5kgs when travelling on a Thomas Cook Airline flights. This time it is about luggage. There are 4 of us (2 adults 2 children) when we booked the holiday with Thomas Cook we were told it was 15kg per person for hold luggage and 5kg per person for hand luggage. He rep said we could add the 15kg's together and have 2x 30kg suitcases.", "Advice on hold/hand luggage please! Just me again, with yet ANOTHER question LOL. This time it is about luggage. There are 4 of us (2 adults 2 children) when we booked the holiday with Thomas Cook we were told it was 15kg per person for hold luggage and 5kg per person for hand luggage. He re...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
279
+ | <code>what causes neisseria gonorrhoeae</code> | <code>['Neisseria gonorrhoeae. by Yen Lemire. Introduction. Neisseria gonorrhoeae is the obligate human pathogen that causes the sexually transmitted disease (STD) gonorrhea. This Gram-negative diplococci/gonococci does not infect other animals or experimental animals and does not survive freely in the environment', 'Neisseria gonorrhoeae, also known as gonococci (plural), or gonococcus (singular), is a species of Gram-negative coffee bean-shaped diplococci bacteria responsible for the sexually transmitted infection gonorrhea. ', 'Gonorrhea is a sexually transmitted disease (STD) that can infect both men and women. It can cause infections in the genitals, rectum, and throat. It is a very common infection, especially among young people ages 15-24 years. ', 'Background. Gonorrhea is a sexually transmitted disease caused by Neisseria gonorrhoeae, a bacterium that can infect areas of the reproductive tract, including the cervix, uterus, and fallopian tubes in women, and the urethra, mouth, throa...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
280
+ * Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
281
+ ```json
282
+ {
283
+ "activation_fn": "torch.nn.modules.linear.Identity",
284
+ "mini_batch_size": 16
285
+ }
286
+ ```
287
+
288
+ ### Training Hyperparameters
289
+ #### Non-Default Hyperparameters
290
+
291
+ - `eval_strategy`: steps
292
+ - `per_device_train_batch_size`: 16
293
+ - `per_device_eval_batch_size`: 16
294
+ - `learning_rate`: 2e-05
295
+ - `num_train_epochs`: 1
296
+ - `warmup_ratio`: 0.1
297
+ - `seed`: 12
298
+ - `bf16`: True
299
+ - `load_best_model_at_end`: True
300
+
301
+ #### All Hyperparameters
302
+ <details><summary>Click to expand</summary>
303
+
304
+ - `overwrite_output_dir`: False
305
+ - `do_predict`: False
306
+ - `eval_strategy`: steps
307
+ - `prediction_loss_only`: True
308
+ - `per_device_train_batch_size`: 16
309
+ - `per_device_eval_batch_size`: 16
310
+ - `per_gpu_train_batch_size`: None
311
+ - `per_gpu_eval_batch_size`: None
312
+ - `gradient_accumulation_steps`: 1
313
+ - `eval_accumulation_steps`: None
314
+ - `torch_empty_cache_steps`: None
315
+ - `learning_rate`: 2e-05
316
+ - `weight_decay`: 0.0
317
+ - `adam_beta1`: 0.9
318
+ - `adam_beta2`: 0.999
319
+ - `adam_epsilon`: 1e-08
320
+ - `max_grad_norm`: 1.0
321
+ - `num_train_epochs`: 1
322
+ - `max_steps`: -1
323
+ - `lr_scheduler_type`: linear
324
+ - `lr_scheduler_kwargs`: {}
325
+ - `warmup_ratio`: 0.1
326
+ - `warmup_steps`: 0
327
+ - `log_level`: passive
328
+ - `log_level_replica`: warning
329
+ - `log_on_each_node`: True
330
+ - `logging_nan_inf_filter`: True
331
+ - `save_safetensors`: True
332
+ - `save_on_each_node`: False
333
+ - `save_only_model`: False
334
+ - `restore_callback_states_from_checkpoint`: False
335
+ - `no_cuda`: False
336
+ - `use_cpu`: False
337
+ - `use_mps_device`: False
338
+ - `seed`: 12
339
+ - `data_seed`: None
340
+ - `jit_mode_eval`: False
341
+ - `use_ipex`: False
342
+ - `bf16`: True
343
+ - `fp16`: False
344
+ - `fp16_opt_level`: O1
345
+ - `half_precision_backend`: auto
346
+ - `bf16_full_eval`: False
347
+ - `fp16_full_eval`: False
348
+ - `tf32`: None
349
+ - `local_rank`: 0
350
+ - `ddp_backend`: None
351
+ - `tpu_num_cores`: None
352
+ - `tpu_metrics_debug`: False
353
+ - `debug`: []
354
+ - `dataloader_drop_last`: False
355
+ - `dataloader_num_workers`: 0
356
+ - `dataloader_prefetch_factor`: None
357
+ - `past_index`: -1
358
+ - `disable_tqdm`: False
359
+ - `remove_unused_columns`: True
360
+ - `label_names`: None
361
+ - `load_best_model_at_end`: True
362
+ - `ignore_data_skip`: False
363
+ - `fsdp`: []
364
+ - `fsdp_min_num_params`: 0
365
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
366
+ - `fsdp_transformer_layer_cls_to_wrap`: None
367
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
368
+ - `deepspeed`: None
369
+ - `label_smoothing_factor`: 0.0
370
+ - `optim`: adamw_torch
371
+ - `optim_args`: None
372
+ - `adafactor`: False
373
+ - `group_by_length`: False
374
+ - `length_column_name`: length
375
+ - `ddp_find_unused_parameters`: None
376
+ - `ddp_bucket_cap_mb`: None
377
+ - `ddp_broadcast_buffers`: False
378
+ - `dataloader_pin_memory`: True
379
+ - `dataloader_persistent_workers`: False
380
+ - `skip_memory_metrics`: True
381
+ - `use_legacy_prediction_loop`: False
382
+ - `push_to_hub`: False
383
+ - `resume_from_checkpoint`: None
384
+ - `hub_model_id`: None
385
+ - `hub_strategy`: every_save
386
+ - `hub_private_repo`: None
387
+ - `hub_always_push`: False
388
+ - `hub_revision`: None
389
+ - `gradient_checkpointing`: False
390
+ - `gradient_checkpointing_kwargs`: None
391
+ - `include_inputs_for_metrics`: False
392
+ - `include_for_metrics`: []
393
+ - `eval_do_concat_batches`: True
394
+ - `fp16_backend`: auto
395
+ - `push_to_hub_model_id`: None
396
+ - `push_to_hub_organization`: None
397
+ - `mp_parameters`:
398
+ - `auto_find_batch_size`: False
399
+ - `full_determinism`: False
400
+ - `torchdynamo`: None
401
+ - `ray_scope`: last
402
+ - `ddp_timeout`: 1800
403
+ - `torch_compile`: False
404
+ - `torch_compile_backend`: None
405
+ - `torch_compile_mode`: None
406
+ - `include_tokens_per_second`: False
407
+ - `include_num_input_tokens_seen`: False
408
+ - `neftune_noise_alpha`: None
409
+ - `optim_target_modules`: None
410
+ - `batch_eval_metrics`: False
411
+ - `eval_on_start`: False
412
+ - `use_liger_kernel`: False
413
+ - `liger_kernel_config`: None
414
+ - `eval_use_gather_object`: False
415
+ - `average_tokens_across_devices`: False
416
+ - `prompts`: None
417
+ - `batch_sampler`: batch_sampler
418
+ - `multi_dataset_batch_sampler`: proportional
419
+ - `router_mapping`: {}
420
+ - `learning_rate_mapping`: {}
421
+
422
+ </details>
423
+
424
+ ### Training Logs
425
+ | Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
426
+ |:----------:|:--------:|:-------------:|:---------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
427
+ | -1 | -1 | - | - | 0.0797 (-0.4607) | 0.2817 (-0.0434) | 0.0302 (-0.4704) | 0.1305 (-0.3248) |
428
+ | 0.0002 | 1 | 2.0922 | - | - | - | - | - |
429
+ | 0.0508 | 250 | 2.0905 | - | - | - | - | - |
430
+ | 0.1016 | 500 | 2.0908 | 2.1004 | 0.0181 (-0.5223) | 0.2538 (-0.0713) | 0.0256 (-0.4751) | 0.0991 (-0.3562) |
431
+ | 0.1525 | 750 | 2.0904 | - | - | - | - | - |
432
+ | 0.2033 | 1000 | 2.0849 | 2.1000 | 0.0400 (-0.5004) | 0.2665 (-0.0586) | 0.0176 (-0.4830) | 0.1080 (-0.3474) |
433
+ | 0.2541 | 1250 | 2.0934 | - | - | - | - | - |
434
+ | 0.3049 | 1500 | 2.087 | 2.0992 | 0.0393 (-0.5011) | 0.2181 (-0.1069) | 0.0503 (-0.4504) | 0.1026 (-0.3528) |
435
+ | 0.3558 | 1750 | 2.0929 | - | - | - | - | - |
436
+ | 0.4066 | 2000 | 2.089 | 2.0989 | 0.0447 (-0.4957) | 0.2442 (-0.0808) | 0.0450 (-0.4557) | 0.1113 (-0.3441) |
437
+ | 0.4574 | 2250 | 2.0888 | - | - | - | - | - |
438
+ | 0.5082 | 2500 | 2.0865 | 2.0988 | 0.0393 (-0.5011) | 0.2211 (-0.1040) | 0.0424 (-0.4582) | 0.1009 (-0.3544) |
439
+ | 0.5591 | 2750 | 2.0858 | - | - | - | - | - |
440
+ | 0.6099 | 3000 | 2.0825 | 2.0985 | 0.0447 (-0.4957) | 0.2312 (-0.0938) | 0.0569 (-0.4438) | 0.1109 (-0.3444) |
441
+ | 0.6607 | 3250 | 2.0859 | - | - | - | - | - |
442
+ | 0.7115 | 3500 | 2.0905 | 2.0984 | 0.0447 (-0.4958) | 0.2419 (-0.0831) | 0.0593 (-0.4414) | 0.1153 (-0.3401) |
443
+ | 0.7624 | 3750 | 2.0838 | - | - | - | - | - |
444
+ | 0.8132 | 4000 | 2.0883 | 2.0984 | 0.0605 (-0.4799) | 0.2393 (-0.0858) | 0.0705 (-0.4302) | 0.1234 (-0.3320) |
445
+ | 0.8640 | 4250 | 2.0885 | - | - | - | - | - |
446
+ | **0.9148** | **4500** | **2.0832** | **2.0984** | **0.0574 (-0.4831)** | **0.2485 (-0.0765)** | **0.0648 (-0.4359)** | **0.1235 (-0.3318)** |
447
+ | 0.9656 | 4750 | 2.0815 | - | - | - | - | - |
448
+ | -1 | -1 | - | - | 0.0574 (-0.4831) | 0.2485 (-0.0765) | 0.0648 (-0.4359) | 0.1235 (-0.3318) |
449
+
450
+ * The bold row denotes the saved checkpoint.
451
+
452
+ ### Framework Versions
453
+ - Python: 3.10.18
454
+ - Sentence Transformers: 5.0.0
455
+ - Transformers: 4.56.0.dev0
456
+ - PyTorch: 2.7.1+cu126
457
+ - Accelerate: 1.9.0
458
+ - Datasets: 4.0.0
459
+ - Tokenizers: 0.21.4
460
+
461
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### ListNetLoss
+ ```bibtex
+ @inproceedings{cao2007learning,
+     title = {Learning to Rank: From Pairwise Approach to Listwise Approach},
+     author = {Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
+     booktitle = {Proceedings of the 24th International Conference on Machine Learning},
+     pages = {129--136},
+     year = {2007}
+ }
+ ```
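The loss cited above is the ListNet top-one loss: the cross-entropy between the softmax of the relevance labels and the softmax of the predicted scores over one query's candidate list. A minimal plain-Python sketch of the idea (not the Sentence Transformers implementation):

```python
import math


def listnet_loss(scores, labels):
    """ListNet top-one loss (Cao et al., 2007): cross-entropy between the
    target top-one distribution softmax(labels) and the predicted
    distribution softmax(scores) for one query's candidate documents."""
    def softmax(xs):
        m = max(xs)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in xs]
        s = sum(exps)
        return [e / s for e in exps]

    p = softmax(labels)  # target distribution from relevance labels
    q = softmax(scores)  # predicted distribution from model scores
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

Because both distributions pass through a softmax, the loss is invariant to shifting all scores by a constant, and it is minimized when the model's score ordering (in fact, its softmax distribution) matches the labels'.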
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 256,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 1024,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 4,
+   "num_hidden_layers": 10,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "sentence_transformers": {
+     "activation_fn": "torch.nn.modules.activation.Sigmoid",
+     "version": "5.0.0"
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.56.0.dev0",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:975b6dc833dcae94a9e51d134bbc667aeaa11628b07d51747dd56a7ad54f4cb4
+ size 63656924
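Note that the diff above is a Git LFS pointer, not the weights themselves; the actual tensor file is fetched from LFS storage at download time. A small stdlib-only sketch (hypothetical helper) of how the `oid` and `size` can be read from such a pointer:

```python
def parse_lfs_pointer(text: str) -> tuple[str, int]:
    # Git LFS pointers are three "key value" lines: version, oid, size.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    assert fields["version"].startswith("https://git-lfs.github.com/spec/")
    algo, digest = fields["oid"].split(":", 1)  # e.g. "sha256", hex digest
    return digest, int(fields["size"])


digest, size = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:975b6dc833dcae94a9e51d134bbc667aeaa11628b07d51747dd56a7ad54f4cb4\n"
    "size 63656924\n"
)
print(digest, size)  # sha256 digest and a 63,656,924-byte payload
```

After downloading, the digest can be checked against `hashlib.sha256` of the file contents to verify integrity.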
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
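The tokenizer above is a standard uncased `BertTokenizer` with `model_max_length` 512. For a cross-encoder, the query and passage are packed into one sequence as `[CLS] query [SEP] passage [SEP]`, with segment ids 0 for the query and 1 for the passage. A minimal pure-Python sketch of that layout using the special-token ids from `added_tokens_decoder` (the real encoding is done by the tokenizer itself, and its truncation strategy may differ from the passage-only truncation shown here):

```python
# Special-token ids from the tokenizer_config.json above.
SPECIAL_IDS = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102, "[MASK]": 103}


def pair_encoding(query_ids, passage_ids, max_len=512):
    """Lay out [CLS] query [SEP] passage [SEP] with 0/1 segment ids.
    Assumes the query fits; the passage is truncated to respect max_len."""
    budget = max_len - 3  # room left after the three special tokens
    passage_ids = list(passage_ids)[: max(0, budget - len(query_ids))]
    input_ids = (
        [SPECIAL_IDS["[CLS]"]] + list(query_ids) + [SPECIAL_IDS["[SEP]"]]
        + passage_ids + [SPECIAL_IDS["[SEP]"]]
    )
    token_type_ids = [0] * (len(query_ids) + 2) + [1] * (len(passage_ids) + 1)
    return input_ids, token_type_ids
```

For example, `pair_encoding([7, 8], [9, 10, 11])` yields `[101, 7, 8, 102, 9, 10, 11, 102]` with segment ids `[0, 0, 0, 0, 1, 1, 1, 1]`.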
vocab.txt ADDED
The diff for this file is too large to render. See raw diff