Update README.md
README.md
CHANGED
@@ -1,137 +1,436 @@
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- text-classification
- generated_from_trainer
- dataset_size:580740
- loss:BinaryCrossEntropyLoss
base_model: answerdotai/ModernBERT-base
datasets:
- sentence-transformers/gooaq
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: CrossEncoder based on answerdotai/ModernBERT-base
  results: []
---

# CrossEncoder based on answerdotai/ModernBERT-base

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) <!-- at revision 8949b909ec900327062f0ebf497f51aef5e6f0c8 -->
- **Maximum Sequence Length:** 8192 tokens
- **Number of Output Labels:** 1 label
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("sentence_transformers_model_id")
# Get scores for pairs of texts
pairs = [
    ['should you take ibuprofen with high blood pressure?', "In general, people with high blood pressure should use acetaminophen or possibly aspirin for over-the-counter pain relief. Unless your health care provider has said it's OK, you should not use ibuprofen, ketoprofen, or naproxen sodium. If aspirin or acetaminophen doesn't help with your pain, call your doctor."],
    ['how old do you have to be to work in sc?', 'The general minimum age of employment for South Carolina youth is 14, although the state allows younger children who are performers to work in show business. If their families are agricultural workers, children younger than age 14 may also participate in farm labor.'],
    ['how to write a topic proposal for a research paper?', "['Write down the main topic of your paper. ... ', 'Write two or three short sentences under the main topic that explain why you chose that topic. ... ', 'Write a thesis sentence that states the angle and purpose of your research paper. ... ', 'List the items you will cover in the body of the paper that support your thesis statement.']"],
    ['how much does aaf pay players?', 'These dates provided an opportunity for players cut at the NFL roster deadline, and each player signed a non-guaranteed three-year contract worth a total of $250,000 ($70,000 in 2019; $80,000 in 2020; $100,000 in 2021), with performance-based and fan-interaction incentives allowing for players to earn more.'],
    ['is jove and zeus the same?', 'Jupiter, or Jove, in Roman mythology is the king of the gods and the god of sky and thunder, equivalent to Zeus in Greek traditions.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'should you take ibuprofen with high blood pressure?',
    [
        "In general, people with high blood pressure should use acetaminophen or possibly aspirin for over-the-counter pain relief. Unless your health care provider has said it's OK, you should not use ibuprofen, ketoprofen, or naproxen sodium. If aspirin or acetaminophen doesn't help with your pain, call your doctor.",
        'The general minimum age of employment for South Carolina youth is 14, although the state allows younger children who are performers to work in show business. If their families are agricultural workers, children younger than age 14 may also participate in farm labor.',
        "['Write down the main topic of your paper. ... ', 'Write two or three short sentences under the main topic that explain why you chose that topic. ... ', 'Write a thesis sentence that states the angle and purpose of your research paper. ... ', 'List the items you will cover in the body of the paper that support your thesis statement.']",
        'These dates provided an opportunity for players cut at the NFL roster deadline, and each player signed a non-guaranteed three-year contract worth a total of $250,000 ($70,000 in 2019; $80,000 in 2020; $100,000 in 2021), with performance-based and fan-interaction incentives allowing for players to earn more.',
        'Jupiter, or Jove, in Roman mythology is the king of the gods and the god of sky and thunder, equivalent to Zeus in Greek traditions.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```

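The ranking output in the usage example is a list of `corpus_id`/`score` entries ordered from most to least relevant. As a minimal sketch of that ordering step, assuming you already hold one score per candidate document: the helper name `rank_by_score` and the example scores are hypothetical and are not part of the library or real model output.

```python
from typing import Dict, List, Optional, Sequence, Union


def rank_by_score(
    scores: Sequence[float], top_k: Optional[int] = None
) -> List[Dict[str, Union[int, float]]]:
    """Sort document indices by descending score.

    Mirrors the [{'corpus_id': ..., 'score': ...}, ...] shape used in the
    usage example; this is an illustrative helper, not the library's code.
    """
    ranked = sorted(
        ({"corpus_id": i, "score": float(s)} for i, s in enumerate(scores)),
        key=lambda item: item["score"],
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical scores for five candidate documents (not real model output).
example_scores = [0.91, 0.02, 0.15, 0.01, 0.03]
top_two = rank_by_score(example_scores, top_k=2)
print(top_two)
```

Keeping the original index as `corpus_id` lets you map each ranked entry back to the document list you passed in.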
<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `gooaq-dev`, `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
* Evaluated with [<code>CERerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CERerankingEvaluator)

| Metric      | gooaq-dev            | NanoMSMARCO          | NanoNFCorpus         | NanoNQ               |
|:------------|:---------------------|:---------------------|:---------------------|:---------------------|
| map         | 0.7386 (+0.0063)     | 0.5463 (+0.0567)     | 0.3300 (+0.0595)     | 0.6707 (+0.2500)     |
| mrr@10      | 0.7360 (+0.0068)     | 0.5401 (+0.0626)     | 0.5409 (+0.0410)     | 0.6737 (+0.2471)     |
| **ndcg@10** | **0.7880 (+0.0064)** | **0.6203 (+0.0799)** | **0.3660 (+0.0410)** | **0.7246 (+0.2240)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_mean`
* Evaluated with [<code>CENanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CENanoBEIREvaluator)

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.5157 (+0.1221)     |
| mrr@10      | 0.5849 (+0.1169)     |
| **ndcg@10** | **0.5703 (+0.1149)** |

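For readers unfamiliar with the reported metrics, the following is a minimal sketch of how `map`, `mrr@10` and `ndcg@10` are commonly computed for a single query from a ranked list of binary relevance labels. It is an illustration under assumptions: the function names are hypothetical, the ideal ranking is built only from the labels in the list, and the evaluators linked above may differ in details.

```python
import math
from typing import Sequence


def mrr_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision values taken at each relevant position."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ndcg_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """DCG of the given order divided by the DCG of the ideal order."""

    def dcg(rels: Sequence[int]) -> float:
        return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(rels, start=1))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


# Hypothetical ranked list: relevant documents at positions 2 and 4.
ranked_relevance = [0, 1, 0, 1, 0]
print(mrr_at_k(ranked_relevance))
print(average_precision(ranked_relevance))
print(ndcg_at_k(ranked_relevance))
```

Dataset-level scores are then the mean of these per-query values over all queries.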
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

|
151 |
+
## Training Details
|
152 |
+
|
153 |
+
### Training Dataset
|
154 |
+
|
155 |
+
#### Unnamed Dataset
|
156 |
+
|
157 |
+
* Size: 580,740 training samples
|
158 |
+
* Columns: <code>query</code>, <code>response</code>, and <code>label</code>
|
159 |
+
* Approximate statistics based on the first 1000 samples:
|
160 |
+
| | query | response | label |
|
161 |
+
|:--------|:----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:-----------------------------|
|
162 |
+
| type | string | string | int |
|
163 |
+
| details | <ul><li>min: 17 characters</li><li>mean: 42.5 characters</li><li>max: 91 characters</li></ul> | <ul><li>min: 51 characters</li><li>mean: 253.83 characters</li><li>max: 385 characters</li></ul> | <ul><li>1: 100.00%</li></ul> |
|
164 |
+
* Samples:
|
165 |
+
| query | response | label |
|
166 |
+
|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
|
167 |
+
| <code>what is the difference between a certificate and associate's degree?</code> | <code>Certificate degrees are extremely focused in their objective(s) and are related to a specific job or career niche. ... Certificates are often obtained as an add-on to an associate degree. Associate degree programs require two years of full-time classroom attendance in order to complete a degree.</code> | <code>1</code> |
|
168 |
+
| <code>what is the difference between 5star and inverter ac?</code> | <code>An inverter AC works on variable speed compressor whereas a 5-star rated non-inverter AC have single speed compressor. It changes its speed as per the heat load and number of people. The need of Stabilizer: A stabilizer is installed with the AC to maintain an optimum voltage range during the power fluctuations.</code> | <code>1</code> |
|
169 |
+
| <code>what is the difference between gas and electric cars?</code> | <code>A gas-powered car has a fuel tank, which supplies gasoline to the engine. The engine then turns a transmission, which turns the wheels. Move your mouse over the parts for a 3-D view. An electric car, on the other hand, has a set of batteries that provides electricity to an electric motor.</code> | <code>1</code> |
|
170 |
+
* Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
|
171 |
+
```json
|
172 |
+
{
|
173 |
+
"activation_fct": "torch.nn.modules.linear.Identity",
|
174 |
+
"pos_weight": 5
|
175 |
+
}
|
176 |
+
```
|
177 |
+
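To make the `pos_weight` parameter concrete, here is a minimal per-example sketch of binary cross-entropy on a raw logit with positives up-weighted. It assumes the loss follows the usual formulation of PyTorch's `BCEWithLogitsLoss` with `pos_weight`, which fits the identity activation listed in the parameters; the function names are hypothetical and this is not the library's implementation.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce_with_logits(logit: float, label: int, pos_weight: float = 5.0) -> float:
    """Binary cross-entropy on a raw logit; positive examples count pos_weight times."""
    log_sigmoid = -softplus(-logit)            # log(sigmoid(logit))
    log_one_minus_sigmoid = -softplus(logit)   # log(1 - sigmoid(logit))
    return -(pos_weight * label * log_sigmoid + (1 - label) * log_one_minus_sigmoid)


# At logit 0 the model is undecided: a positive costs five times a negative.
print(weighted_bce_with_logits(0.0, label=1))
print(weighted_bce_with_logits(0.0, label=0))
```

With `pos_weight` set to 5, a misjudged positive pair contributes five times the loss of an equally misjudged negative pair, which is a common way to offset a training mix that contains more negatives than positives.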
|
178 |
+
### Evaluation Dataset
|
179 |
+
|
180 |
+
#### gooaq
|
181 |
+
|
182 |
+
* Dataset: [gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq) at [b089f72](https://huggingface.co/datasets/sentence-transformers/gooaq/tree/b089f728748a068b7bc5234e5bcf5b25e3c8279c)
|
183 |
+
* Size: 3,012,496 evaluation samples
|
184 |
+
* Columns: <code>query</code>, <code>response</code>, and <code>label</code>
|
185 |
+
* Approximate statistics based on the first 1000 samples:
|
186 |
+
| | query | response | label |
|
187 |
+
|:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:-----------------------------|
|
188 |
+
| type | string | string | int |
|
189 |
+
| details | <ul><li>min: 18 characters</li><li>mean: 43.05 characters</li><li>max: 88 characters</li></ul> | <ul><li>min: 51 characters</li><li>mean: 252.39 characters</li><li>max: 386 characters</li></ul> | <ul><li>1: 100.00%</li></ul> |
|
190 |
+
* Samples:
|
191 |
+
| query | response | label |
|
192 |
+
|:-----------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
|
193 |
+
| <code>should you take ibuprofen with high blood pressure?</code> | <code>In general, people with high blood pressure should use acetaminophen or possibly aspirin for over-the-counter pain relief. Unless your health care provider has said it's OK, you should not use ibuprofen, ketoprofen, or naproxen sodium. If aspirin or acetaminophen doesn't help with your pain, call your doctor.</code> | <code>1</code> |
|
194 |
+
| <code>how old do you have to be to work in sc?</code> | <code>The general minimum age of employment for South Carolina youth is 14, although the state allows younger children who are performers to work in show business. If their families are agricultural workers, children younger than age 14 may also participate in farm labor.</code> | <code>1</code> |
|
195 |
+
| <code>how to write a topic proposal for a research paper?</code> | <code>['Write down the main topic of your paper. ... ', 'Write two or three short sentences under the main topic that explain why you chose that topic. ... ', 'Write a thesis sentence that states the angle and purpose of your research paper. ... ', 'List the items you will cover in the body of the paper that support your thesis statement.']</code> | <code>1</code> |
|
196 |
+
* Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
|
197 |
+
```json
|
198 |
+
{
|
199 |
+
"activation_fct": "torch.nn.modules.linear.Identity",
|
200 |
+
"pos_weight": 5
|
201 |
+
}
|
202 |
+
```
|
203 |
+
### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `dataloader_num_workers`: 4
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates

219 |
+
#### All Hyperparameters
|
220 |
+
<details><summary>Click to expand</summary>
|
221 |
+
|
222 |
+
- `overwrite_output_dir`: False
|
223 |
+
- `do_predict`: False
|
224 |
+
- `eval_strategy`: steps
|
225 |
+
- `prediction_loss_only`: True
|
226 |
+
- `per_device_train_batch_size`: 64
|
227 |
+
- `per_device_eval_batch_size`: 64
|
228 |
+
- `per_gpu_train_batch_size`: None
|
229 |
+
- `per_gpu_eval_batch_size`: None
|
230 |
+
- `gradient_accumulation_steps`: 1
|
231 |
+
- `eval_accumulation_steps`: None
|
232 |
+
- `torch_empty_cache_steps`: None
|
233 |
+
- `learning_rate`: 2e-05
|
234 |
+
- `weight_decay`: 0.0
|
235 |
+
- `adam_beta1`: 0.9
|
236 |
+
- `adam_beta2`: 0.999
|
237 |
+
- `adam_epsilon`: 1e-08
|
238 |
+
- `max_grad_norm`: 1.0
|
239 |
+
- `num_train_epochs`: 1
|
240 |
+
- `max_steps`: -1
|
241 |
+
- `lr_scheduler_type`: linear
|
242 |
+
- `lr_scheduler_kwargs`: {}
|
243 |
+
- `warmup_ratio`: 0.1
|
244 |
+
- `warmup_steps`: 0
|
245 |
+
- `log_level`: passive
|
246 |
+
- `log_level_replica`: warning
|
247 |
+
- `log_on_each_node`: True
|
248 |
+
- `logging_nan_inf_filter`: True
|
249 |
+
- `save_safetensors`: True
|
250 |
+
- `save_on_each_node`: False
|
251 |
+
- `save_only_model`: False
|
252 |
+
- `restore_callback_states_from_checkpoint`: False
|
253 |
+
- `no_cuda`: False
|
254 |
+
- `use_cpu`: False
|
255 |
+
- `use_mps_device`: False
|
256 |
+
- `seed`: 12
|
257 |
+
- `data_seed`: None
|
258 |
+
- `jit_mode_eval`: False
|
259 |
+
- `use_ipex`: False
|
260 |
+
- `bf16`: True
|
261 |
+
- `fp16`: False
|
262 |
+
- `fp16_opt_level`: O1
|
263 |
+
- `half_precision_backend`: auto
|
264 |
+
- `bf16_full_eval`: False
|
265 |
+
- `fp16_full_eval`: False
|
266 |
+
- `tf32`: None
|
267 |
+
- `local_rank`: 0
|
268 |
+
- `ddp_backend`: None
|
269 |
+
- `tpu_num_cores`: None
|
270 |
+
- `tpu_metrics_debug`: False
|
271 |
+
- `debug`: []
|
272 |
+
- `dataloader_drop_last`: False
|
273 |
+
- `dataloader_num_workers`: 4
|
274 |
+
- `dataloader_prefetch_factor`: None
|
275 |
+
- `past_index`: -1
|
276 |
+
- `disable_tqdm`: False
|
277 |
+
- `remove_unused_columns`: True
|
278 |
+
- `label_names`: None
|
279 |
+
- `load_best_model_at_end`: True
|
280 |
+
- `ignore_data_skip`: False
|
281 |
+
- `fsdp`: []
|
282 |
+
- `fsdp_min_num_params`: 0
|
283 |
+
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
|
284 |
+
- `fsdp_transformer_layer_cls_to_wrap`: None
|
285 |
+
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
|
286 |
+
- `deepspeed`: None
|
287 |
+
- `label_smoothing_factor`: 0.0
|
288 |
+
- `optim`: adamw_torch
|
289 |
+
- `optim_args`: None
|
290 |
+
- `adafactor`: False
|
291 |
+
- `group_by_length`: False
|
292 |
+
- `length_column_name`: length
|
293 |
+
- `ddp_find_unused_parameters`: None
|
294 |
+
- `ddp_bucket_cap_mb`: None
|
295 |
+
- `ddp_broadcast_buffers`: False
|
296 |
+
- `dataloader_pin_memory`: True
|
297 |
+
- `dataloader_persistent_workers`: False
|
298 |
+
- `skip_memory_metrics`: True
|
299 |
+
- `use_legacy_prediction_loop`: False
|
300 |
+
- `push_to_hub`: False
|
301 |
+
- `resume_from_checkpoint`: None
|
302 |
+
- `hub_model_id`: None
|
303 |
+
- `hub_strategy`: every_save
|
304 |
+
- `hub_private_repo`: None
|
305 |
+
- `hub_always_push`: False
|
306 |
+
- `gradient_checkpointing`: False
|
307 |
+
- `gradient_checkpointing_kwargs`: None
|
308 |
+
- `include_inputs_for_metrics`: False
|
309 |
+
- `include_for_metrics`: []
|
310 |
+
- `eval_do_concat_batches`: True
|
311 |
+
- `fp16_backend`: auto
|
312 |
+
- `push_to_hub_model_id`: None
|
313 |
+
- `push_to_hub_organization`: None
|
314 |
+
- `mp_parameters`:
|
315 |
+
- `auto_find_batch_size`: False
|
316 |
+
- `full_determinism`: False
|
317 |
+
- `torchdynamo`: None
|
318 |
+
- `ray_scope`: last
|
319 |
+
- `ddp_timeout`: 1800
|
320 |
+
- `torch_compile`: False
|
321 |
+
- `torch_compile_backend`: None
|
322 |
+
- `torch_compile_mode`: None
|
323 |
+
- `dispatch_batches`: None
|
324 |
+
- `split_batches`: None
|
325 |
+
- `include_tokens_per_second`: False
|
326 |
+
- `include_num_input_tokens_seen`: False
|
327 |
+
- `neftune_noise_alpha`: None
|
328 |
+
- `optim_target_modules`: None
|
329 |
+
- `batch_eval_metrics`: False
|
330 |
+
- `eval_on_start`: False
|
331 |
+
- `use_liger_kernel`: False
|
332 |
+
- `eval_use_gather_object`: False
|
333 |
+
- `average_tokens_across_devices`: False
|
334 |
+
- `prompts`: None
|
335 |
+
- `batch_sampler`: no_duplicates
|
336 |
+
- `multi_dataset_batch_sampler`: proportional
|
337 |
+
|
338 |
+
</details>
|
339 |
+
### Training Logs
| Epoch | Step | Training Loss | Validation Loss | gooaq-dev_ndcg@10 | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10 | NanoBEIR_mean_ndcg@10 |
|:----------:|:--------:|:-------------:|:---------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:---------------------:|
| -1 | -1 | - | - | 0.1879 (-0.5937) | 0.0748 (-0.4656) | 0.2012 (-0.1238) | 0.0414 (-0.4592) | 0.1058 (-0.3496) |
| 0.0001 | 1 | 1.1971 | - | - | - | - | - | - |
| 0.0220 | 200 | 1.1557 | - | - | - | - | - | - |
| 0.0441 | 400 | 0.9119 | - | - | - | - | - | - |
| 0.0661 | 600 | 0.5124 | - | - | - | - | - | - |
| 0.0882 | 800 | 0.4225 | - | - | - | - | - | - |
| 0.1102 | 1000 | 0.3876 | 1.3811 | 0.7192 (-0.0624) | 0.5171 (-0.0233) | 0.3438 (+0.0187) | 0.5647 (+0.0641) | 0.4752 (+0.0198) |
| 0.1322 | 1200 | 0.3563 | - | - | - | - | - | - |
| 0.1543 | 1400 | 0.3155 | - | - | - | - | - | - |
| 0.1763 | 1600 | 0.3181 | - | - | - | - | - | - |
| 0.1983 | 1800 | 0.289 | - | - | - | - | - | - |
| 0.2204 | 2000 | 0.283 | 0.6710 | 0.7528 (-0.0289) | 0.5559 (+0.0155) | 0.3445 (+0.0194) | 0.6592 (+0.1585) | 0.5198 (+0.0645) |
| 0.2424 | 2200 | 0.2745 | - | - | - | - | - | - |
| 0.2645 | 2400 | 0.2575 | - | - | - | - | - | - |
| 0.2865 | 2600 | 0.2762 | - | - | - | - | - | - |
| 0.3085 | 2800 | 0.2489 | - | - | - | - | - | - |
| 0.3306 | 3000 | 0.2259 | 0.7575 | 0.7696 (-0.0121) | 0.4982 (-0.0422) | 0.3555 (+0.0305) | 0.6483 (+0.1476) | 0.5007 (+0.0453) |
| 0.3526 | 3200 | 0.2576 | - | - | - | - | - | - |
| 0.3747 | 3400 | 0.2384 | - | - | - | - | - | - |
| 0.3967 | 3600 | 0.2431 | - | - | - | - | - | - |
| 0.4187 | 3800 | 0.206 | - | - | - | - | - | - |
| 0.4408 | 4000 | 0.2381 | 0.9594 | 0.7774 (-0.0042) | 0.5649 (+0.0245) | 0.3666 (+0.0416) | 0.6842 (+0.1836) | 0.5386 (+0.0832) |
| 0.4628 | 4200 | 0.2196 | - | - | - | - | - | - |
| 0.4848 | 4400 | 0.2153 | - | - | - | - | - | - |
| 0.5069 | 4600 | 0.217 | - | - | - | - | - | - |
| 0.5289 | 4800 | 0.1982 | - | - | - | - | - | - |
| 0.5510 | 5000 | 0.2172 | 0.6249 | 0.7864 (+0.0047) | 0.6029 (+0.0625) | 0.3833 (+0.0583) | 0.7029 (+0.2022) | 0.5630 (+0.1077) |
| 0.5730 | 5200 | 0.2145 | - | - | - | - | - | - |
| 0.5950 | 5400 | 0.213 | - | - | - | - | - | - |
| 0.6171 | 5600 | 0.2117 | - | - | - | - | - | - |
| 0.6391 | 5800 | 0.2102 | - | - | - | - | - | - |
| 0.6612 | 6000 | 0.2125 | 0.7420 | 0.7834 (+0.0017) | 0.5907 (+0.0503) | 0.3771 (+0.0521) | 0.7176 (+0.2169) | 0.5618 (+0.1064) |
| 0.6832 | 6200 | 0.1995 | - | - | - | - | - | - |
| 0.7052 | 6400 | 0.1978 | - | - | - | - | - | - |
| 0.7273 | 6600 | 0.1857 | - | - | - | - | - | - |
| 0.7493 | 6800 | 0.1811 | - | - | - | - | - | - |
| 0.7713 | 7000 | 0.2055 | 1.1528 | 0.7827 (+0.0011) | 0.6152 (+0.0748) | 0.3730 (+0.0480) | 0.7190 (+0.2184) | 0.5691 (+0.1137) |
| 0.7934 | 7200 | 0.1855 | - | - | - | - | - | - |
| 0.8154 | 7400 | 0.1829 | - | - | - | - | - | - |
| 0.8375 | 7600 | 0.1901 | - | - | - | - | - | - |
| 0.8595 | 7800 | 0.1862 | - | - | - | - | - | - |
| **0.8815** | **8000** | **0.1858** | **0.6424** | **0.7880 (+0.0064)** | **0.6203 (+0.0799)** | **0.3660 (+0.0410)** | **0.7246 (+0.2240)** | **0.5703 (+0.1149)** |
| 0.9036 | 8200 | 0.1545 | - | - | - | - | - | - |
| 0.9256 | 8400 | 0.1729 | - | - | - | - | - | - |
| 0.9477 | 8600 | 0.1657 | - | - | - | - | - | - |
| 0.9697 | 8800 | 0.1698 | - | - | - | - | - | - |
| 0.9917 | 9000 | 0.1658 | 0.6904 | 0.7898 (+0.0081) | 0.6011 (+0.0606) | 0.3612 (+0.0361) | 0.7165 (+0.2159) | 0.5596 (+0.1042) |
| -1 | -1 | - | - | 0.7880 (+0.0064) | 0.6203 (+0.0799) | 0.3660 (+0.0410) | 0.7246 (+0.2240) | 0.5703 (+0.1149) |

* The bold row denotes the saved checkpoint.

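The card does not state which metric drove `load_best_model_at_end`, so the following is only a consistency check, not a description of the actual selection rule. Using the `NanoBEIR_mean_ndcg@10` values copied from the evaluation rows of the training log, it picks the step with the highest value and confirms that the mean at that step matches the three Nano dataset scores reported for the same row. The variable names are hypothetical.

```python
# (step -> NanoBEIR_mean_ndcg@10) copied from the evaluation rows of the training log.
nanobeir_mean_ndcg = {
    1000: 0.4752,
    2000: 0.5198,
    3000: 0.5007,
    4000: 0.5386,
    5000: 0.5630,
    6000: 0.5618,
    7000: 0.5691,
    8000: 0.5703,
    9000: 0.5596,
}

# Step with the highest mean NDCG@10 across the evaluated checkpoints.
best_step = max(nanobeir_mean_ndcg, key=nanobeir_mean_ndcg.get)

# NanoMSMARCO, NanoNFCorpus and NanoNQ NDCG@10 values from the step-8000 row.
nano_scores_at_8000 = [0.6203, 0.3660, 0.7246]
mean_at_8000 = sum(nano_scores_at_8000) / len(nano_scores_at_8000)

print(best_step, round(mean_at_8000, 4))
```

Note that `gooaq-dev_ndcg@10` is slightly higher at step 9000 than at step 8000, so the bold row lines up with the NanoBEIR mean rather than with the in-domain dev score.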
### Framework Versions
- Python: 3.11.10
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.6.0.dev20241112+cu121
- Accelerate: 1.2.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->