BM-K committed
Commit b56ed0d · verified · 1 Parent(s): c137cd5

Initial commit

1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
+ {
+     "pooling_strategy": "max",
+     "activation_function": "relu",
+     "word_embedding_dimension": 50000
+ }
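
The pooling this config drives is compact enough to write out. Below is a minimal PyTorch sketch of max pooling over ReLU-activated MLM logits; the `log1p` saturation step is an assumption taken from the SPLADE formulation (and how sentence-transformers' `SpladePooling` is generally described), not from this JSON alone:

```python
import torch

def splade_pool(logits: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Max-pool log-saturated ReLU activations over the token axis.

    logits: MLM logits, shape (batch, seq_len, vocab_size=50000)
    attention_mask: 1 for real tokens, 0 for padding, shape (batch, seq_len)
    Returns a (batch, vocab_size) sparse-friendly representation.
    """
    # log(1 + relu(w)) is SPLADE's saturation; relu matches "activation_function": "relu"
    scores = torch.log1p(torch.relu(logits))
    # Zero out padding positions so they cannot win the max
    scores = scores * attention_mask.unsqueeze(-1)
    # "pooling_strategy": "max" -> max over the sequence dimension
    return scores.max(dim=1).values
```
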
README.md CHANGED
@@ -1,5 +1,406 @@
  ---
- license: apache-2.0
- language:
- - ko
- ---
+ tags:
+ - sentence-transformers
+ - sparse-encoder
+ - sparse
+ - splade
+ - generated_from_trainer
+ - dataset_size:1112040
+ - loss:SpladeLoss
+ - loss:SparseMultipleNegativesRankingLoss
+ - loss:FlopsLoss
+ widget:
+ - text: 매크로 (명사). 복잡한 입력을 컴퓨터 프로그램에 대해 비교적 인간 친화적으로 줄인 표현. 전처리기는 컴파일되기 전에 모든 내장된 매크로를
+     소스 코드로 확장한다.
+ - text: "브레네 호수 \n브레네 호수는 스위스 보주주 조 계곡에 위치한 호수입니다. 이 호수는 조 호수의 북쪽에 있으며, 단 200미터 떨어져\
+     \ 있습니다. 해발 1002미터로 조 호수보다 2미터 낮습니다."
+ - text: 그 앨범 "Making Lite of Myself"를 만든 코미디언의 국적은 무엇인가요?
+ - text: 비어 있음의 의미는 무엇인가요?
+ - text: '파트라데비(콘카니어: 포트라데오)는 고아의 페르넴 탈루크에 위치한 마을로, 고아와 마하라슈트라 경계에 있습니다. 이 마을에는 파트라데비
+     검문소가 위치해 있습니다.'
+ pipeline_tag: feature-extraction
+ library_name: sentence-transformers
+ ---
+
+ # SPLADE Sparse Encoder
+
+ This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model trained on the json dataset using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 50000-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SPLADE Sparse Encoder
+ <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
+ - **Maximum Sequence Length:** 3072 tokens
+ - **Output Dimensionality:** 50000 dimensions
+ - **Similarity Function:** Dot Product
+ - **Training Dataset:**
+     - json
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
+
+ ### Full Model Architecture
+
+ ```
+ SparseEncoder(
+   (0): MLMTransformer({'max_seq_length': 3072, 'do_lower_case': False, 'architecture': 'ModernBertForMaskedLM'})
+   (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 50000})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference:
+ ```python
+ from sentence_transformers import SparseEncoder
+
+ # Download from the 🤗 Hub
+ model = SparseEncoder("sparse_encoder_model_id")
+ # Run inference
+ sentences = [
+     '파트라데비는 고아의 페르넴 타룩에 위치한 마을로, 고아는 어느 나라에 있는 주인가요?',
+     '파트라데비(콘카니어: 포트라데오)는 고아의 페르넴 탈루크에 위치한 마을로, 고아와 마하라슈트라 경계에 있습니다. 이 마을에는 파트라데비 검문소가 위치해 있습니다.',
+     '콘디바데 A.m 콘디바데 A.m은 인도의 한 마을입니다. 이 마을은 마하라슈트라 주의 푸네 지구 마왈 탈루카에 위치해 있습니다.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 50000]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[25.1626, 27.0573,  7.1256],
+ #         [27.0573, 84.2966, 31.7376],
+ #         [ 7.1256, 31.7376, 74.3025]])
+ ```
+
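
Because each of the 50,000 dimensions corresponds to a vocabulary term, the embeddings above are directly interpretable. A hedged sketch, continuing from the snippet above and assuming the `decode` helper that `SparseEncoder` exposes in sentence-transformers v5 (treat the exact signature as an assumption):

```python
# Continuing from the snippet above: list the highest-weighted vocabulary
# terms in the first sentence's sparse embedding.
for token, weight in model.decode(embeddings[0], top_k=10):
    print(f"{token}\t{weight:.4f}")
```
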
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### json
+
+ * Dataset: json
+ * Size: 1,112,040 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, <code>negative_1</code>, <code>negative_2</code>, and <code>negative_3</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | negative_1 | negative_2 | negative_3 |
+   |:--------|:-------|:---------|:-----------|:-----------|:-----------|
+   | type    | string | string | string | string | string |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 18.8 tokens</li><li>max: 126 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 50.36 tokens</li><li>max: 77 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 47.98 tokens</li><li>max: 73 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 47.69 tokens</li><li>max: 79 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 47.96 tokens</li><li>max: 78 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative_1 | negative_2 | negative_3 |
+   |:-------|:---------|:-----------|:-----------|:-----------|
+   | <code>난촨구와 둥촨구는 어느 나라에 위치해 있습니까?</code> | <code>난촨구(南川区)는 중국 충칭의 구이자 이전의 현이다.</code> | <code>남풍현(南丰县)은 중국 장시성(江西省) 푸저우(福州)에 위치한 군이다.</code> | <code>도교, 광둥 도교(道滘)는 중국 남부 광둥성 동관 시의 관할 하에 있는 도시입니다.</code> | <code>동포구 동포구는 중국 쓰촨성의 구역입니다. 이곳은 메이산시의 관할 하에 있습니다.</code> |
+   | <code>가짜대나무(Pseudosasa)와 별꽃(Cerastium)은 모두 자생 식물과 관련이 있습니까?</code> | <code>가짜사사(Pseudosasa)는 풀과에 속하는 동아시아 대나무의 속입니다.</code> | <code>세팔로소루스(Cephalosorus)는 데이지 과에 속하는 꽃이 피는 식물의 속입니다.</code> | <code>가짜기생충속(Pseudoparasitus)은 라엘라피다에 속하는 진드기의 속입니다.</code> | <code>페리타사(Peritassa)는 쐐기풀과(Celastraceae) 식물의 속입니다.</code> |
+   | <code>그저우와 헤이룽장성 동닝은 어떤 나라와 접경하고 있습니까?</code> | <code>허주(贺州)는 중화인민공화국 광시 좡족 자치구 북동부에 위치한 지급시이다.</code> | <code>지관구(지관구)는 중국 인민공화국 헤이룽장성 지시시의 구이자 시청 소재지입니다.</code> | <code>헤동 가도(河东街道)는 중국 광시(广西) 리우저우(柳州) 청중 구(城中区)의 가도입니다.</code> | <code>화닝현 (华宁县; 병음: Huáníng Xiàn)은 중국 윈난성 유시시에 위치해 있습니다.</code> |
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
+   ```json
+   {
+       "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')",
+       "document_regularizer_weight": 3e-05,
+       "query_regularizer_weight": 5e-05
+   }
+   ```
+
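
Mapped back to code, the loss configuration above would be constructed roughly as follows. This is a sketch against the sentence-transformers v5 API, not the original training script; the model id is the same placeholder used earlier:

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.losses import (
    SpladeLoss,
    SparseMultipleNegativesRankingLoss,
)

model = SparseEncoder("sparse_encoder_model_id")  # placeholder id, as above

# SpladeLoss wraps a ranking loss and adds FLOPS-style sparsity regularization
# on query and document activations, with the weights listed above.
loss = SpladeLoss(
    model=model,
    loss=SparseMultipleNegativesRankingLoss(model=model, scale=1.0),
    document_regularizer_weight=3e-05,
    query_regularizer_weight=5e-05,
)
```
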
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 6
+ - `gradient_accumulation_steps`: 4
+ - `learning_rate`: 2e-06
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `ddp_find_unused_parameters`: True
+ - `ddp_timeout`: 7200
+ - `batch_sampler`: no_duplicates
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 6
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 4
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-06
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 3
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 2
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: True
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `tp_size`: 0
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: True
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 7200
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
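
For orientation, the non-default hyperparameters above translate into a training-arguments object along these lines. A sketch assuming the `SparseEncoderTrainer` / `SparseEncoderTrainingArguments` API from sentence-transformers v5; `model`, `loss`, and `train_dataset` are stand-ins for the objects described elsewhere in this card, and `output_dir` is hypothetical:

```python
from sentence_transformers import SparseEncoderTrainer, SparseEncoderTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SparseEncoderTrainingArguments(
    output_dir="outputs",               # hypothetical path
    per_device_train_batch_size=6,
    gradient_accumulation_steps=4,
    learning_rate=2e-06,
    warmup_ratio=0.1,
    num_train_epochs=3,
    bf16=True,
    ddp_find_unused_parameters=True,
    ddp_timeout=7200,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SparseEncoderTrainer(
    model=model,                  # the SparseEncoder instance
    args=args,
    train_dataset=train_dataset,  # the 1,112,040-sample json dataset
    loss=loss,                    # SpladeLoss from the previous sketch
)
trainer.train()
```
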
+ ### Training Logs
+ | Epoch | Step | Training Loss |
+ |:------:|:-----:|:-------------:|
+ | 0.0863 | 1000 | 4.8919 |
+ | 0.1727 | 2000 | 3.4433 |
+ | 0.2590 | 3000 | 3.1294 |
+ | 0.3453 | 4000 | 2.9256 |
+ | 0.4316 | 5000 | 2.8705 |
+ | 0.5180 | 6000 | 2.2949 |
+ | 0.6043 | 7000 | 1.451 |
+ | 0.6906 | 8000 | 1.1573 |
+ | 0.7770 | 9000 | 1.0298 |
+ | 0.8633 | 10000 | 1.1008 |
+ | 0.9496 | 11000 | 1.3943 |
+ | 1.0360 | 12000 | 2.1922 |
+ | 1.1223 | 13000 | 2.6991 |
+ | 1.2087 | 14000 | 2.4977 |
+ | 1.2950 | 15000 | 2.448 |
+ | 1.3813 | 16000 | 2.4044 |
+ | 1.4676 | 17000 | 2.3224 |
+ | 1.5540 | 18000 | 1.4636 |
+ | 1.6403 | 19000 | 1.0056 |
+ | 1.7266 | 20000 | 0.8397 |
+ | 1.8129 | 21000 | 0.8211 |
+ | 1.8993 | 22000 | 0.9905 |
+ | 1.9856 | 23000 | 1.3015 |
+ | 2.0720 | 24000 | 2.3987 |
+ | 2.1583 | 25000 | 2.3067 |
+ | 2.2447 | 26000 | 2.2579 |
+ | 2.3310 | 27000 | 2.2134 |
+ | 2.4173 | 28000 | 2.2357 |
+ | 2.5036 | 29000 | 1.867 |
+ | 2.5900 | 30000 | 1.0632 |
+ | 2.6763 | 31000 | 0.8168 |
+ | 2.7626 | 32000 | 0.7357 |
+ | 2.8489 | 33000 | 0.7851 |
+ | 2.9353 | 34000 | 1.0681 |
+
+
+ ### Framework Versions
+ - Python: 3.11.12
+ - Sentence Transformers: 5.0.0
+ - Transformers: 4.51.3
+ - PyTorch: 2.7.0+cu128
+ - Accelerate: 1.5.2
+ - Datasets: 2.21.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### SpladeLoss
+ ```bibtex
+ @misc{formal2022distillationhardnegativesampling,
+     title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
+     author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
+     year={2022},
+     eprint={2205.04733},
+     archivePrefix={arXiv},
+     primaryClass={cs.IR},
+     url={https://arxiv.org/abs/2205.04733},
+ }
+ ```
+
+ #### SparseMultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ #### FlopsLoss
+ ```bibtex
+ @article{paria2020minimizing,
+     title={Minimizing flops to learn efficient sparse representations},
+     author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
+     journal={arXiv preprint arXiv:2004.05665},
+     year={2020}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+     "<pad>": 49999
+ }
config.json ADDED
@@ -0,0 +1,45 @@
+ {
+     "architectures": [
+         "ModernBertForMaskedLM"
+     ],
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "bos_token_id": 0,
+     "classifier_activation": "gelu",
+     "classifier_bias": false,
+     "classifier_dropout": 0.0,
+     "classifier_pooling": "mean",
+     "cls_token_id": 0,
+     "decoder_bias": true,
+     "deterministic_flash_attn": false,
+     "embedding_dropout": 0.0,
+     "eos_token_id": 1,
+     "global_attn_every_n_layers": 3,
+     "global_rope_theta": 160000,
+     "gradient_checkpointing": false,
+     "hidden_activation": "gelu",
+     "hidden_size": 768,
+     "initializer_cutoff_factor": 2.0,
+     "initializer_range": 0.02,
+     "intermediate_size": 1152,
+     "layer_norm_eps": 1e-05,
+     "local_attention": 128,
+     "local_rope_theta": 10000.0,
+     "max_position_embeddings": 16384,
+     "mlp_bias": false,
+     "mlp_dropout": 0.0,
+     "model_type": "modernbert",
+     "norm_bias": false,
+     "norm_eps": 1e-05,
+     "num_attention_heads": 12,
+     "num_hidden_layers": 22,
+     "pad_token_id": 49999,
+     "position_embedding_type": "absolute",
+     "repad_logits_with_grad": false,
+     "sep_token_id": 1,
+     "sparse_pred_ignore_index": -100,
+     "sparse_prediction": false,
+     "torch_dtype": "float32",
+     "transformers_version": "4.51.3",
+     "vocab_size": 50000
+ }
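
This config declares a ModernBERT masked-LM backbone with a 50,000-entry vocabulary, so the checkpoint should also load directly in plain transformers when only the raw MLM logits (the pre-pooling signal) are needed. A hedged sketch; the repository id is a placeholder:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sparse_encoder_model_id")  # placeholder id
model = AutoModelForMaskedLM.from_pretrained("sparse_encoder_model_id")

inputs = tokenizer("비어 있음의 의미는 무엇인가요?", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, seq_len, 50000), before SPLADE pooling
print(logits.shape)
```
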
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+     "model_type": "SparseEncoder",
+     "__version__": {
+         "sentence_transformers": "5.0.0",
+         "transformers": "4.51.3",
+         "pytorch": "2.7.0+cu128"
+     },
+     "prompts": {
+         "query": "",
+         "document": ""
+     },
+     "default_prompt_name": null,
+     "similarity_fn_name": "dot"
+ }
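
This file registers empty `query`/`document` prompts and dot-product similarity. A short sketch of the prompt-aware entry points, assuming the `encode_query` / `encode_document` convenience methods that sentence-transformers v5 adds to `SparseEncoder`:

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder("sparse_encoder_model_id")  # placeholder id, as in the card

# With both prompts empty these reduce to plain encode(), but they keep the
# query/document distinction explicit in a retrieval pipeline.
query_emb = model.encode_query(["비어 있음의 의미는 무엇인가요?"])
doc_emb = model.encode_document([
    "파트라데비(콘카니어: 포트라데오)는 고아의 페르넴 탈루크에 위치한 마을로, 고아와 마하라슈트라 경계에 있습니다.",
])

# "similarity_fn_name": "dot" -> scores are dot products
print(model.similarity(query_emb, doc_emb))
```
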
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1869e2c80d885e73805f28977890ad697066463853883524bf7406bcdc5827e6
+ size 597503064
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+     {
+         "idx": 0,
+         "name": "0",
+         "path": "",
+         "type": "sentence_transformers.sparse_encoder.models.MLMTransformer"
+     },
+     {
+         "idx": 1,
+         "name": "1",
+         "path": "1_SpladePooling",
+         "type": "sentence_transformers.sparse_encoder.models.SpladePooling"
+     }
+ ]
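
modules.json chains the two modules by index; an equivalent pipeline can be assembled by hand with the classes it names. A sketch only: the constructor arguments are assumptions and the model id is a placeholder:

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.models import MLMTransformer, SpladePooling

# Module 0: the ModernBERT MLM backbone; module 1: SPLADE pooling over its
# logits, mirroring the idx 0/1 entries in modules.json.
mlm = MLMTransformer("sparse_encoder_model_id", max_seq_length=3072)  # placeholder id
pooling = SpladePooling(pooling_strategy="max", activation_function="relu")

model = SparseEncoder(modules=[mlm, pooling])
```
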
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+     "max_seq_length": 8192,
+     "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+     "bos_token": {
+         "content": "<s>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "cls_token": {
+         "content": "<cls>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "eos_token": {
+         "content": "<\\s>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "mask_token": {
+         "content": "<mask>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "pad_token": {
+         "content": "<pad>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "sep_token": {
+         "content": "<sep>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "unk_token": {
+         "content": "<unk>",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,329 @@
+ {
+     "added_tokens_decoder": {
+         "0": {
+             "content": "<s>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "1": {
+             "content": "<\\s>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "2": {
+             "content": "<unk>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "3": {
+             "content": "<sep>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "4": {
+             "content": "<mask>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "5": {
+             "content": "<cls>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "6": {
+             "content": "<unused0>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "7": {
+             "content": "<unused1>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "8": {
+             "content": "<unused2>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "9": {
+             "content": "<unused3>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "10": {
+             "content": "<unused4>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "11": {
+             "content": "<unused5>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "12": {
+             "content": "<unused6>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "13": {
+             "content": "<unused7>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "14": {
+             "content": "<unused8>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "15": {
+             "content": "<unused9>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "16": {
+             "content": "<unused10>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "17": {
+             "content": "<unused11>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "18": {
+             "content": "<unused12>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "19": {
+             "content": "<unused13>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "20": {
+             "content": "<unused14>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "21": {
+             "content": "<unused15>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "22": {
+             "content": "<unused16>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "23": {
+             "content": "<unused17>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "24": {
+             "content": "<unused18>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "25": {
+             "content": "<unused19>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "26": {
+             "content": "<unused20>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "27": {
+             "content": "<unused21>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "28": {
+             "content": "<unused22>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "29": {
+             "content": "<unused23>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "30": {
+             "content": "<unused24>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "31": {
+             "content": "<unused25>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "32": {
+             "content": "<unused26>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "33": {
+             "content": "<unused27>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "34": {
+             "content": "<unused28>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "35": {
+             "content": "<unused29>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "36": {
+             "content": "<unused30>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "49999": {
+             "content": "<pad>",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         }
+     },
+     "bos_token": "<s>",
+     "clean_up_tokenization_spaces": true,
+     "cls_token": "<cls>",
+     "do_lower_case": false,
+     "eos_token": "<\\s>",
+     "extra_special_tokens": {},
+     "mask_token": "<mask>",
+     "max_length": 2048,
+     "model_max_length": 8192,
+     "pad_to_multiple_of": null,
+     "pad_token": "<pad>",
+     "pad_token_type_id": 0,
+     "padding_side": "right",
+     "sep_token": "<sep>",
+     "stride": 0,
+     "strip_accents": null,
+     "tokenize_chinese_chars": true,
+     "tokenizer_class": "BertTokenizer",
+     "truncation_side": "right",
+     "truncation_strategy": "longest_first",
+     "unk_token": "<unk>"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff