Add MTEB metrics
README.md CHANGED
@@ -182,7 +182,7 @@ Then you can load this model and run inference.
 from sentence_transformers import SparseEncoder
 
 # Download from the 🤗 Hub
-model = SparseEncoder("
+model = SparseEncoder("sparse-encoder/splade-robbert-dutch-base-v1")
 # Run inference
 queries = [
     "hoe maak je een keldervloer glad",
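For context, a minimal end-to-end sketch of the snippet this hunk updates, assuming the `encode_query`/`encode_document`/`similarity` API of recent sentence-transformers releases; the example documents are illustrative placeholders and not part of the model card.

```python
from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub (repo id taken from the "+" line above)
model = SparseEncoder("sparse-encoder/splade-robbert-dutch-base-v1")

# Query from the snippet above ("how do you smooth a basement floor");
# the documents below are illustrative, not from the model card
queries = ["hoe maak je een keldervloer glad"]
documents = [
    "Gebruik een betonschuurmachine om een ruwe keldervloer glad te schuren.",
    "Een kelder is een ruimte die zich onder een gebouw bevindt.",
]

query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)

# Higher scores indicate a better query-document match
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
```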
@@ -231,6 +231,52 @@ You can finetune this model on your own dataset.
 
 ### Metrics
 
+#### MTEB
+
+We evaluated this model on [BelebeleRetrieval](https://arxiv.org/abs/2308.16884) and WikipediaRetrievalMultilingual, the two Dutch retrieval tasks recommended by [MMTEB](https://huggingface.co/spaces/mteb/leaderboard).
+
+![MTEB results on the two Dutch retrieval tasks]()
+
+As shown in this figure, `splade-robbert-dutch-base-v1` substantially outperforms the only other Dutch-capable sparse embedding model, and outperforms all equally sized dense embedding models, despite using an average of only ~250 active (non-zero) dimensions for documents (measured during training).
+
+<details><summary>Click to see the full table</summary>
+
+| Model                                              | Number of Parameters | BelebeleRetrieval | WikipediaRetrievalMultilingual |
+|----------------------------------------------------|----------------------|-------------------|--------------------------------|
+| multilingual-e5-large-instruct                     | 560M                 | 94.725            | 92.342                         |
+| multilingual-e5-large                              | 560M                 | 94.607            | -                              |
+| Solon-embeddings-large-0.1                         | 559M                 | 93.651            | 91.239                         |
+| snowflake-arctic-embed-l-v2.0                      | 568M                 | 93.318            | 90.902                         |
+| bge-m3                                             | 568M                 | 93.859            | 90.106                         |
+| multilingual-e5-base                               | 278M                 | 93.731            | 89.905                         |
+| jina-embeddings-v3                                 | 572M                 | 93.105            | 90.296                         |
+| **splade-robbert-dutch-base-v1**                   | 124M                 | 93.389            | 88.937                         |
+| multilingual-e5-small                              | 118M                 | 92.859            | 88.662                         |
+| KaLM-embedding-multilingual-mini-v1                | 494M                 | 91.453            | 88.413                         |
+| Qwen3-Embedding-0.6B                               | 595M                 | 91.686            | 88.121                         |
+| snowflake-arctic-embed-m-v2.0                      | 305M                 | 88.358            | 88.898                         |
+| granite-embedding-278m-multilingual                | 278M                 | 87.039            | 86.324                         |
+| gte-multilingual-base                              | 305M                 | 89.204            | 83.976                         |
+| KaLM-embedding-multilingual-mini-instruct-v1       | 494M                 | 85.648            | 85.877                         |
+| granite-embedding-107m-multilingual                | 107M                 | 85.068            | 85.097                         |
+| robbert-2022-dutch-sentence-transformers           | 124M                 | 86.146            | 82.553                         |
+| opensearch-neural-sparse-encoding-multilingual-v1  | 167M                 | 80.101            | 85.529                         |
+| paraphrase-multilingual-mpnet-base-v2              | 278M                 | 83.910            | 76.676                         |
+| e5-large-v2                                        | 335M                 | 76.433            | 79.711                         |
+| STS-multilingual-mpnet-base-v2                     | 278M                 | 80.625            | 73.803                         |
+| paraphrase-multilingual-MiniLM-L12-v2              | 118M                 | 81.021            | 71.091                         |
+| snowflake-arctic-embed-m                           | 109M                 | 65.511            | 74.801                         |
+| potion-multilingual-128M                           | 128M                 | 72.454            | 65.559                         |
+| static-similarity-mrl-multilingual-v1              | 108M                 | 67.375            | 69.050                         |
+| snowflake-arctic-embed-m-long                      | 137M                 | 67.947            | 65.988                         |
+| snowflake-arctic-embed-m-v1.5                      | 109M                 | 65.511            | 67.920                         |
+| bge-base-en-v1.5                                   | 109M                 | 61.073            | 72.093                         |
+| snowflake-arctic-embed-s                           | 32M                  | 58.683            | 70.887                         |
+| potion-base-8M                                     | 7M                   | 22.563            | 40.107                         |
+
+</details>
+
+
 #### Sparse Information Retrieval
 
 * Dataset: `msmarco-eval-1k`
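For readers who want to reproduce scores like those in the table, a rough sketch using the `mteb` package is shown below. The task names come from the section above; the `languages` filter, the output folder, and the assumption that `mteb` can consume a `SparseEncoder` through its standard `encode` interface are mine, not the model card's, so the card's own numbers may have been produced with a different setup.

```python
import mteb
from sentence_transformers import SparseEncoder

model = SparseEncoder("sparse-encoder/splade-robbert-dutch-base-v1")

# The two Dutch retrieval tasks named above; "nld" as language filter is an assumption
tasks = mteb.get_tasks(
    tasks=["BelebeleRetrieval", "WikipediaRetrievalMultilingual"],
    languages=["nld"],
)
evaluation = mteb.MTEB(tasks=tasks)

# Assumes mteb accepts the SparseEncoder directly via its encode() method
results = evaluation.run(model, output_folder="results/splade-robbert-dutch-base-v1")
print(results)
```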
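The ~250 active dimensions mentioned above were measured on documents during training; a quick way to inspect sparsity at inference time is sketched below. The documents are illustrative, and the dense conversion is a guard in case `encode_document` returns a torch sparse COO tensor.

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder("sparse-encoder/splade-robbert-dutch-base-v1")

# Illustrative Dutch documents, not taken from the model's training data
documents = [
    "Gebruik een betonschuurmachine om een ruwe keldervloer glad te schuren.",
    "Rotterdam heeft de grootste zeehaven van Europa.",
]

embeddings = model.encode_document(documents)

# encode_document may return a sparse COO tensor; to_dense() also accepts dense tensors
dense = embeddings.to_dense()
active_dims = (dense != 0).sum(dim=1)

print(active_dims)                        # non-zero dimensions per document
print(active_dims.float().mean().item())  # average number of active dimensions
```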