Add `text-embeddings-inference` tag & snippet
#8 by alvarobartt (HF Staff) - opened

README.md CHANGED
@@ -7,6 +7,7 @@ tags:
 - feature-extraction
 - sentence-similarity
 - transformers
+- text-embeddings-inference
 datasets:
 - flax-sentence-embeddings/stackexchange_xml
 - ms_marco
@@ -41,23 +42,23 @@ from sentence_transformers import SentenceTransformer, util
 query = "How many people live in London?"
 docs = ["Around 9 Million people live in London", "London is known for its financial district"]
 
-#Load the model
+# Load the model
 model = SentenceTransformer('sentence-transformers/multi-qa-mpnet-base-dot-v1')
 
-#Encode query and documents
+# Encode query and documents
 query_emb = model.encode(query)
 doc_emb = model.encode(docs)
 
-#Compute dot score between query and all document embeddings
+# Compute dot score between query and all document embeddings
 scores = util.dot_score(query_emb, doc_emb)[0].cpu().tolist()
 
-#Combine docs & scores
+# Combine docs & scores
 doc_score_pairs = list(zip(docs, scores))
 
-#Sort by decreasing score
+# Sort by decreasing score
 doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
 
-#Output passages & scores
+# Output passages & scores
 for doc, score in doc_score_pairs:
     print(score, doc)
 ```
@@ -70,11 +71,11 @@ Without [sentence-transformers](https://www.SBERT.net), you can use the model li
 from transformers import AutoTokenizer, AutoModel
 import torch
 
-#CLS Pooling - Take output from first token
+# CLS Pooling - Take output from first token
 def cls_pooling(model_output):
     return model_output.last_hidden_state[:,0]
 
-#Encode text
+# Encode text
 def encode(texts):
     # Tokenize sentences
     encoded_input = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
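Editor's note: the hunk above stops partway through the `encode` helper, as diffs only show a few context lines. Purely for readability, here is a sketch of a plausible continuation, consistent with the `cls_pooling` helper defined above; the exact body lives in the unmodified part of the README and may differ:

```python
import torch

def encode(texts):
    # Tokenize sentences (tokenizer/model/cls_pooling come from the snippet above)
    encoded_input = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')

    # Compute token embeddings without tracking gradients
    with torch.no_grad():
        model_output = model(**encoded_input, return_dict=True)

    # CLS pooling: keep the embedding of the first token
    return cls_pooling(model_output)
```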
@@ -97,24 +98,58 @@ docs = ["Around 9 Million people live in London", "London is known for its finan
 tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/multi-qa-mpnet-base-dot-v1")
 model = AutoModel.from_pretrained("sentence-transformers/multi-qa-mpnet-base-dot-v1")
 
-#Encode query and docs
+# Encode query and docs
 query_emb = encode(query)
 doc_emb = encode(docs)
 
-#Compute dot score between query and all document embeddings
+# Compute dot score between query and all document embeddings
 scores = torch.mm(query_emb, doc_emb.transpose(0, 1))[0].cpu().tolist()
 
-#Combine docs & scores
+# Combine docs & scores
 doc_score_pairs = list(zip(docs, scores))
 
-#Sort by decreasing score
+# Sort by decreasing score
 doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
 
-#Output passages & scores
+# Output passages & scores
 for doc, score in doc_score_pairs:
     print(score, doc)
 ```
 
+## Usage (Text Embeddings Inference (TEI))
+
+[Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference) is a blazing fast inference solution for text embedding models.
+
+- CPU:
+```bash
+docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
+    --model-id sentence-transformers/multi-qa-mpnet-base-dot-v1 \
+    --pooling cls \
+    --dtype float16
+```
+
+- NVIDIA GPU:
+```bash
+docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-latest \
+    --model-id sentence-transformers/multi-qa-mpnet-base-dot-v1 \
+    --pooling cls \
+    --dtype float16
+```
+
+Send a request to `/v1/embeddings` to generate embeddings via the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings/create):
+```bash
+curl http://localhost:8080/v1/embeddings \
+    -H "Content-Type: application/json" \
+    -d '{
+        "model": "sentence-transformers/multi-qa-mpnet-base-dot-v1",
+        "input": "How many people live in London?"
+    }'
+```
+
+Or check the [Text Embeddings Inference API specification](https://huggingface.github.io/text-embeddings-inference/) instead.
+
+----
+
 ## Technical Details
 
 In the following some technical details how this model must be used:
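Editor's note on the TEI snippet added above: once one of the containers is running, the same `/v1/embeddings` route can also be called from Python. A minimal sketch using `requests`; the host, port, model id, and dot-product scoring are taken from the examples above, and the snippet is illustrative rather than part of the README change:

```python
import requests

# Assumes a TEI container from the snippet above is listening on localhost:8080
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]

# One request for the query and the documents via the OpenAI-compatible route
response = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={"model": "sentence-transformers/multi-qa-mpnet-base-dot-v1", "input": [query] + docs},
)
response.raise_for_status()
embeddings = [item["embedding"] for item in response.json()["data"]]

# Score documents against the query with the dot product, as in the sentence-transformers example
query_emb, doc_embs = embeddings[0], embeddings[1:]
scores = [sum(q * d for q, d in zip(query_emb, d_emb)) for d_emb in doc_embs]
for doc, score in sorted(zip(docs, scores), key=lambda x: x[1], reverse=True):
    print(score, doc)
```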
@@ -128,25 +163,22 @@ In the following some technical details how this model must be used:
 
 ----
 
-
 ## Background
 
 The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised
 contrastive learning objective. We use a contrastive learning objective: given a sentence from the pair, the model should predict which out of a set of randomly sampled other sentences, was actually paired with it in our dataset.
 
-We
+We developed this model during the
 [Community week using JAX/Flax for NLP & CV](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104),
-organized by Hugging Face. We
-[Train the Best Sentence Embedding Model Ever with 1B Training Pairs](https://discuss.huggingface.co/t/train-the-best-sentence-embedding-model-ever-with-1b-training-pairs/7354). We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as intervention from
+organized by Hugging Face. We developed this model as part of the project:
+[Train the Best Sentence Embedding Model Ever with 1B Training Pairs](https://discuss.huggingface.co/t/train-the-best-sentence-embedding-model-ever-with-1b-training-pairs/7354). We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as intervention from Google's Flax, JAX, and Cloud team members about efficient deep learning frameworks.
 
 ## Intended uses
 
-Our model is
+Our model is intended to be used for semantic search: It encodes queries / questions and text paragraphs in a dense vector space. It finds relevant documents for the given passages.
 
 Note that there is a limit of 512 word pieces: Text longer than that will be truncated. Further note that the model was just trained on input text up to 250 word pieces. It might not work well for longer text.
 
-
-
 ## Training procedure
 
 The full training script is accessible in this current repository: `train_script.py`.
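Editor's note on the 512 word-piece limit mentioned in the hunk above: sentence-transformers exposes the truncation length directly on the model object. A small illustrative sketch; the numbers echo the card text rather than the change in this PR:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/multi-qa-mpnet-base-dot-v1')

# Inputs longer than this many word pieces are truncated before encoding
print(model.max_seq_length)

# The card notes the model was only trained on inputs up to 250 word pieces,
# so the limit can be lowered explicitly to keep truncation behaviour predictable
model.max_seq_length = 250
embedding = model.encode("A very long passage that would otherwise be truncated at 512 word pieces ...")
print(embedding.shape)
```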
@@ -162,9 +194,6 @@ We sampled each dataset given a weighted probability which configuration is deta
 
 The model was trained with [MultipleNegativesRankingLoss](https://www.sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss) using CLS-pooling, dot-product as similarity function, and a scale of 1.
 
-
-
-
 | Dataset                                                  | Number of training tuples |
 |--------------------------------------------------------|:--------------------------:|
 | [WikiAnswers](https://github.com/afader/oqa#wikianswers-corpus) Duplicate question pairs from WikiAnswers | 77,427,422 |
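Editor's note: the actual training code is `train_script.py` in the repository. Purely to illustrate the loss configuration named above (CLS pooling, dot-product similarity, scale of 1), here is a minimal sentence-transformers sketch; the backbone checkpoint, example pairs, and training arguments are placeholders, not the project's real setup:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, models, losses, util

# CLS pooling on top of an MPNet-style encoder (backbone here is an assumption)
word_embedding_model = models.Transformer("microsoft/mpnet-base", max_seq_length=250)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode="cls")
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Placeholder (query, positive passage) pairs; the real data mix is configured elsewhere
train_examples = [
    InputExample(texts=["How many people live in London?", "Around 9 Million people live in London"]),
    InputExample(texts=["What is London known for?", "London is known for its financial district"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# MultipleNegativesRankingLoss with dot-product similarity and a scale of 1, as described above
train_loss = losses.MultipleNegativesRankingLoss(model, scale=1, similarity_fct=util.dot_score)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```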