Again at the top of the RAG benchmark.
As explained here: https://huggingface.co/HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5/discussions/7
We evaluate embeddings with a needle-in-the-haystack challenge, which works roughly like this:
- You take a long text.
- You split it into chunks of X characters (here 500).
- You take a question-answer pair and hide the answer in one of the chunks (so it's the needle), then embed the question (the needle magnet) and all the chunks, and rank the chunks by similarity to the question.
- We expect the chunk containing the needle to be among the top-ranked results.
Using this kind of search we can evaluate the embedding model.
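The steps above can be sketched in a few lines. This is a minimal toy version: the word-count "embedding" and the filler/needle texts are stand-ins I made up to show the ranking mechanics, not the actual model or dataset used in the benchmark.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding: lowercase word-count vector.
    # A real run would call an embedding model here instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_chunks(haystack, question, chunk_size=500):
    # Split the long text into fixed-size chunks, then rank every chunk
    # by its similarity to the question (the needle magnet).
    chunks = [haystack[i:i + chunk_size]
              for i in range(0, len(haystack), chunk_size)]
    q = embed(question)
    return sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)

# Hide a needle (the answer) inside a long haystack of filler text.
haystack = ("filler text about unrelated topics. " * 40
            + "the secret passkey is 4217. "
            + "more filler text about unrelated topics. " * 40)
question = "What is the secret passkey?"

# The chunk containing the needle should come out on top.
top_score, top_chunk = rank_chunks(haystack, question)[0]
```

With a real embedding model you would swap `embed` for the model's encode call; the scoring and ranking logic stays the same.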
And the V2 version is again at the top of the most used models:
Thank you for your contribution! These findings are quite intriguing. Coincidentally, we are currently considering optimizations for tasks like "needle in the haystack" in the next version.
May I ask again: Have you considered establishing this work as a standard benchmark?
We noticed that the MMTEB test suite includes a task called LEMBPasskeyRetrieval. Does your task possess distinctive features in its domain or experimental setup compared to this?
It's been considered, but at the office we're really out of "free time" and don't have enough resources (human or machine).
We would also need to increase the dataset size and review it by hand (without being able to speak the language, erh).
Compared to the dataset you showed me, it seems very artificial and centered around LLM tasks, not embedding tasks.
As far as I understand, this is not a task for embeddings but for LLMs.
In that dataset you ask the LLM to find data in a text, right? Not to find which chunk (out of the 800 in the dataset) contains the information.
Even if you use an embedding model, it will:
- show the model's capacity to do direct name matching
- show the model's capacity to compare a short needle magnet (question) to a long haystack chunk with the needle hidden in it
Mine does that for one of its multiple tasks, but it also tests cross-lingual and subtle text matching.
Understood, we greatly look forward to your efforts!
If there are areas where we can provide assistance or collaborate, feel free to reach out anytime for further discussion.