Update README.md
README.md
CHANGED
@@ -14,7 +14,7 @@ tags:
A Japanese version of this document is coming soon (I'm still studying Japanese, so please forgive any mistakes!)

- fio-base-japanese-v0.1 is a proof of concept, and the first release of the Fio family of Japanese embeddings. It is based on [cl-tohoku/bert-base-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) and trained on limited volumes of data on single GPU.
+ fio-base-japanese-v0.1 is a proof of concept, and the first release of the Fio family of Japanese embeddings. It is based on [cl-tohoku/bert-base-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) and trained on limited volumes of data on a single GPU.

For more information, please refer to [my notes on Fio](https://ben.clavie.eu/fio).
@@ -50,7 +50,18 @@ Italic denotes best model for its size when a smaller model outperforms a bigger
| text-embedding-ada-002 | 0.790 | 0.789 | 0.7232 | 0.768 |

+
+ ## Usage
+
+ This model requires both `fugashi` and `unidic-lite`:
+
+ ```
+ pip install -U fugashi unidic-lite
+ ```
+
+ If using for a retrieval task, you must prefix your query with `"関連記事を取得するために使用できるこの文の表現を生成します: "`.
+
+ ### Usage (Sentence-Transformers)

This model is best used through [sentence-transformers](https://www.SBERT.net). If you don't have it, it's easy to install:

@@ -70,10 +81,6 @@ print(embeddings)
```

- ## Usage
-
- If using for a retrieval task, you must prefix your query with `"関連記事を取得するために使用できるこの文の表現を生成します: "`.
-
### Usage (HuggingFace Transformers)
Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.
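Taken together, the additions describe the intended query flow: install `fugashi` and `unidic-lite`, load the model through sentence-transformers, and prepend the retrieval prefix to queries. A minimal sketch of that flow, assuming the Hub id is `bclavie/fio-base-japanese-v0.1` (not confirmed by this diff; the model card's own snippet is authoritative):

```
# Sketch only. Requires: pip install -U sentence-transformers fugashi unidic-lite
from sentence_transformers import SentenceTransformer, util

# Hypothetical repo id; check the model card for the actual one.
model = SentenceTransformer("bclavie/fio-base-japanese-v0.1")

# Prefix required for retrieval queries, per the added Usage section.
PREFIX = "関連記事を取得するために使用できるこの文の表現を生成します: "

query = PREFIX + "日本の首都はどこですか?"
documents = [
    "東京は日本の首都であり、最大の都市です。",
    "富士山は日本で最も高い山です。",
]

query_emb = model.encode(query, convert_to_tensor=True)
doc_embs = model.encode(documents, convert_to_tensor=True)

# Cosine similarity between the query and each document.
scores = util.cos_sim(query_emb, doc_embs)
print(scores)
```

Only the query gets the prefix here; the documents are encoded as-is, which is how the added instruction reads.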
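The closing paragraph describes the transformers-only route: run the encoder, then pool the contextualized token embeddings yourself. The sketch below assumes mean pooling, the usual choice in sentence-transformers model cards, and reuses the hypothetical repo id from above; the full README (not shown in this excerpt) is the reference for the exact pooling configuration.

```
# Sketch only: assumes mean pooling and the hypothetical repo id used above.
import torch
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padding positions.
    token_embeddings = model_output[0]  # last hidden state
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

tokenizer = AutoTokenizer.from_pretrained("bclavie/fio-base-japanese-v0.1")
model = AutoModel.from_pretrained("bclavie/fio-base-japanese-v0.1")

sentences = ["これはテスト用の文です。", "これは別の文です。"]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    model_output = model(**encoded)

embeddings = mean_pooling(model_output, encoded["attention_mask"])
print(embeddings.shape)
```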