whitphx (HF Staff) committed (verified)
Commit 5cfd31a · 1 Parent(s): 6bd23ed

Add/update the quantized ONNX model files and README.md for Transformers.js v3

## Applied Quantizations

### ✅ Based on `decoder_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_bnb4.onnx` (added)

### ✅ Based on `encoder_model.onnx` *with* slimming

↳ ❌ `int8`: `encoder_model_int8.onnx` (added, but the JS-based E2E test failed; log below)
```
dtype not specified for "decoder_model_merged". Using the default dtype (fp32) for this device (cpu).
/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25
__classPrivateFieldGet(this, _OnnxruntimeSessionHandler_inferenceSession, "f").loadModel(pathOrBuffer, options);
^

Error: Could not find an implementation for ConvInteger(10) node with name '/conv1/Conv_quant'
at new OnnxruntimeSessionHandler (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25:92)
at Immediate.<anonymous> (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:67:29)
at process.processImmediate (node:internal/timers:485:21)

Node.js v22.16.0
```
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)

### ✅ Based on `decoder_with_past_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_with_past_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_with_past_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_with_past_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_with_past_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_with_past_model_bnb4.onnx` (added)

### ✅ Based on `decoder_model_merged.onnx` *without* slimming

↳ ✅ `fp16`: `decoder_model_merged_fp16.onnx` (replaced because it was invalid)
↳ ✅ `int8`: `decoder_model_merged_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_merged_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_merged_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_merged_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_merged_bnb4.onnx` (added)
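
The quantization suffixes above correspond to the `dtype` values Transformers.js v3 can load. As a minimal sketch (assuming the v3 `dtype` option, which also accepts a per-file mapping keyed by the ONNX file prefixes in this repo), selecting a specific variant could look like this:

```js
import { pipeline } from '@huggingface/transformers';

// Per-file dtype selection: keep the encoder at fp32 (its int8 variant failed the
// E2E test above) and load the q4 merged decoder. Keys mirror the ONNX file
// prefixes, e.g. decoder_model_merged_q4.onnx <-> decoder_model_merged: 'q4'.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-small',
  {
    dtype: {
      encoder_model: 'fp32',
      decoder_model_merged: 'q4',
    },
  },
);
```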

README.md CHANGED
@@ -5,4 +5,22 @@ library_name: transformers.js
 
 https://huggingface.co/openai/whisper-small with ONNX weights to be compatible with Transformers.js.
 
+ If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
+ ```bash
+ npm i @huggingface/transformers
+ ```
+
+ ```js
+ import { pipeline } from '@huggingface/transformers';
+
+ // Create the pipeline
+ const pipe = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small', {
+   dtype: 'fp32', // Options: "fp32", "fp16", "q8", "q4"
+ });
+
+ // Use the model
+ const result = await pipe('input text or data');
+ console.log(result);
+ ```
+
 Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).
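
The README snippet above passes a placeholder string to the pipeline; for automatic speech recognition the input should be audio. A hedged usage sketch follows (the sample WAV URL is an assumption, taken from the usual Transformers.js examples):

```js
import { pipeline } from '@huggingface/transformers';

const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small');

// The pipeline accepts a URL/path to an audio file or a Float32Array of PCM samples.
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
const output = await transcriber(url);
console.log(output); // e.g. { text: " And so my fellow Americans ..." }
```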
onnx/decoder_model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d45e70921250a40c9127048713238ce3ab4a509b4be69af9e91a0c72ba67fc8
+ size 225577711
onnx/decoder_model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d45a8231faf5ebe6aa6e644f10dc7b0e697ce46cab8029c75f44035c9982ed34
+ size 307683476
onnx/decoder_model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a27ccf4d17481253ee17e03b90017369069e8d44d78b3f3dec732937c729d776
+ size 315082910
onnx/decoder_model_merged_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3905b34f5b2d8b67f559d99a990a3377df5201112c7721c6627284d1a06ce1e8
+ size 226154078
onnx/decoder_model_merged_fp16.onnx CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:357f0d1d875494949db93ce043dfd12400402845017a91c6139f91d3a53818c1
- size 308607951
+ oid sha256:8d0e347441bdac2a62b346bbcb6fc69548651658028ec7e424ecb76c0e09ab9a
+ size 308615077
onnx/decoder_model_merged_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6a3b4022114c44278ff09b2a92b0c20fe6a3e1b41130b3d1a2db4a7ad7bf42c
+ size 156781344
onnx/decoder_model_merged_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7de7841243c9f128780dbd18949bca0c7291866f1435b8dafb577bbf445e973b
+ size 233230238
onnx/decoder_model_merged_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7fca7f5949bc2868d83fb18423ba84b897fc648344d69bc88c7e23c75b12a830
+ size 145836023
onnx/decoder_model_merged_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b21d21a866e5c084e8c4e2a1a21f4bcff134f18d640b2bbc17c2db814d56e170
+ size 156781405
onnx/decoder_model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3c830ed38b68a7dfd5e403daccd3cc9e2f874766e3822f94ca92b444c7f1d0cf
+ size 232654735
onnx/decoder_model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ecb2b6cd875cf6cdbd542df761fd5afdcb04933dcd4649f33b455fe7f5f54f7
+ size 144910022
onnx/decoder_model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5cc8eb1f341a7bb1647cb14497fabb2569e6d3b8417cdad7ebbc18e06cc1a97e
+ size 315082971
onnx/decoder_with_past_model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c38e40b459ca4c938800e89731d1ac3a8ffe7cdbd9ef42ee8b05eb802c26c06
+ size 217591825
onnx/decoder_with_past_model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e0b8257e7ddb6c6d0133dd940cc029b65eaad84d8bbd9b3ec1f9d15b9be8e97
+ size 279378750
onnx/decoder_with_past_model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c4535466361e9b4e6bd6449b4bb5125308ccf214c0fa60f6398a4b9266f9c95b
+ size 300883937
onnx/decoder_with_past_model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:50a4ad7e1f1ebe3e2b6cbdbd2925b550ac96efc2acc231351fff1e0dda5f308b
+ size 223784281
onnx/decoder_with_past_model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b807370cc52eadc7816caeafe7fcb0602e435cd910a4d0fd06c6c620ed53041b
+ size 136950624
onnx/decoder_with_past_model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5399b4d251ca712570a7138ef88d8c26a3f9c5b793aa36e3bcfe95595be944e9
+ size 300883985
onnx/encoder_model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26360e75db17456911d2d22950ae99d184c638dedf60262693470c354c7300b6
+ size 60826927
onnx/encoder_model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7deb48a1db0bc4fb737fdde0d893c9064c678f4c608f2c5d6ca1af8048d610a1
+ size 66134815
onnx/encoder_model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8b0c741bd5b3d6109ea82392dcb673f7599630e8cf1035fea23847f0f040da0
+ size 54388383
onnx/encoder_model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:85437f8a625d8f58ff63f0ba01ad7113eaf080062d4ddc0cb7b16e3281b981d6
+ size 92188755