whitphx (HF Staff) committed (verified)
Commit 5cfd31a · 1 Parent(s): 6bd23ed

Add/update the quantized ONNX model files and README.md for Transformers.js v3

## Applied Quantizations

### ✅ Based on `decoder_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_bnb4.onnx` (added)

### ✅ Based on `encoder_model.onnx` *with* slimming

↳ ❌ `int8`: `encoder_model_int8.onnx` (added, but the JS-based E2E test failed; log below)
```
dtype not specified for "decoder_model_merged". Using the default dtype (fp32) for this device (cpu).
/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25
__classPrivateFieldGet(this, _OnnxruntimeSessionHandler_inferenceSession, "f").loadModel(pathOrBuffer, options);
^

Error: Could not find an implementation for ConvInteger(10) node with name '/conv1/Conv_quant'
at new OnnxruntimeSessionHandler (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25:92)
at Immediate.<anonymous> (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:67:29)
at process.processImmediate (node:internal/timers:485:21)

Node.js v22.16.0
```
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)

### ✅ Based on `decoder_with_past_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_with_past_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_with_past_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_with_past_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_with_past_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_with_past_model_bnb4.onnx` (added)

### ✅ Based on `decoder_model_merged.onnx` *without* slimming

↳ ✅ `fp16`: `decoder_model_merged_fp16.onnx` (replaced because it was invalid)
↳ ✅ `int8`: `decoder_model_merged_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_merged_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_merged_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_merged_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_merged_bnb4.onnx` (added)
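
The quantization suffixes above correspond to the `dtype` values Transformers.js v3 can load. As a minimal sketch (assuming the v3 `dtype` option, which also accepts a per-file mapping keyed by the ONNX file prefixes in this repo), selecting a specific variant could look like this:

```js
import { pipeline } from '@huggingface/transformers';

// Per-file dtype selection: keep the encoder at fp32 (its int8 variant failed the
// E2E test above) and load the q4 merged decoder. Keys mirror the ONNX file
// prefixes, e.g. decoder_model_merged_q4.onnx <-> decoder_model_merged: 'q4'.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-small',
  {
    dtype: {
      encoder_model: 'fp32',
      decoder_model_merged: 'q4',
    },
  },
);
```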

README.md CHANGED
@@ -5,4 +5,22 @@ library_name: transformers.js
 
 https://huggingface.co/openai/whisper-small with ONNX weights to be compatible with Transformers.js.
 
+ If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
+ ```bash
+ npm i @huggingface/transformers
+ ```
+
+ ```js
+ import { pipeline } from '@huggingface/transformers';
+
+ // Create the pipeline
+ const pipe = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small', {
+   dtype: 'fp32', // Options: "fp32", "fp16", "q8", "q4"
+ });
+
+ // Use the model
+ const result = await pipe('input text or data');
+ console.log(result);
+ ```
+
 Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).
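
The README snippet above passes a placeholder string to the pipeline; for automatic speech recognition the input should be audio. A hedged usage sketch follows (the sample WAV URL is an assumption, taken from the usual Transformers.js examples):

```js
import { pipeline } from '@huggingface/transformers';

const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small');

// The pipeline accepts a URL/path to an audio file or a Float32Array of PCM samples.
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
const output = await transcriber(url);
console.log(output); // e.g. { text: " And so my fellow Americans ..." }
```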
onnx/decoder_model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d45e70921250a40c9127048713238ce3ab4a509b4be69af9e91a0c72ba67fc8
+ size 225577711
onnx/decoder_model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d45a8231faf5ebe6aa6e644f10dc7b0e697ce46cab8029c75f44035c9982ed34
+ size 307683476
onnx/decoder_model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a27ccf4d17481253ee17e03b90017369069e8d44d78b3f3dec732937c729d776
+ size 315082910
onnx/decoder_model_merged_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3905b34f5b2d8b67f559d99a990a3377df5201112c7721c6627284d1a06ce1e8
+ size 226154078
onnx/decoder_model_merged_fp16.onnx CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:357f0d1d875494949db93ce043dfd12400402845017a91c6139f91d3a53818c1
- size 308607951
+ oid sha256:8d0e347441bdac2a62b346bbcb6fc69548651658028ec7e424ecb76c0e09ab9a
+ size 308615077
onnx/decoder_model_merged_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6a3b4022114c44278ff09b2a92b0c20fe6a3e1b41130b3d1a2db4a7ad7bf42c
+ size 156781344
onnx/decoder_model_merged_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7de7841243c9f128780dbd18949bca0c7291866f1435b8dafb577bbf445e973b
+ size 233230238
onnx/decoder_model_merged_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7fca7f5949bc2868d83fb18423ba84b897fc648344d69bc88c7e23c75b12a830
+ size 145836023
onnx/decoder_model_merged_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b21d21a866e5c084e8c4e2a1a21f4bcff134f18d640b2bbc17c2db814d56e170
+ size 156781405
onnx/decoder_model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3c830ed38b68a7dfd5e403daccd3cc9e2f874766e3822f94ca92b444c7f1d0cf
+ size 232654735
onnx/decoder_model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ecb2b6cd875cf6cdbd542df761fd5afdcb04933dcd4649f33b455fe7f5f54f7
+ size 144910022
onnx/decoder_model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5cc8eb1f341a7bb1647cb14497fabb2569e6d3b8417cdad7ebbc18e06cc1a97e
+ size 315082971
onnx/decoder_with_past_model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c38e40b459ca4c938800e89731d1ac3a8ffe7cdbd9ef42ee8b05eb802c26c06
+ size 217591825
onnx/decoder_with_past_model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e0b8257e7ddb6c6d0133dd940cc029b65eaad84d8bbd9b3ec1f9d15b9be8e97
+ size 279378750
onnx/decoder_with_past_model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c4535466361e9b4e6bd6449b4bb5125308ccf214c0fa60f6398a4b9266f9c95b
+ size 300883937
onnx/decoder_with_past_model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:50a4ad7e1f1ebe3e2b6cbdbd2925b550ac96efc2acc231351fff1e0dda5f308b
+ size 223784281
onnx/decoder_with_past_model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b807370cc52eadc7816caeafe7fcb0602e435cd910a4d0fd06c6c620ed53041b
+ size 136950624
onnx/decoder_with_past_model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5399b4d251ca712570a7138ef88d8c26a3f9c5b793aa36e3bcfe95595be944e9
+ size 300883985
onnx/encoder_model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26360e75db17456911d2d22950ae99d184c638dedf60262693470c354c7300b6
+ size 60826927
onnx/encoder_model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7deb48a1db0bc4fb737fdde0d893c9064c678f4c608f2c5d6ca1af8048d610a1
+ size 66134815
onnx/encoder_model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8b0c741bd5b3d6109ea82392dcb673f7599630e8cf1035fea23847f0f040da0
+ size 54388383
onnx/encoder_model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:85437f8a625d8f58ff63f0ba01ad7113eaf080062d4ddc0cb7b16e3281b981d6
+ size 92188755