Committed by whitphx (HF Staff)
Commit 7b8b5f2 · verified · 1 Parent(s): 7154eb9

Add/update the quantized ONNX model files and README.md for Transformers.js v3


## Applied Quantizations

### ✅ Based on `decoder_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_bnb4.onnx` (added)

### ✅ Based on `encoder_model.onnx` *with* slimming

↳ ✅ `fp16`: `encoder_model_fp16.onnx` (added)
↳ ❌ `int8`: `encoder_model_int8.onnx` (added but JS-based E2E test failed)
```
dtype not specified for "decoder_model_merged". Using the default dtype (fp32) for this device (cpu).
/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25
__classPrivateFieldGet(this, _OnnxruntimeSessionHandler_inferenceSession, "f").loadModel(pathOrBuffer, options);
^

Error: Could not find an implementation for ConvInteger(10) node with name '/conv1/Conv_quant'
at new OnnxruntimeSessionHandler (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25:92)
at Immediate.<anonymous> (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:67:29)
at process.processImmediate (node:internal/timers:485:21)

Node.js v22.16.0
```
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)
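
Because the `int8` encoder failed to load (no `ConvInteger` kernel was found), a consumer may want to fall back to a variant that passed. The following is a minimal sketch of that selection, not part of the commit: `ENCODER_RESULTS` mirrors the list above, while `pickDtype` and the preference order are hypothetical. It also assumes `pipeline()` accepts a per-module `dtype` object keyed by `encoder_model`, which the "dtype not specified for "decoder_model_merged"" message in the log suggests.

```javascript
// E2E outcomes for encoder_model as reported above (true = test passed).
const ENCODER_RESULTS = {
  fp16: true,
  int8: false, // ConvInteger(10) had no implementation in onnxruntime-node
  uint8: true,
  q4: true,
  q4f16: true,
  bnb4: true,
};

// Hypothetical helper: return the first preferred dtype whose test passed,
// or null when none of them did.
function pickDtype(results, preferred) {
  return preferred.find((dtype) => results[dtype] === true) ?? null;
}

// Prefer int8, then fall back to uint8, then fp16.
const encoderDtype = pickDtype(ENCODER_RESULTS, ['int8', 'uint8', 'fp16']);

// Assumption: a per-module dtype object like this can be passed as the
// third argument of pipeline(). Modules left out use the default dtype.
const pipelineOptions = { dtype: { encoder_model: encoderDtype } };

console.log(pipelineOptions);
```

Keeping the outcomes in a plain object makes it easy to update the table if a later runtime release adds the missing kernel.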

### ✅ Based on `decoder_with_past_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_with_past_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_with_past_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_with_past_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_with_past_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_with_past_model_bnb4.onnx` (added)
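
The file names above follow a `<base>_<dtype>.onnx` pattern under the `onnx/` folder. As a rough sketch for scripting against this layout, the hypothetical helper below (`quantizedPath` is not part of any library) rebuilds those paths from the base graph name and the dtype label.

```javascript
// Base graphs and dtype labels taken from the summary above.
const BASE_MODELS = ['decoder_model', 'encoder_model', 'decoder_with_past_model'];
const DTYPES = ['fp16', 'int8', 'uint8', 'q4', 'q4f16', 'bnb4'];

// Hypothetical helper: repo-relative path of one quantized variant.
function quantizedPath(baseModel, dtype) {
  return `onnx/${baseModel}_${dtype}.onnx`;
}

// Every variant the summary mentions (3 base graphs x 6 dtypes).
const allPaths = BASE_MODELS.flatMap((base) =>
  DTYPES.map((dtype) => quantizedPath(base, dtype)),
);

console.log(allPaths.join('\n'));
```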

### ✅ Based on `decoder_model_merged.onnx` *without* slimming

README.md CHANGED
@@ -5,4 +5,24 @@ library_name: transformers.js

  https://huggingface.co/openai/whisper-large-v2 with ONNX weights to be compatible with Transformers.js.

+ If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
+ ```bash
+ npm i @huggingface/transformers
+ ```
+
+ ## Basic Usage
+
+ ```js
+ import { pipeline } from '@huggingface/transformers';
+
+ // Create the pipeline
+ const pipe = await pipeline('automatic-speech-recognition', 'Xenova/whisper-large-v2', {
+     dtype: 'fp32', // Options: "fp32", "fp16", "q8", "q4"
+ });
+
+ // Use the model
+ const result = await pipe('input text or data');
+ console.log(result);
+ ```
+
  Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).
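
The README snippet passes a placeholder string to the pipeline, but an `automatic-speech-recognition` pipeline expects audio. A minimal sketch of a more realistic call follows. It assumes the input may be an audio URL and that the result exposes a `text` field. `transcribe` and `SAMPLE_AUDIO_URL` are illustrative names, and the sample clip is an assumption rather than something this commit references.

```javascript
// Assumed sample clip; replace with your own audio URL.
const SAMPLE_AUDIO_URL =
  'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';

// Hypothetical wrapper around the README's pipeline call.
async function transcribe(audio, dtype = 'q4') {
  // Dynamic import so this file can be loaded before the package is installed.
  const { pipeline } = await import('@huggingface/transformers');

  const pipe = await pipeline(
    'automatic-speech-recognition',
    'Xenova/whisper-large-v2',
    { dtype },
  );

  // Assumption: the pipeline resolves to an object with a `text` property.
  const output = await pipe(audio);
  return output.text;
}

// Usage (downloads the model on first run, so it is left commented out):
// transcribe(SAMPLE_AUDIO_URL).then(console.log);
```

The model download is large, so a smaller `dtype` such as `q4` is used as the default here. That choice is a judgment call, not a recommendation from the commit.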
onnx/decoder_model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3a03f35de702e243cb7c1a6bea7906edfcd8ccd99325594cf47cd8d1472997ef
+ size 743928480
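
Each added `.onnx` entry in this diff is a Git LFS pointer, not the model itself: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. A small sketch of reading one such pointer follows; `parseLfsPointer` is a hypothetical helper, not a Git LFS API.

```javascript
// Hypothetical helper: parse the "key value" lines of a Git LFS pointer file.
function parseLfsPointer(text) {
  const fields = {};
  for (const line of text.trim().split('\n')) {
    const space = line.indexOf(' ');
    if (space === -1) continue; // skip lines without a value
    fields[line.slice(0, space)] = line.slice(space + 1);
  }
  // The oid has the form "<hash algorithm>:<hex digest>".
  const [hashAlgo, digest] = (fields.oid ?? '').split(':');
  return {
    version: fields.version,
    hashAlgo,
    digest,
    sizeBytes: Number(fields.size),
  };
}

// Pointer contents copied from onnx/decoder_model_bnb4.onnx in this commit.
const pointer = parseLfsPointer(`version https://git-lfs.github.com/spec/v1
oid sha256:3a03f35de702e243cb7c1a6bea7906edfcd8ccd99325594cf47cd8d1472997ef
size 743928480`);

console.log(pointer.hashAlgo, pointer.sizeBytes);
```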
onnx/decoder_model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1be56c8eef8917136b6fa557a41cbf59ec3e12c29455f79c0bf3801624858f92
+ size 1814428308
onnx/decoder_model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8f1f7bb3c46a840780589583b993c1fe6fdc3a612aa0ed524e3cb20c0321b2e4
+ size 1177657683
onnx/decoder_model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da745f17fe5511eea71790d597f18bc6bf1d78fc6d362878d47cc3fa2789ae89
+ size 796354688
onnx/decoder_model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc54fbb6c19b43d25289ab0be5066680e9e56cf857bcb80e0123909fb465f46d
+ size 608612981
onnx/decoder_model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e83243b326b437aae2df01c663adc7ad094efb74dc585c291d7164d351e0f3d2
+ size 1177657834
onnx/decoder_with_past_model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d86ab5778289ad4e07ba1a11f53e60af4f853cf6fe3d98ad1e1f0818a3ea5942
+ size 684822244
onnx/decoder_with_past_model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:459f4995a6aac59ab9b3c481594eec0ae89a3d59db1963220673ff2a6554afa1
+ size 1604702870
onnx/decoder_with_past_model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e613d4a7ba260c8df5f6263801605de666738b7944c27f4c16b7f50a64b0d84b
+ size 1072623046
onnx/decoder_with_past_model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:12039459a44dbf65d5811bcbdc3be5d3b02b4050b5d6530a6ee4abf483ea6df7
+ size 730695364
onnx/decoder_with_past_model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a98459bf4e5821fb10d0360e8e72c6908d637771ddc19b3cace71b9be471cf7e
+ size 549610825
onnx/decoder_with_past_model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a72afa62aa93399ccdbc0b52c1b68ac8abc30d6b3560bca8e9c419593cd3c4dc
+ size 1072623164
onnx/encoder_model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2628acef84f216ed9ffd0f3b85119cc78decc19f973b7ba32d8d6241ed34dc43
+ size 384910274
onnx/encoder_model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5659f1d4ed3ca10a37beb336a2d741f4c9462a11e01fb674dd5ea763a618e299
+ size 1274001163
onnx/encoder_model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8c77d10eb577f4470f680579ace1a8b6aff44632101d10f0edac0c4ad44847a
+ size 424230306
onnx/encoder_model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7d3073f0693047d64fd357738c2cd0750ed089b7b7219c0ff1c196592ad0a735
+ size 369632638
onnx/encoder_model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ef54bc1a98d488ff9561c4c95e35c030bf241de8949a8c9b6f07874d96ed2d7f
+ size 644662674
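
The `size` fields in the pointers above make it possible to compare the download cost of each variant. The sketch below does that arithmetic for the `decoder_model` files, using the byte counts from this commit. `relativeSizes` and `smallestDtype` are hypothetical helpers, and file size says nothing about transcription accuracy.

```javascript
// Byte sizes copied from the decoder_model_* LFS pointers in this commit.
const DECODER_SIZES = {
  bnb4: 743928480,
  fp16: 1814428308,
  int8: 1177657683,
  q4: 796354688,
  q4f16: 608612981,
  uint8: 1177657834,
};

// Hypothetical helper: express every size as a fraction of a baseline dtype.
function relativeSizes(sizes, baseline) {
  const base = sizes[baseline];
  return Object.fromEntries(
    Object.entries(sizes).map(([dtype, bytes]) => [dtype, bytes / base]),
  );
}

// Hypothetical helper: name of the dtype with the fewest bytes.
function smallestDtype(sizes) {
  return Object.entries(sizes).reduce((best, entry) =>
    entry[1] < best[1] ? entry : best,
  )[0];
}

const ratios = relativeSizes(DECODER_SIZES, 'fp16');
for (const [dtype, ratio] of Object.entries(ratios)) {
  console.log(`${dtype}: ${(ratio * 100).toFixed(1)}% of fp16`);
}
console.log('smallest:', smallestDtype(DECODER_SIZES));
```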