---
license: apache-2.0
base_model: microsoft/DialoGPT-medium
tags:
- intent-classification
- text-classification
- onnx
- transformers.js
- nlp
language:
- en
metrics:
- accuracy
- f1
library_name: transformers
pipeline_tag: text-classification
---

# Intent Classifier - MiniLM

A fine-tuned intent classification model based on MiniLM, optimized for fast inference with multiple ONNX quantization variants.

## Model Description

This model is designed for intent classification tasks and has been converted to ONNX format for efficient deployment in various environments, including web browsers using Transformers.js.

## Model Variants

This repository contains multiple ONNX model variants optimized for different use cases:

| Model File | Description | Use Case |
|------------|-------------|----------|
| `model.onnx` | Original ONNX model | Best accuracy, larger size |
| `model_fp16.onnx` | 16-bit floating point | Good balance of accuracy and speed |
| `model_int8.onnx` | 8-bit integer quantized | Faster inference, smaller size |
| `model_q4.onnx` | 4-bit quantized | Very fast, very small |
| `model_q4f16.onnx` | 4-bit with FP16 | Optimized for specific hardware |
| `model_quantized.onnx` | Standard quantized | General-purpose optimization |
| `model_uint8.onnx` | Unsigned 8-bit | Mobile/edge deployment |
| `model_bnb4.onnx` | BitsAndBytes 4-bit | Advanced quantization |

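Each variant is a standalone ONNX file, so a specific one can be fetched on its own rather than downloading the whole repository. A minimal sketch using `huggingface_hub` (the `onnx/` subfolder path follows the ONNX Runtime example below; adjust it to the actual repository layout):

```python
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Download only the variant you need; swap in any filename from the table above.
model_path = hf_hub_download(
    repo_id="kousik-2310/intent-classifier-minilm",
    filename="onnx/model_int8.onnx",
)

# The downloaded file is a regular ONNX model and loads directly.
session = ort.InferenceSession(model_path)
```
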
## Quick Start

### Using with Transformers.js (Browser)

```javascript
import { pipeline } from '@xenova/transformers';

// Load the model
const classifier = await pipeline('text-classification', 'kousik-2310/intent-classifier-minilm');

// Classify text
const result = await classifier('I want to book a flight to New York');
console.log(result);
```

### Using with Python/Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("kousik-2310/intent-classifier-minilm")
model = AutoModelForSequenceClassification.from_pretrained("kousik-2310/intent-classifier-minilm")

# Create pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Classify text
result = classifier("I want to book a flight to New York")
print(result)
```

### Using ONNX Runtime

```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("kousik-2310/intent-classifier-minilm")

# Load ONNX model
session = ort.InferenceSession("onnx/model_int8.onnx")

# Tokenize input
text = "I want to book a flight to New York"
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)

# Run inference
outputs = session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
})

# Logits for each intent class
predictions = outputs[0]
```

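The session returns raw logits of shape `(batch_size, num_labels)`. A minimal post-processing sketch that continues from the block above, assuming the repository's `config.json` carries the standard `id2label` mapping for sequence-classification checkpoints:

```python
import numpy as np
from transformers import AutoConfig

# id2label maps class indices to intent names; verify it exists in this repo.
config = AutoConfig.from_pretrained("kousik-2310/intent-classifier-minilm")

logits = predictions[0]                # first (and only) sequence in the batch
probs = np.exp(logits - logits.max())  # numerically stable softmax
probs /= probs.sum()

best = int(np.argmax(probs))
print(config.id2label[best], float(probs[best]))
```
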
## Model Architecture

- **Base Model**: MiniLM architecture
- **Task**: Text Classification (Intent Recognition)
- **Framework**: PyTorch → ONNX
- **Quantization**: Multiple variants available

## Performance

The model provides different performance characteristics based on the variant used:

- **Accuracy**: Best with `model.onnx`, good with quantized versions
- **Speed**: Fastest with `model_q4.onnx` and `model_int8.onnx`
- **Size**: Smallest with quantized variants (4-bit, 8-bit)

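These trade-offs are best measured on your own hardware and inputs. A rough micro-benchmark sketch with ONNX Runtime (variant filenames and the `onnx/` path as above; absolute numbers are environment-dependent, and this assumes all variants accept the same tokenizer outputs):

```python
import time

import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kousik-2310/intent-classifier-minilm")
encoded = tokenizer("I want to book a flight to New York", return_tensors="np")

for variant in ["model.onnx", "model_fp16.onnx", "model_int8.onnx", "model_q4.onnx"]:
    path = hf_hub_download(
        repo_id="kousik-2310/intent-classifier-minilm",
        filename=f"onnx/{variant}",
    )
    session = ort.InferenceSession(path)

    # Feed only the inputs this particular export declares (some exports
    # also expect token_type_ids, which the tokenizer provides).
    feed = {i.name: encoded[i.name] for i in session.get_inputs() if i.name in encoded}

    start = time.perf_counter()
    for _ in range(100):
        session.run(None, feed)
    print(f"{variant}: {(time.perf_counter() - start) / 100 * 1e3:.2f} ms/run")
```
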
## Intended Use

This model is intended for:
- Intent classification in chatbots and virtual assistants
- Text classification tasks
- Real-time inference in web applications
- Edge deployment scenarios

## Training Details

The model has been fine-tuned for intent classification and converted to multiple ONNX formats for deployment flexibility.

## Limitations and Bias

- Model performance depends on how closely your use case matches the training data
- Quantized models may have slightly reduced accuracy compared to the full-precision model
- Performance may vary based on the deployment environment

## How to Cite

```bibtex
@misc{intent-classifier-minilm,
  title={Intent Classifier MiniLM},
  author={kousik-2310},
  year={2024},
  url={https://huggingface.co/kousik-2310/intent-classifier-minilm}
}
```

## License

This model is released under the Apache 2.0 License.