---
license: apache-2.0
base_model: microsoft/DialoGPT-medium
tags:
- intent-classification
- text-classification
- onnx
- transformers.js
- nlp
language:
- en
metrics:
- accuracy
- f1
library_name: transformers
pipeline_tag: text-classification
---

# Intent Classifier - MiniLM

A fine-tuned intent classification model based on MiniLM, optimized for fast inference with multiple ONNX quantization variants.

## Model Description

This model is designed for intent classification tasks and has been converted to ONNX format for efficient deployment in various environments, including web browsers using Transformers.js.

## Model Variants

This repository contains multiple ONNX model variants optimized for different use cases:

| Model File | Description | Use Case |
|------------|-------------|----------|
| `model.onnx` | Original ONNX model | Best accuracy, larger size |
| `model_fp16.onnx` | 16-bit floating point | Good balance of accuracy and speed |
| `model_int8.onnx` | 8-bit integer quantized | Faster inference, smaller size |
| `model_q4.onnx` | 4-bit quantized | Very fast, very small |
| `model_q4f16.onnx` | 4-bit weights with FP16 activations | Hardware with fast FP16 support |
| `model_quantized.onnx` | Standard quantized | General-purpose optimization |
| `model_uint8.onnx` | Unsigned 8-bit | Mobile/edge deployment |
| `model_bnb4.onnx` | BitsAndBytes 4-bit | Advanced quantization |

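To pick a specific variant rather than the default, one option is the Optimum ONNX Runtime integration. A minimal sketch, assuming `optimum[onnxruntime]` is installed and the variant files sit in the `onnx/` subfolder (as in the ONNX Runtime example below):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

repo = "kousik-2310/intent-classifier-minilm"

tokenizer = AutoTokenizer.from_pretrained(repo)
# file_name selects one of the variant files from the table above
model = ORTModelForSequenceClassification.from_pretrained(
    repo, subfolder="onnx", file_name="model_int8.onnx"
)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("I want to book a flight to New York"))
```
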
## Quick Start

### Using with Transformers.js (Browser)

```javascript
import { pipeline } from '@xenova/transformers';

// Load the model
const classifier = await pipeline('text-classification', 'kousik-2310/intent-classifier-minilm');

// Classify text
const result = await classifier('I want to book a flight to New York');
console.log(result);
```

### Using with Python/Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("kousik-2310/intent-classifier-minilm")
model = AutoModelForSequenceClassification.from_pretrained("kousik-2310/intent-classifier-minilm")

# Create pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Classify text
result = classifier("I want to book a flight to New York")
print(result)
```

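The pipeline returns only the top intent by default. Recent `transformers` releases accept a `top_k` argument on text-classification pipelines; passing `top_k=None` returns a score for every label:

```python
# Score every intent instead of only the best one
all_scores = classifier("I want to book a flight to New York", top_k=None)
print(all_scores)
```
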
### Using ONNX Runtime

```python
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("kousik-2310/intent-classifier-minilm")

# Download and load the ONNX model
model_path = hf_hub_download("kousik-2310/intent-classifier-minilm", "onnx/model_int8.onnx")
session = ort.InferenceSession(model_path)

# Tokenize input
text = "I want to book a flight to New York"
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)

# Run inference, feeding only the inputs the exported graph expects
feed = {i.name: inputs[i.name] for i in session.get_inputs() if i.name in inputs}
outputs = session.run(None, feed)

# Logits for each intent class
logits = outputs[0]
```

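The raw logits can be mapped back to intent names using the label mapping stored in the model config. A minimal sketch, assuming the repository's `config.json` carries an `id2label` table:

```python
import numpy as np
from transformers import AutoConfig

config = AutoConfig.from_pretrained("kousik-2310/intent-classifier-minilm")

# Softmax over the first (and only) sequence in the batch
scores = np.exp(logits[0] - logits[0].max())
scores /= scores.sum()

best = int(scores.argmax())
print(config.id2label[best], float(scores[best]))
```
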
## Model Architecture

- **Base Model**: MiniLM architecture
- **Task**: Text Classification (Intent Recognition)
- **Framework**: PyTorch → ONNX
- **Quantization**: Multiple variants available

## Performance

The model provides different performance characteristics depending on the variant used (see the timing sketch below):

- **Accuracy**: Best with `model.onnx`, good with the quantized versions
- **Speed**: Fastest with `model_q4.onnx` and `model_int8.onnx`
- **Size**: Smallest with the quantized variants (4-bit, 8-bit)

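Actual numbers depend on the deployment hardware, so it is worth timing the variants on the target machine. A minimal latency sketch, assuming the variant files live under `onnx/` as in the table above:

```python
import time

import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo = "kousik-2310/intent-classifier-minilm"
tokenizer = AutoTokenizer.from_pretrained(repo)
inputs = tokenizer("I want to book a flight to New York", return_tensors="np")

for variant in ["model.onnx", "model_fp16.onnx", "model_int8.onnx", "model_q4.onnx"]:
    path = hf_hub_download(repo, f"onnx/{variant}")
    session = ort.InferenceSession(path)
    feed = {i.name: inputs[i.name] for i in session.get_inputs() if i.name in inputs}
    session.run(None, feed)  # warm-up run before timing
    start = time.perf_counter()
    for _ in range(100):
        session.run(None, feed)
    print(f"{variant}: {(time.perf_counter() - start) / 100 * 1000:.2f} ms/run")
```
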
## Intended Use

This model is intended for:
- Intent classification in chatbots and virtual assistants
- Text classification tasks
- Real-time inference in web applications
- Edge deployment scenarios

## Training Details

The model has been fine-tuned for intent classification and converted to multiple ONNX formats for deployment flexibility.

## Limitations and Bias

- Model performance depends on how closely your use case matches the training data
- Quantized variants may have slightly reduced accuracy compared to the full-precision model
- Performance may vary with the deployment environment

## How to Cite

```bibtex
@misc{intent-classifier-minilm,
  title={Intent Classifier MiniLM},
  author={kousik-2310},
  year={2024},
  url={https://huggingface.co/kousik-2310/intent-classifier-minilm}
}
```

## License

This model is released under the Apache 2.0 License.