Update README.md
README.md
````diff
@@ -2,7 +2,6 @@
 license: apache-2.0
 ---
 # **TEN VAD**
-
 ***A Low-Latency, Lightweight and High-Performance Streaming VAD***
 
 
````
````diff
@@ -18,7 +17,6 @@ license: apache-2.0
 
 The precision-recall curves comparing the performance of WebRTC VAD (pitch-based), Silero VAD, and TEN VAD are shown below. The evaluation is conducted on the precisely manually annotated TEN-VAD-TestSet. The audio files are from librispeech, gigaspeech, DNS Challenge etc. As demonstrated, TEN VAD achieves the best performance. Additionally, cross-validation experiments conducted on large internal real-world datasets demonstrate the reproducibility of these findings. The **TEN-VAD-TestSet with annotated labels** is released in directory "TEN-VAD-TestSet" of this repository.
 
-<br>
 
 <div style="text-align:">
 <img src="./images/PR_Curves_TEN-VAD-TestSet.png" width="800">
````
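A precision-recall curve like the one described in the README comes from sweeping a decision threshold over per-frame speech probabilities and scoring the resulting 0/1 flags against annotated labels; the README states that 0.5 is the default threshold for binary speech indicators. The sketch below shows that computation at frame level in plain Python. It is not the repository's `plot_pr_curves.py`: the helper names (`binarize`, `precision_recall`, `pr_curve`), the `p >= threshold` rule, and the convention that precision is 1.0 when no frame is flagged as speech are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def binarize(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map per-frame speech probabilities to 0/1 flags (assumed rule: p >= threshold)."""
    return [1 if p >= threshold else 0 for p in probs]


def precision_recall(flags: Sequence[int], labels: Sequence[int]) -> Tuple[float, float]:
    """Frame-level precision and recall with speech (1) as the positive class."""
    if len(flags) != len(labels):
        raise ValueError("flags and labels must have the same length")
    tp = sum(1 for f, y in zip(flags, labels) if f == 1 and y == 1)
    fp = sum(1 for f, y in zip(flags, labels) if f == 1 and y == 0)
    fn = sum(1 for f, y in zip(flags, labels) if f == 0 and y == 1)
    # Assumed convention: if no frame is predicted as speech, precision is 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pr_curve(probs: Sequence[float], labels: Sequence[int],
             thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each threshold in the sweep."""
    return [(t, *precision_recall(binarize(probs, t), labels)) for t in thresholds]


if __name__ == "__main__":
    probs = [0.1, 0.7, 0.9, 0.4, 0.6, 0.2]  # toy per-frame probabilities
    labels = [0, 1, 1, 1, 0, 0]             # toy ground-truth speech labels
    for t, p, r in pr_curve(probs, labels, [0.3, 0.5, 0.8]):
        print(f"threshold={t:.1f} precision={p:.3f} recall={r:.3f}")
```

With these toy values, the default threshold of 0.5 gives a precision and recall of 2/3 each; lowering it to 0.3 raises recall to 1.0 at a precision of 0.75, which is the trade-off a PR curve traces out.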
````diff
@@ -30,14 +28,14 @@ Note that the default threshold of 0.5 is used to generate binary speech indicat
 cd ./examples
 python plot_pr_curves.py
 ```
-
+
 
 ### **2. Agent-Friendly:**
 As illustrated in the figure below, TEN VAD rapidly detects speech-to-non-speech transitions, whereas Silero VAD suffers from a delay of several hundred milliseconds, resulting in increased end-to-end latency in human-agent interaction systems. In addition, as demonstrated in the 6.5s-7.0s audio segment, Silero VAD fails to identify short silent durations between adjacent speech segments.
 <div style="text-align:">
 <img src="./images/Agent-Friendly-image.png" width="800">
 </div>
-
+
 
 ### **3. Lightweight:**
 We evaluated the RTF (Real-Time Factor) across five distinct platforms, each equipped with varying CPUs. TEN VAD demonstrates much lower computational complexity and smaller library size than Silero VAD.
````
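The Agent-Friendly comparison concerns two properties a downstream agent depends on: how soon a speech-to-non-speech transition is reported, and whether short pauses between adjacent speech segments are kept rather than merged. A minimal sketch of measuring both from per-frame 0/1 flags, assuming frames of `hop_size` samples at 16 kHz. `speech_segments` and `end_detection_delays` are hypothetical helpers written for illustration, not part of TEN VAD, and matching each reference end to the first detected end at or after it is a deliberate simplification.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def speech_segments(flags: Sequence[int], hop_size: int,
                    sample_rate: int = 16_000) -> List[Segment]:
    """Merge runs of speech frames (flag 1) into (start_s, end_s) segments."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i  # non-speech -> speech transition
        elif not flag and start is not None:
            # speech -> non-speech transition: close the open segment
            segments.append((start * hop_size / sample_rate, i * hop_size / sample_rate))
            start = None
    if start is not None:  # audio ended while still in speech
        segments.append((start * hop_size / sample_rate, len(flags) * hop_size / sample_rate))
    return segments


def end_detection_delays(detected: Sequence[Segment],
                         reference: Sequence[Segment]) -> List[float]:
    """Seconds from each reference speech end to the first detected end at or after it."""
    detected_ends = sorted(end for _, end in detected)
    delays = []
    for _, ref_end in reference:
        later = [d for d in detected_ends if d >= ref_end]
        # No detected end after the reference end means the transition was never reported.
        delays.append(later[0] - ref_end if later else float("inf"))
    return delays


if __name__ == "__main__":
    flags = [0, 1, 1, 1, 0, 0, 1, 1, 0]  # 10 ms frames when hop_size=160
    segs = speech_segments(flags, hop_size=160)
    print(segs)
    print(end_detection_delays(segs, reference=[(0.01, 0.03)]))
```

With `hop_size=160` (10 ms frames), the flags above yield two segments, `(0.01, 0.04)` and `(0.06, 0.08)`, so the 20 ms pause between them survives. Against a reference segment ending at 0.03 s, the detected end at 0.04 s scores a delay of about 10 ms.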
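RTF (Real-Time Factor) is wall-clock processing time divided by the duration of the audio processed, so a value below 1 means the detector keeps up with real time. The exact benchmarking setup behind the README's platform comparison is not shown in this diff, so the sketch below is only one plausible way to compute the ratio in Python. `process_frame` is a hypothetical stand-in for whatever per-frame VAD call is being measured, and audio is assumed to be 16 kHz.

```python
import time
from typing import Callable, Sequence


def real_time_factor(process_frame: Callable[[Sequence[int]], object],
                     samples: Sequence[int], hop_size: int,
                     sample_rate: int = 16_000) -> float:
    """RTF = wall-clock processing time / duration of the audio that was processed."""
    num_frames = len(samples) // hop_size
    if num_frames == 0:
        raise ValueError("need at least one full frame of audio")
    start = time.perf_counter()
    for i in range(num_frames):
        # Feed one hop of samples at a time, as a streaming VAD would receive them.
        process_frame(samples[i * hop_size:(i + 1) * hop_size])
    elapsed = time.perf_counter() - start
    # Only the full frames actually processed count toward the audio duration.
    audio_seconds = num_frames * hop_size / sample_rate
    return elapsed / audio_seconds


if __name__ == "__main__":
    one_second = [0] * 16_000
    # Hypothetical stand-in workload; replace with the VAD call being measured.
    rtf = real_time_factor(lambda frame: sum(frame), one_second, hop_size=160)
    print(f"RTF = {rtf:.6f} ({'faster' if rtf < 1 else 'slower'} than real time)")
```

Timing the whole loop rather than each call keeps timer overhead out of the measurement, but in Python the loop itself adds overhead, so the figure is an upper bound on the detector's own cost. The result is machine-dependent; only its order of magnitude is meaningful.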
````diff
@@ -113,24 +111,19 @@ We evaluated the RTF (Real-Time Factor) across five distinct platforms, each equ
 padding: 8px;
 }
 </style>
-<br>
 
 ### **4. Multiple programming languages and platforms:**
 TEN VAD provides cross-platform C compatibility across five operating systems (Linux x64, Windows, macOS, Android, iOS), with Python bindings optimized for Linux x64.
-<br>
-<br>
 
 
 ### **5. Supported sampling rate and hop size:**
 TEN VAD operates on 16kHz audio input with configurable hop sizes (optimized frame configurations: 160/256 samples=10/16ms). Other sampling rates must be resampled to 16kHz.
-
-<br>
+
 
 ## **Installation**
 ```
 git clone https://huggingface.co/TEN-framework/ten-vad
 ```
-<br>
 
 ## **Quick Start**
 The project supports five major platforms with dynamic library linking.
````
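Section 5 of the README fixes the input rate at 16 kHz, so a hop of N samples spans N / 16000 seconds; that is where 160 samples = 10 ms and 256 samples = 16 ms come from. A sketch of splitting 16 kHz audio into consecutive hop-sized chunks in Python. The helper names are hypothetical, and the code rejects other sample rates instead of resampling them, because the README only says they must be resampled to 16 kHz without naming a method.

```python
from typing import Iterator, Sequence

TEN_VAD_SAMPLE_RATE = 16_000  # input rate required by TEN VAD per the README


def hop_duration_ms(hop_size: int, sample_rate: int = TEN_VAD_SAMPLE_RATE) -> float:
    """Duration of one hop in milliseconds."""
    return 1000 * hop_size / sample_rate


def iter_frames(samples: Sequence[int], hop_size: int,
                sample_rate: int) -> Iterator[Sequence[int]]:
    """Yield consecutive hop_size chunks; a trailing partial chunk is dropped.

    This is a generator, so the sample-rate check runs on the first iteration.
    """
    if sample_rate != TEN_VAD_SAMPLE_RATE:
        raise ValueError(
            f"expected {TEN_VAD_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; resample first"
        )
    for i in range(len(samples) // hop_size):
        yield samples[i * hop_size:(i + 1) * hop_size]


if __name__ == "__main__":
    for hop in (160, 256):
        print(f"hop_size={hop} -> {hop_duration_ms(hop):.0f} ms")
    frames = list(iter_frames(list(range(1000)), hop_size=256, sample_rate=16_000))
    print(len(frames), "full frames")
```

`hop_duration_ms(160)` returns `10.0` and `hop_duration_ms(256)` returns `16.0`. With 1000 samples and `hop_size=256`, `iter_frames` yields 3 full chunks and drops the final 232 samples; zero-padding the last chunk is an equally reasonable choice when the tail of the recording matters.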
````diff
@@ -180,7 +173,6 @@ The project supports five major platforms with dynamic library linking.
 <td> 1. not simulator <br> 2. not iPad </td>
 </tr>
 </table>
-<br>
 
 
 ### **Python Usage**
````
````diff
@@ -201,7 +193,6 @@ You can install the above mentioned dependencies via requirements.txt:
 ```
 pip install -r requirements.txt
 ```
-<br>
 
 #### **Usage**
 Note: For usage in python, you can either use it by **git clone** or **pip**.
````
````diff
@@ -222,7 +213,6 @@ cd ./examples
 ```
 python test.py s0724-s0730.wav out.txt
 ```
-<br>
 
 ##### **By using pip:**
 
````
````diff
@@ -237,7 +227,6 @@ pip install -U --force-reinstall -v git+https://github.com/TEN-framework/ten-vad
 ```
 from ten_vad import TenVad
 ```
-<br>
 
 ### **C Usage**
 #### **Build Scripts**
````
````diff
@@ -267,7 +256,6 @@ Runtime library path configuration:
 - Run demo with sample audio s0724-s0730.wav
 - Processed results saved to out.txt
 
-<br>
 
 The detailed usage methods of each platform are as follows <br>
 
````
````diff
@@ -282,7 +270,6 @@ The detailed usage methods of each platform are as follows <br>
 1) cd ./examples
 2) ./build-and-deploy-linux.sh
 ```
-<br>
 
 #### **2. Windows**
 ##### **Requirements**
````
````diff
@@ -298,7 +285,6 @@ The detailed usage methods of each platform are as follows <br>
 - Visual Studio version (default: 2019)
 3) ./build-and-deploy-windows.bat
 ```
-<br>
 
 #### **3. macOS**
 ##### **Requirements**
````
````diff
@@ -313,7 +299,6 @@ The detailed usage methods of each platform are as follows <br>
 - Alternative: x86_64 (Intel)
 3) ./build-and-deploy-mac.sh
 ```
-<br>
 
 #### **4. Android**
 ##### **Requirements**
````
````diff
@@ -330,7 +315,6 @@ The detailed usage methods of each platform are as follows <br>
 - Toolchain: aarch64-linux-android-clang (default) or custom NDK toolchain
 4) ./build-and-deploy-android.sh
 ```
-<br>
 
 #### **5. iOS**
 ##### **Requirements**
````
````diff
@@ -381,7 +365,6 @@ cd ./examples
 - Specify your Certification
 
 3.5. Build in Xcode and run demo on your device.
-<br>
 
 ## **Citations**
 ```
````
````diff
@@ -396,7 +379,6 @@ cd ./examples
 email = {[email protected]}
 }
 ```
-<br>
 
 ## **License**
 This project is Apache 2.0 licensed.
````