Ziyi223 committed
Commit 0faabaa · verified · 1 Parent(s): 0d39201

Update README.md

Files changed (1)
  1. README.md +3 -21
README.md CHANGED
@@ -2,7 +2,6 @@
2
  license: apache-2.0
3
  ---
4
  # **TEN VAD**
5
-
6
  ***A Low-Latency, Lightweight and High-Performance Streaming VAD***
7
 
8
 
@@ -18,7 +17,6 @@ license: apache-2.0
18
 
19
  The precision-recall curves below compare WebRTC VAD (pitch-based), Silero VAD, and TEN VAD. The evaluation uses the manually annotated TEN-VAD-TestSet, whose audio files come from LibriSpeech, GigaSpeech, the DNS Challenge, and other sources. TEN VAD achieves the best performance of the three. Cross-validation experiments on large internal real-world datasets reproduce these findings. The **TEN-VAD-TestSet with annotated labels** is released in the "TEN-VAD-TestSet" directory of this repository.
20
 
21
- <br>
22
 
23
  <div style="text-align:">
24
  <img src="./images/PR_Curves_TEN-VAD-TestSet.png" width="800">
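
The evaluation paragraph in this hunk rests on precision and recall computed from per-frame speech probabilities. As a rough illustration of how one point on such a curve could be obtained, here is a minimal sketch. It is not the repository's `plot_pr_curves.py`; the function names, the `>=` comparison, and the convention of returning 1.0 when a denominator is zero are all assumptions made for this example.

```python
def precision_recall(probs, labels, threshold=0.5):
    """Return (precision, recall) for per-frame speech probabilities.

    probs  : per-frame speech probabilities in [0, 1]
    labels : ground-truth flags, 1 = speech, 0 = non-speech
    Assumption: a frame counts as speech when prob >= threshold.
    """
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    # Assumed convention: an empty denominator yields 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


def pr_curve(probs, labels, thresholds):
    """Sweep thresholds and collect (threshold, precision, recall) tuples."""
    return [(t,) + precision_recall(probs, labels, t) for t in thresholds]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_probs = [0.9, 0.8, 0.4, 0.2, 0.6]
    toy_labels = [1, 1, 1, 0, 0]
    for t, p, r in pr_curve(toy_probs, toy_labels, [0.3, 0.5, 0.7]):
        print(f"threshold={t:.1f} precision={p:.3f} recall={r:.3f}")
```

Sweeping the threshold over a fine grid and plotting recall against precision gives a curve of the kind the figure shows.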
@@ -30,14 +28,14 @@ Note that the default threshold of 0.5 is used to generate binary speech indicat
30
  cd ./examples
31
  python plot_pr_curves.py
32
  ```
33
- <br>
34
 
35
  ### **2. Agent-Friendly:**
36
  As illustrated in the figure below, TEN VAD rapidly detects speech-to-non-speech transitions, whereas Silero VAD lags by several hundred milliseconds, which increases end-to-end latency in human-agent interaction systems. In addition, in the 6.5s-7.0s audio segment, Silero VAD fails to detect the short silences between adjacent speech segments.
37
  <div style="text-align:">
38
  <img src="./images/Agent-Friendly-image.png" width="800">
39
  </div>
40
- <br>
41
 
42
  ### **3. Lightweight:**
43
  We evaluated the RTF (Real-Time Factor) across five distinct platforms, each equipped with varying CPUs. TEN VAD demonstrates much lower computational complexity and smaller library size than Silero VAD.
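
For readers unfamiliar with the metric in the hunk above, the real-time factor is commonly defined as processing time divided by the duration of the audio processed, so values well below 1 mean faster than real time. The sketch below shows that arithmetic. It is not the benchmark used for the README's table; `process_fn` is a hypothetical stand-in for any per-frame VAD call, and the default hop size of 256 samples at 16 kHz is an assumption taken from the README's own configuration notes.

```python
import time


def rtf_from_timings(processing_seconds, num_frames, hop_size=256, sample_rate=16000):
    """RTF = time spent processing / duration of the audio processed."""
    audio_seconds = num_frames * hop_size / sample_rate
    return processing_seconds / audio_seconds


def real_time_factor(process_fn, frames, hop_size=256, sample_rate=16000):
    """Time a hypothetical per-frame callable over `frames` and return its RTF."""
    start = time.perf_counter()
    for frame in frames:
        process_fn(frame)
    elapsed = time.perf_counter() - start
    return rtf_from_timings(elapsed, len(frames), hop_size, sample_rate)


if __name__ == "__main__":
    # 1000 frames of 256 samples at 16 kHz is 16 s of audio;
    # 0.16 s of compute for that audio corresponds to an RTF of 0.01.
    print(rtf_from_timings(0.16, 1000))
```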
@@ -113,24 +111,19 @@ We evaluated the RTF (Real-Time Factor) across five distinct platforms, each equ
113
  padding: 8px;
114
  }
115
  </style>
116
- <br>
117
 
118
  ### **4. Multiple programming languages and platforms:**
119
  TEN VAD provides cross-platform C compatibility across five operating systems (Linux x64, Windows, macOS, Android, iOS), with Python bindings optimized for Linux x64.
120
- <br>
121
- <br>
122
 
123
 
124
  ### **5. Supported sampling rate and hop size:**
125
  TEN VAD operates on 16 kHz audio input with configurable hop sizes (optimized hop sizes: 160 or 256 samples, i.e. 10 or 16 ms). Audio at other sampling rates must be resampled to 16 kHz.
126
- <br>
127
- <br>
128
 
129
  ## **Installation**
130
  ```
131
  git clone https://huggingface.co/TEN-framework/ten-vad
132
  ```
133
- <br>
134
 
135
  ## **Quick Start**
136
  The project supports five major platforms with dynamic library linking.
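
Section 5 in the hunk above states that TEN VAD expects 16 kHz input in hops of 160 or 256 samples. The following sketch works through that arithmetic and shows one way a signal could be cut into hops before being passed to a VAD. The helper names are hypothetical, and dropping the trailing partial hop is an assumption of this example rather than documented library behaviour.

```python
SAMPLE_RATE = 16000  # Hz, the input rate stated in the README


def hop_duration_ms(hop_size, sample_rate=SAMPLE_RATE):
    """Duration of one hop in milliseconds."""
    return 1000.0 * hop_size / sample_rate


def split_into_hops(samples, hop_size):
    """Cut a sample sequence into consecutive, non-overlapping hops.

    Assumption: a trailing partial hop is discarded.
    """
    num_frames = len(samples) // hop_size
    return [samples[i * hop_size:(i + 1) * hop_size] for i in range(num_frames)]


if __name__ == "__main__":
    one_second = [0] * SAMPLE_RATE  # silent placeholder audio
    for hop in (160, 256):
        frames = split_into_hops(one_second, hop)
        print(f"hop={hop} samples, {hop_duration_ms(hop)} ms, {len(frames)} frames per second")
```

With a 256-sample hop, one second of audio does not divide evenly, so the last 128 samples are left over in this sketch. A streaming caller would normally carry them into the next buffer.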
@@ -180,7 +173,6 @@ The project supports five major platforms with dynamic library linking.
180
  <td> 1. not simulator <br> 2. not iPad </td>
181
  </tr>
182
  </table>
183
- <br>
184
 
185
 
186
  ### **Python Usage**
@@ -201,7 +193,6 @@ You can install the above mentioned dependencies via requirements.txt:
201
  ```
202
  pip install -r requirements.txt
203
  ```
204
- <br>
205
 
206
  #### **Usage**
207
  Note: To use TEN VAD in Python, either **git clone** the repository or install it with **pip**.
@@ -222,7 +213,6 @@ cd ./examples
222
  ```
223
  python test.py s0724-s0730.wav out.txt
224
  ```
225
- <br>
226
 
227
  ##### **By using pip:**
228
 
@@ -237,7 +227,6 @@ pip install -U --force-reinstall -v git+https://github.com/TEN-framework/ten-vad
237
  ```
238
  from ten_vad import TenVad
239
  ```
240
- <br>
241
 
242
  ### **C Usage**
243
  #### **Build Scripts**
@@ -267,7 +256,6 @@ Runtime library path configuration:
267
  - Run demo with sample audio s0724-s0730.wav
268
  - Processed results saved to out.txt
269
 
270
- <br>
271
 
272
  Detailed usage for each platform is as follows: <br>
273
 
@@ -282,7 +270,6 @@ The detailed usage methods of each platform are as follows <br>
282
  1) cd ./examples
283
  2) ./build-and-deploy-linux.sh
284
  ```
285
- <br>
286
 
287
  #### **2. Windows**
288
  ##### **Requirements**
@@ -298,7 +285,6 @@ The detailed usage methods of each platform are as follows <br>
298
  - Visual Studio version (default: 2019)
299
  3) ./build-and-deploy-windows.bat
300
  ```
301
- <br>
302
 
303
  #### **3. macOS**
304
  ##### **Requirements**
@@ -313,7 +299,6 @@ The detailed usage methods of each platform are as follows <br>
313
  - Alternative: x86_64 (Intel)
314
  3) ./build-and-deploy-mac.sh
315
  ```
316
- <br>
317
 
318
  #### **4. Android**
319
  ##### **Requirements**
@@ -330,7 +315,6 @@ The detailed usage methods of each platform are as follows <br>
330
  - Toolchain: aarch64-linux-android-clang (default) or custom NDK toolchain
331
  4) ./build-and-deploy-android.sh
332
  ```
333
- <br>
334
 
335
  #### **5. iOS**
336
  ##### **Requirements**
@@ -381,7 +365,6 @@ cd ./examples
381
  - Specify your Certification
382
 
383
  3.5. Build in Xcode and run demo on your device.
384
- <br>
385
 
386
  ## **Citations**
387
  ```
@@ -396,7 +379,6 @@ cd ./examples
396
  email = {[email protected]}
397
  }
398
  ```
399
- <br>
400
 
401
  ## **License**
402
  This project is Apache 2.0 licensed.
 