Update README.md
```python
import torch
from nemo.collections.asr.models import SortformerEncLabelModel

# load model
diar_model = SortformerEncLabelModel.restore_from(restore_path="/path/to/diar_streaming_sortformer_4spk-v2", map_location=torch.device('cuda'), strict=False)

# If you have a downloaded model in "/path/to/diar_streaming_sortformer_4spk-v2.nemo", load the model from the downloaded file
diar_model = SortformerEncLabelModel.restore_from(restore_path="/path/to/diar_streaming_sortformer_4spk-v2.nemo", map_location='cuda', strict=False)

# switch to inference mode
diar_model.eval()
```

### Input Format
Input to Sortformer can be an individual audio file:
```python
audio_input="/path/to/multispeaker_audio1.wav"
```
or a list of paths to audio files:
```python
audio_input=["/path/to/multispeaker_audio1.wav", "/path/to/multispeaker_audio2.wav"]
```
or a `jsonl` manifest file:
```python
audio_input="/path/to/multispeaker_manifest.json"
```
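Scripts that accept any of these three forms can tell them apart before calling the model. A minimal sketch, assuming only the conventions above; `audio_input_kind` is a hypothetical helper for illustration, not part of the NeMo API:

```python
def audio_input_kind(audio_input):
    """Classify an audio_input value as 'list', 'manifest', or 'file'."""
    if isinstance(audio_input, list):
        return "list"  # list of paths to audio files
    if isinstance(audio_input, str) and audio_input.endswith(".json"):
        return "manifest"  # jsonl manifest file
    if isinstance(audio_input, str):
        return "file"  # individual audio file
    raise TypeError(f"unsupported audio_input: {type(audio_input).__name__}")

print(audio_input_kind("/path/to/multispeaker_audio1.wav"))    # file
print(audio_input_kind(["/path/to/a.wav", "/path/to/b.wav"]))  # list
print(audio_input_kind("/path/to/multispeaker_manifest.json")) # manifest
```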
Each line of the manifest is a dictionary containing the following fields:
```yaml
# Example of a line in `multispeaker_manifest.json`
{
  "audio_filepath": "/path/to/multispeaker_audio1.wav",  # path to the input audio file
  "offset": 0,  # offset (start) time of the input audio
  "duration": 600  # duration of the audio; can be set to `null` if using NeMo main branch
}
{
  "audio_filepath": "/path/to/multispeaker_audio2.wav",
  "offset": 900,
  "duration": 580
}
```
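Such a manifest can also be generated programmatically with only the standard library. A minimal sketch; the audio paths, offsets, and durations are placeholders, and the commented `diarize` call assumes a `diar_model` loaded as shown above:

```python
import json

# placeholder entries; replace with real paths, offsets, and durations
entries = [
    {"audio_filepath": "/path/to/multispeaker_audio1.wav", "offset": 0, "duration": 600},
    {"audio_filepath": "/path/to/multispeaker_audio2.wav", "offset": 900, "duration": 580},
]

manifest_path = "multispeaker_manifest.json"
with open(manifest_path, "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")  # one JSON object per line

# the manifest path can then be passed to the model, e.g.:
# pred_outputs = diar_model.diarize(audio=manifest_path, batch_size=1)
```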

### Input