taejinp committed on
Commit 61db6dc · verified · 1 Parent(s): 2d33a24

Update README.md

Files changed (1):
  1. README.md +14 -17

README.md CHANGED
@@ -233,44 +233,41 @@ from nemo.collections.asr.models import SortformerEncLabelModel
 # load model
 diar_model = SortformerEncLabelModel.restore_from(restore_path="/path/to/diar_streaming_sortformer_4spk-v2", map_location=torch.device('cuda'), strict=False)
 
+# If you have a downloaded model in "/path/to/diar_streaming_sortformer_4spk-v2.nemo", load model from a downloaded file
+diar_model = SortformerEncLabelModel.restore_from(restore_path="/path/to/diar_streaming_sortformer_4spk-v2.nemo", map_location='cuda', strict=False)
+
 # switch to inference mode
 diar_model.eval()
 ```
 
 ### Input Format
-Input to Sortformer can be either a list of paths to audio files or a jsonl manifest file.
-
+Input to Sortformer can be an individual audio file:
 ```python
-pred_outputs = diar_model.diarize(audio=["/path/to/multispeaker_audio1.wav", "/path/to/multispeaker_audio2.wav"], batch_size=1)
+audio_input="/path/to/multispeaker_audio1.wav"
 ```
-
-Individual audio file can be fed into Sortformer model as follows:
+or a list of paths to audio files:
 ```python
-pred_output1 = diar_model.diarize(audio="/path/to/multispeaker_audio1.wav", batch_size=1)
+audio_input=["/path/to/multispeaker_audio1.wav", "/path/to/multispeaker_audio2.wav"]
 ```
-
-
-To use Sortformer for performing diarization on a multi-speaker audio recording, specify the input as jsonl manifest file, where each line in the file is a dictionary containing the following fields:
-
+or a jsonl manifest file:
+```python
+audio_input="/path/to/multispeaker_manifest.json"
+```
+where each line is a dictionary containing the following fields:
 ```yaml
 # Example of a line in `multispeaker_manifest.json`
 {
 "audio_filepath": "/path/to/multispeaker_audio1.wav", # path to the input audio file
-"offset": 0 # offset (start) time of the input audio
+"offset": 0, # offset (start) time of the input audio
 "duration": 600, # duration of the audio, can be set to `null` if using NeMo main branch
 }
 {
 "audio_filepath": "/path/to/multispeaker_audio2.wav",
-"offset": 0,
+"offset": 900,
 "duration": 580,
 }
 ```
 
-and then use:
-```python
-pred_outputs = diar_model.diarize(audio="/path/to/multispeaker_manifest.json", batch_size=1)
-```
-
 
 ### Input