taejinp committed on
Commit 61db6dc · verified · 1 Parent(s): 2d33a24

Update README.md

Files changed (1):
  1. README.md +14 -17

README.md CHANGED
@@ -233,44 +233,41 @@ from nemo.collections.asr.models import SortformerEncLabelModel
 # load model
 diar_model = SortformerEncLabelModel.restore_from(restore_path="/path/to/diar_streaming_sortformer_4spk-v2", map_location=torch.device('cuda'), strict=False)
 
+# If you have a downloaded model in "/path/to/diar_streaming_sortformer_4spk-v2.nemo", load model from a downloaded file
+diar_model = SortformerEncLabelModel.restore_from(restore_path="/path/to/diar_streaming_sortformer_4spk-v2.nemo", map_location='cuda', strict=False)
+
 # switch to inference mode
 diar_model.eval()
 ```
 
 ### Input Format
-Input to Sortformer can be either a list of paths to audio files or a jsonl manifest file.
-
+Input to Sortformer can be an individual audio file:
 ```python
-pred_outputs = diar_model.diarize(audio=["/path/to/multispeaker_audio1.wav", "/path/to/multispeaker_audio2.wav"], batch_size=1)
+audio_input="/path/to/multispeaker_audio1.wav"
 ```
-
-Individual audio file can be fed into Sortformer model as follows:
+or a list of paths to audio files:
 ```python
-pred_output1 = diar_model.diarize(audio="/path/to/multispeaker_audio1.wav", batch_size=1)
+audio_input=["/path/to/multispeaker_audio1.wav", "/path/to/multispeaker_audio2.wav"]
 ```
-
-
-To use Sortformer for performing diarization on a multi-speaker audio recording, specify the input as jsonl manifest file, where each line in the file is a dictionary containing the following fields:
-
+or a jsonl manifest file:
+```python
+audio_input="/path/to/multispeaker_manifest.json"
+```
+where each line is a dictionary containing the following fields:
 ```yaml
 # Example of a line in `multispeaker_manifest.json`
 {
 "audio_filepath": "/path/to/multispeaker_audio1.wav", # path to the input audio file
-"offset": 0 # offset (start) time of the input audio
+"offset": 0, # offset (start) time of the input audio
 "duration": 600, # duration of the audio, can be set to `null` if using NeMo main branch
 }
 {
 "audio_filepath": "/path/to/multispeaker_audio2.wav",
-"offset": 0,
+"offset": 900,
 "duration": 580,
 }
 ```
 
-and then use:
-```python
-pred_outputs = diar_model.diarize(audio="/path/to/multispeaker_manifest.json", batch_size=1)
-```
-
 
 ### Input