taejinp committed on
Commit 3f7996f · verified · 1 Parent(s): cace78d

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -114,12 +114,12 @@ img {
  This model is a streaming version of Sortformer diarizer. [Sortformer](https://arxiv.org/abs/2409.06656)[1] is a novel end-to-end neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models.
  
  <div align="center">
- <img src="sortformer_intro.png" width="750" />
+ <img src="figures/sortformer_intro.png" width="750" />
  </div>
  
  Streaming Sortformer approach employs an Arrival-Order Speaker Cache (AOSC) to store frame-level acoustic embeddings of previously observed speakers.
  <div align="center">
- <img src="streaming_sortformer_ani.gif" width="1400" />
+ <img src="figures/streaming_sortformer_ani.gif" width="1400" />
  </div>
  
  Sortformer resolves permutation problem in diarization following the arrival-time order of the speech segments from each speaker.
@@ -129,7 +129,7 @@ Sortformer resolves permutation problem in diarization following the arrival-tim
  Streaming sortformer employs pre-encode layer in the Fast-Conformer to generate speaker-cache. At each step, speaker cache is filtered to only retain the high-quality speaker cache vectors.
  
  <div align="center">
- <img src="streaming_steps.png" width="1400" />
+ <img src="figures/streaming_steps.png" width="1400" />
  </div>
  
  
@@ -138,7 +138,7 @@ Speech Tasks (NEST)](https://arxiv.org/abs/2408.13106)[2] which is based on [Fas
  and two feedforward layers with 4 sigmoid outputs for each frame input at the top layer. More information can be found in the [Sortformer paper](https://arxiv.org/abs/2409.06656)[1].
  
  <div align="center">
- <img src="sortformer-v1-model.png" width="450" />
+ <img src="figures/sortformer-v1-model.png" width="450" />
  </div>
  
  
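
The change in this commit amounts to prefixing four `<img src="...">` paths in README.md with `figures/`. A minimal sketch of how such a rewrite could be scripted is shown below. The helper name `prefix_img_paths` and its `directory` parameter are hypothetical and for illustration only; nothing in the commit indicates the edit was made with a script rather than by hand.

```python
import re


def prefix_img_paths(text: str, directory: str = "figures/") -> str:
    """Prefix relative <img src="..."> paths with `directory`.

    Hypothetical helper (an assumption, not part of the commit).
    Paths that already start with `directory`, absolute paths, and
    http(s) URLs are left unchanged, so the rewrite is idempotent.
    """
    pattern = re.compile(
        r'(<img\s+src=")'                                   # opening of the tag up to the path
        r"(?!" + re.escape(directory) + r"|https?://|/)"    # skip already-prefixed or absolute paths
        r'([^"]+")'                                         # the path and its closing quote
    )
    # A callable replacement avoids backslash-escape surprises in `directory`.
    return pattern.sub(lambda m: m.group(1) + directory + m.group(2), text)


if __name__ == "__main__":
    old_line = '<img src="sortformer_intro.png" width="750" />'
    print(prefix_img_paths(old_line))
```

Applied to each removed line in the diff, this yields the corresponding added line, and running it a second time leaves the text unchanged.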