Whisper Speaker Identification (WSI)

Whisper Speaker Identification (WSI) is a state-of-the-art speaker identification model designed for multilingual scenarios. WSI adapts the encoder of OpenAI's Whisper and fine-tunes it, together with a projection head, using triplet-loss-based metric learning. This training objective pushes the model toward discriminative, language-agnostic speaker embeddings. On multilingual datasets, WSI achieves lower Equal Error Rates (EER) and higher F1 scores than models such as pyannote/wespeaker-voxceleb-resnet34-LM and speechbrain/spkrec-ecapa-voxceleb.
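To illustrate the triplet-loss objective mentioned above, here is a minimal sketch in plain Python. It is not the WSI training code: the function names, the L2 normalization step, the Euclidean distance, the margin value, and the toy 3-dimensional vectors are all assumptions chosen for illustration. In practice the embeddings would come from the Whisper encoder and projection head.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumed preprocessing, not confirmed for WSI)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor and positive stand for embeddings of the same speaker,
    negative for an embedding of a different speaker.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, euclidean_distance(a, p) - euclidean_distance(a, n) + margin)


# Hypothetical toy embeddings, for illustration only.
anchor = [1.0, 0.0, 0.0]
positive = [0.9, 0.1, 0.0]   # close to the anchor: same speaker
negative = [0.0, 1.0, 0.0]   # far from the anchor: different speaker

# Well-separated triplet: the loss is zero, so it contributes no gradient.
easy = triplet_loss(anchor, positive, negative)

# Swapping positive and negative simulates a badly embedded triplet.
hard = triplet_loss(anchor, negative, positive)

print(easy, hard)
```

The loss is zero once the same-speaker pair is closer than the different-speaker pair by at least the margin, and positive otherwise, which is what drives embeddings of one speaker together regardless of the language spoken.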

Cite This Work

Coming Soon!

License

This project is licensed under the CC BY-NC-SA 4.0 License.
