Segmentation model

This model was trained on AMI-MixHeadset and my own synthetic dataset of Russian speech.

Training time: 5 hours on an NVIDIA GeForce RTX 3060.

This model can be used as the segmentation model of the pyannote/speaker-diarization pipeline.
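Roughly speaking, a pyannote-style segmentation model outputs frame-level speaker-activity scores for short audio chunks. The diarization pipeline turns these scores into speech segments before extracting speaker embeddings and clustering them. The sketch below illustrates only the general thresholding idea, using onset/offset hysteresis. It is a simplified illustration, not pyannote's code. The function name `binarize`, the frame layout, and the threshold values are assumptions chosen for the example.

```python
from typing import List, Tuple


def binarize(
    scores: List[float],
    frame_step: float,
    onset: float = 0.5,
    offset: float = 0.5,
) -> List[Tuple[float, float]]:
    """Turn frame-level activity scores for ONE speaker into (start, end) segments.

    Hysteresis thresholding: a segment opens when the score rises above
    `onset` and closes when it drops below `offset`. Frame i is assumed
    (hypothetically) to cover [i * frame_step, (i + 1) * frame_step) seconds.
    Setting onset == offset reduces this to a single threshold.
    """
    segments: List[Tuple[float, float]] = []
    active = False
    start = 0.0
    for i, score in enumerate(scores):
        if not active and score > onset:
            active = True
            start = i * frame_step
        elif active and score < offset:
            active = False
            segments.append((start, i * frame_step))
    if active:
        # Close a segment that is still open at the end of the chunk.
        segments.append((start, len(scores) * frame_step))
    return segments


# Frame 3 (score 0.4) stays "active" because it is still above the offset.
print(binarize([0.1, 0.8, 0.9, 0.4, 0.2, 0.7], frame_step=0.5, onset=0.6, offset=0.3))
# [(0.5, 2.0), (2.5, 3.0)]
```

Using two thresholds instead of one prevents a score that hovers around a single cut-off from splitting one speech turn into many short fragments.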

| Benchmark | DER (%) |
|---|---|
| AMI (headset mix, only_words) | 38.8 |
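The diarization error rate (DER) is computed in three steps. First, hypothesis speakers are mapped one-to-one onto reference speakers in the best possible way. Next, missed speech, false-alarm speech, and speaker confusion are summed. Finally, that sum is divided by the total duration of reference speech. The frame-level sketch below illustrates that definition. It assumes both annotations are already sampled on the same frame grid. It also uses a brute-force speaker mapping that is only practical for a handful of speakers, and it applies no forgiveness collar. It is an illustration only and is not necessarily how the figure above was computed. For real evaluations, pyannote.metrics provides a `DiarizationErrorRate` class. The names `best_mapping` and `diarization_error_rate` are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Optional, Set

# One set of active speaker labels per frame.
Frames = List[Set[str]]


def best_mapping(ref: Frames, hyp: Frames) -> Dict[str, str]:
    """Brute-force one-to-one mapping hyp label -> ref label maximizing matched frames."""
    ref_labels = sorted(set().union(*ref))
    hyp_labels = sorted(set().union(*hyp))
    # Pad with None so any hypothesis speaker may also stay unmapped.
    candidates: List[Optional[str]] = list(ref_labels) + [None] * len(hyp_labels)
    best: Dict[str, str] = {}
    best_score = -1
    for assignment in permutations(candidates, len(hyp_labels)):
        mapping = {h: r for h, r in zip(hyp_labels, assignment) if r is not None}
        score = sum(
            sum(1 for h in h_set if mapping.get(h) in r_set)
            for r_set, h_set in zip(ref, hyp)
        )
        if score > best_score:
            best, best_score = mapping, score
    return best


def diarization_error_rate(ref: Frames, hyp: Frames) -> float:
    """(missed + false alarm + confusion) / total reference speech, counted in frames."""
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must share the same frame grid")
    mapping = best_mapping(ref, hyp)
    missed = false_alarm = confusion = total = 0
    for r_set, h_set in zip(ref, hyp):
        n_ref, n_hyp = len(r_set), len(h_set)
        n_correct = sum(1 for h in h_set if mapping.get(h) in r_set)
        missed += max(0, n_ref - n_hyp)
        false_alarm += max(0, n_hyp - n_ref)
        confusion += min(n_ref, n_hyp) - n_correct
        total += n_ref
    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


ref = [{"A"}, {"A"}, {"B"}, {"B"}, set()]
hyp = [{"s1"}, {"s1"}, {"s1"}, {"s2"}, {"s2"}]
# One confused frame + one false-alarm frame over 4 reference speech frames.
print(diarization_error_rate(ref, hyp))  # 0.5
```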

Usage example

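```python
import torch
import yaml
from pyannote.audio import Model
from pyannote.audio.pipelines import SpeakerDiarization

# Load the fine-tuned segmentation model on CPU. The checkpoint stores a full
# pickled model (not a state dict), so weights_only=False is required on
# PyTorch >= 2.6, where the default became True. Only load files you trust.
segm_model = torch.load(
    'model/segm_model.pth',
    map_location=torch.device('cpu'),
    weights_only=False,
)

# Pretrained speaker-embedding model (needs a Hugging Face access token).
embed_model = Model.from_pretrained("pyannote/embedding", use_auth_token='ACCESS_TOKEN_GOES_HERE')

diar_pipeline = SpeakerDiarization(
    segmentation=segm_model,
    segmentation_batch_size=16,
    clustering="AgglomerativeClustering",
    embedding=embed_model,
)

# Set the pipeline hyperparameters from the accompanying config file.
with open('model/config.yaml', 'r') as f:
    diar_config = yaml.safe_load(f)
diar_pipeline.instantiate(diar_config)

# Run diarization; the result is a pyannote.core.Annotation.
annotation = diar_pipeline('audio.wav')

# Print one line per speaker turn.
for turn, _, speaker in annotation.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```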
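The pipeline in the usage example groups speaker embeddings with `clustering="AgglomerativeClustering"`. The general idea is to start with one cluster per embedding and repeatedly merge the two closest clusters until the closest pair is farther apart than a threshold. The pure-Python sketch below illustrates this using cosine distance between cluster centroids. It is not pyannote's implementation: pyannote's `AgglomerativeClustering` differs in details such as the linkage method and how its threshold is tuned, and the tuned value presumably lives among the hyperparameters in config.yaml. The threshold values and function names here are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def agglomerative_clustering(embeddings: List[Vector], threshold: float) -> List[int]:
    """Bottom-up clustering with centroid linkage; returns a cluster index per embedding."""
    clusters: List[List[int]] = [[i] for i in range(len(embeddings))]

    def centroid(members: List[int]) -> Vector:
        dim = len(embeddings[0])
        return [sum(embeddings[m][d] for m in members) / len(members) for d in range(dim)]

    while len(clusters) > 1:
        centroids = [centroid(c) for c in clusters]
        best_pair: Optional[Tuple[int, int]] = None
        best_dist = math.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = cosine_distance(centroids[i], centroids[j])
                if dist < best_dist:
                    best_pair, best_dist = (i, j), dist
        # Stop once even the closest pair of clusters is too far apart.
        if best_pair is None or best_dist > threshold:
            break
        i, j = best_pair
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected

    labels = [0] * len(embeddings)
    for k, members in enumerate(clusters):
        for m in members:
            labels[m] = k
    return labels


# Two clearly separated "speakers" in a toy 2-D embedding space.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(agglomerative_clustering(emb, threshold=0.3))  # [0, 0, 1, 1]
```

A lower threshold yields more (smaller) clusters, i.e. more detected speakers. A higher threshold merges more aggressively.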