---
license: openrail
language:
- nl
tags:
- wav2vec2
- self-supervised
- pretraining
- speech
- audio
---
# Wav2Vec2-NL

A Dutch Wav2Vec2-base model, pre-trained on 831 hours of exclusively Dutch speech.

Pre-training data was extracted from a combination of:

- the [Spoken Dutch Corpus](https://taalmaterialen.ivdnt.org/wp-content/uploads/documentatie/cgn_website/doc_English/topics/index.htm) (537 hours; incl. spontaneous conversations, interviews, read speech and news reports)
- the Dutch component of [Multilingual LibriSpeech](https://www.openslr.org/94/) (211 hours; audiobook segments)
- the Dutch subset of the [CommonVoice 16.1](https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1) corpus (83 hours; read-aloud speech)

More information, including the training manifest and configuration, is available in the [Wav2Vec2-NL repository on Zenodo](http://doi.org/10.5281/zenodo.15550628).

Analyses of Dutch phonetic and lexical features encoded in Wav2Vec2-NL hidden states are reported in the paper [What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training](https://arxiv.org/abs/2506.00981) (Interspeech 2025; see the full citation [below](#citation)).

Note: this model does not have a tokenizer, as it was pre-trained on audio alone. To use it for speech recognition, a tokenizer needs to be created and the model fine-tuned on labeled text data. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for an explanation of how to fine-tune Wav2Vec2 models on HuggingFace.
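As a minimal sketch of how such a fine-tuning setup could start (following the approach in the linked blog post): the `vocab.json` character vocabulary below is a hypothetical file you would build from your own transcriptions, the `[UNK]`/`[PAD]` token names follow the blog's conventions, and freezing the feature encoder is a common but optional choice.

```python
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

# Hypothetical character vocabulary built from your own Dutch transcriptions
tokenizer = Wav2Vec2CTCTokenizer(
    'vocab.json', unk_token='[UNK]', pad_token='[PAD]', word_delimiter_token='|'
)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('amsterdamNLP/Wav2Vec2-NL')
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Pre-trained encoder with a freshly initialised CTC head sized to the new vocabulary
model = Wav2Vec2ForCTC.from_pretrained(
    'amsterdamNLP/Wav2Vec2-NL',
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction='mean',
)
model.freeze_feature_encoder()  # the CNN feature encoder is typically kept frozen
```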
# Usage

```python
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Load the audio preprocessor and the pre-trained encoder
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('amsterdamNLP/Wav2Vec2-NL')
model = Wav2Vec2Model.from_pretrained('amsterdamNLP/Wav2Vec2-NL')
```
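Continuing from the snippet above, a minimal example of extracting hidden states from an audio signal could look as follows; the zero array is only a placeholder for a real 16 kHz mono Dutch speech waveform (e.g. loaded with `librosa` or `soundfile`):

```python
import numpy as np
import torch

# Placeholder: one second of silence at 16 kHz; substitute a real mono waveform here
waveform = np.zeros(16000, dtype=np.float32)

inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

print(outputs.last_hidden_state.shape)  # (1, 49, 768): ~50 frames per second, 768-dim base model
print(len(outputs.hidden_states))       # 13: initial feature projection + 12 transformer layers
```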
# Citation

The _Wav2Vec2-NL_ model was published as part of:

de Heer Kloots, M., Mohebbi, H., Pouw, C., Shen, G., Zuidema, W., Bentum, M. (2025). What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training. _Proc. INTERSPEECH 2025_. https://doi.org/10.21437/Interspeech.2025-1526

BibTeX entry:

```bibtex
@inproceedings{deheerkloots25_interspeech,
  title     = {What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training},
  author    = {Marianne {de Heer Kloots} and Hosein Mohebbi and Charlotte Pouw and Gaofei Shen and Willem Zuidema and Martijn Bentum},
  year      = {2025},
  booktitle = {Interspeech 2025},
  doi       = {10.21437/Interspeech.2025-1526},
}
```