---
base_model: unsloth/whisper-small
tags:
- text-generation-inference
- transformers
- unsloth
- whisper
- trl
- audio
- audio-classification
- speech-processing
- noise-detection
license: apache-2.0
language:
- en
---


# Speech Quality and Environmental Noise Classifier

This is a binary audio classification model that determines whether a speech recording is **clean** or degraded by **environmental noise**.

It is trained to distinguish clean audio from audio with actual background noise (such as cars, music, or other people talking), rather than flagging every imperfect recording as noisy.

- **LABEL_0: `clean`**: The audio contains speech with no significant environmental noise. This includes high-quality recordings as well as recordings with source artifacts like hiss, clipping, or "bad microphone" quality.
- **LABEL_1: `noisy`**: The audio contains speech that is obscured by external, environmental background noise.

## Intended Uses & Limitations

This model is ideal for:
- Pre-processing a large audio dataset to filter for clean samples.
- Automatically tagging audio clips for quality control.
- Gating ASR (Automatic Speech Recognition) systems that perform better on clean audio.

**Limitations:**
- This model is a **classifier**, not a noise-reduction tool. It only tells you *if* environmental noise is present.
- Its definition of "noisy" is based on environmental sounds. It is trained to classify audio with only source artifacts (like microphone hum or pure static) as `clean`.

## How to Use

The easiest way to use this model is with a `pipeline`.

```bash
pip install transformers torch
```

```python
from transformers import pipeline

classifier = pipeline("audio-classification", model="Etherll/NoisySpeechDetection-v0.2")

# Classify a local audio file (WAV or another format that ffmpeg can decode).
# When given a file path, the pipeline decodes it with ffmpeg (which must be
# installed) and resamples to the 16 kHz rate the model expects.
results = classifier("path/to/your_audio_file.wav")

# The result is a list of dictionaries
# [{'score': 0.9979726672172546, 'label': 'clean'},
# {'score': 0.002027299487963319, 'label': 'noisy'}]
print(results)
```
> **Note:** The model outputs a confidence score for each label. In my use case, I treat audio as *clean* when the score for the `clean` label exceeds **0.7**.

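
The thresholding rule in the note can be sketched as a small helper. This is a minimal illustration, not part of the model's API: the names `clean_score`, `is_clean`, and `filter_clean` are hypothetical, and the `classify` argument is assumed to be any callable (such as the `classifier` pipeline above) that returns a list of `{'score', 'label'}` dictionaries for an audio path. The 0.7 default comes from the note and may need tuning for other data.

```python
from typing import Callable, Dict, Iterable, List

Prediction = Dict[str, object]  # e.g. {"label": "clean", "score": 0.99}


def clean_score(predictions: List[Prediction]) -> float:
    """Return the score assigned to the `clean` label, or 0.0 if it is absent."""
    for prediction in predictions:
        if prediction["label"] == "clean":
            return float(prediction["score"])
    return 0.0


def is_clean(predictions: List[Prediction], threshold: float = 0.7) -> bool:
    """Apply the cutoff from the note: clean if the `clean` score exceeds it."""
    return clean_score(predictions) > threshold


def filter_clean(
    paths: Iterable[str],
    classify: Callable[[str], List[Prediction]],
    threshold: float = 0.7,
) -> List[str]:
    """Keep only the paths whose predictions pass the `clean` threshold.

    `classify` is assumed to behave like the pipeline call shown above.
    """
    return [path for path in paths if is_clean(classify(path), threshold)]


# Sample pipeline output copied from the example above.
example = [
    {"score": 0.9979726672172546, "label": "clean"},
    {"score": 0.002027299487963319, "label": "noisy"},
]
print(is_clean(example))  # True, since 0.9979... > 0.7
```

With a loaded pipeline, `filter_clean(list_of_paths, classifier)` would return the subset of files judged clean under this rule. Passing the classifier as an argument keeps the helper easy to test without downloading the model.
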
## Training Data

This model was trained on a custom-built dataset of ~55,000 audio clips, designed to teach the difference between environmental noise and source artifacts such as hiss or clipping.

This Whisper model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)