---
base_model: unsloth/whisper-small
tags:
- text-generation-inference
- transformers
- unsloth
- whisper
- trl
- audio
- audio-classification
- speech-processing
- noise-detection
license: apache-2.0
language:
- en
---
# Speech Quality and Environmental Noise Classifier
This is a binary audio classification model that determines if a speech recording is **clean** or if it is degraded by **environmental noise**.
It is trained to robustly distinguish clean audio from audio containing genuine background noise (such as cars, music, or other people talking).
- **LABEL_0: `clean`**: The audio contains speech with no significant environmental noise. This includes high-quality recordings as well as recordings with source artifacts like hiss, clipping, or "bad microphone" quality.
- **LABEL_1: `noisy`**: The audio contains speech that is obscured by external, environmental background noise.
## Intended Uses & Limitations
This model is ideal for:
- Pre-processing a large audio dataset to filter for clean samples.
- Automatically tagging audio clips for quality control.
- Acting as a gate for ASR (Automatic Speech Recognition) systems that perform better on clean audio.
**Limitations:**
- This model is a **classifier**, not a noise-reduction tool. It only tells you *if* environmental noise is present.
- Its definition of "noisy" is based on environmental sounds. It is trained to classify audio with only source artifacts (like microphone hum or pure static) as `clean`.
## How to Use
The easiest way to use this model is with a `pipeline`.
```bash
pip install transformers torch
```
```python
from transformers import pipeline
classifier = pipeline("audio-classification", model="Etherll/NoisySpeechDetection-v0.2")
# Classify a local audio file (must be a WAV or other supported format)
# The pipeline automatically handles resampling to 16kHz.
results = classifier("path/to/your_audio_file.wav")
# The result is a list of dictionaries
# [{'score': 0.9979726672172546, 'label': 'clean'},
# {'score': 0.002027299487963319, 'label': 'noisy'}]
print(results)
```
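The pipeline also accepts in-memory waveforms. A minimal sketch, assuming the `classifier` from the snippet above and audio already decoded and resampled to 16 kHz (the zero-filled `waveform` is a placeholder for your own data):

```python
import numpy as np

# Placeholder in-memory waveform; replace with your decoded audio at 16 kHz.
waveform = np.zeros(16000, dtype=np.float32)

results = classifier({"raw": waveform, "sampling_rate": 16000})
print(results)
```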
> **Note:** The model outputs a confidence score for each label. In my use case, I consider audio to be *clean* if the score for the `clean` label is greater than **0.7**.
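If you need a hard clean/noisy decision, one option is to threshold the `clean` score as described in the note above. A minimal sketch, assuming the `classifier` from the snippet above; the `is_clean` helper and the 0.7 threshold are illustrative choices, not part of the model:

```python
CLEAN_THRESHOLD = 0.7  # threshold from the note above; tune for your own data

def is_clean(audio_path: str) -> bool:
    # Map each label to its score and compare the `clean` score to the threshold.
    scores = {r["label"]: r["score"] for r in classifier(audio_path)}
    return scores.get("clean", 0.0) > CLEAN_THRESHOLD

if is_clean("path/to/your_audio_file.wav"):
    print("Audio is clean enough to pass to ASR.")
```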
## Training Data
This model was trained on a custom-built dataset of ~55,000 audio clips, specifically designed to teach the nuances of audio quality.
This Whisper model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)