---
base_model: unsloth/whisper-small
tags:
- text-generation-inference
- transformers
- unsloth
- whisper
- trl
- audio
- audio-classification
- speech-processing
- noise-detection
license: apache-2.0
language:
- en
---
# Speech Quality and Environmental Noise Classifier
This is a binary audio classification model that determines if a speech recording is **clean** or if it is degraded by **environmental noise**.
It is trained to distinguish clean audio from audio that has actual background noise (such as cars, music, or other people talking).
- **LABEL_0: `clean`**: The audio contains speech with no significant environmental noise. This includes high-quality recordings as well as recordings with source artifacts like hiss, clipping, or "bad microphone" quality.
- **LABEL_1: `noisy`**: The audio contains speech that is obscured by external, environmental background noise.
## Intended Uses & Limitations
This model is ideal for:
- Pre-processing a large audio dataset to filter for clean samples.
- Automatically tagging audio clips for quality control.
- Gating ASR (Automatic Speech Recognition) systems that perform better on clean audio.
**Limitations:**
- This model is a **classifier**, not a noise-reduction tool. It only tells you *if* environmental noise is present.
- Its definition of "noisy" is based on environmental sounds. It is trained to classify audio with only source artifacts (like microphone hum or pure static) as `clean`.
## How to Use
The easiest way to use this model is with a `pipeline`.
```bash
pip install transformers torch
```
```python
from transformers import pipeline
classifier = pipeline("audio-classification", model="Etherll/NoisySpeechDetection-v0.2")
# Classify a local audio file (WAV or another supported format;
# decoding from a file path relies on ffmpeg being installed).
# The pipeline automatically resamples the audio to 16 kHz.
results = classifier("path/to/your_audio_file.wav")
# The result is a list of dictionaries
# [{'score': 0.9979726672172546, 'label': 'clean'},
# {'score': 0.002027299487963319, 'label': 'noisy'}]
print(results)
```
> **Note:** The model outputs a confidence score for each label. In my use case, I consider audio to be *clean* if the score for the `clean` label is greater than **0.7**.
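The threshold from the note above can be applied with a small helper. This is a minimal sketch, not part of the model or of `transformers`: the names `clean_score`, `is_clean`, and `filter_clean` are hypothetical, and the 0.7 cutoff is only the value mentioned in the note, so tune it for your own data. `filter_clean` accepts any callable that maps a file path to a list of label/score dictionaries, such as the `classifier` pipeline created above.

```python
from typing import Callable, Iterable

# Cutoff taken from the note above; an assumption to tune, not a fixed rule.
CLEAN_THRESHOLD = 0.7


def clean_score(results: list[dict]) -> float:
    """Return the score of the `clean` label from one pipeline result."""
    for entry in results:
        if entry["label"] == "clean":
            return float(entry["score"])
    # `clean` can be absent, e.g. if the pipeline was called with top_k=1
    # and only `noisy` came back. Treat that case as not clean.
    return 0.0


def is_clean(results: list[dict], threshold: float = CLEAN_THRESHOLD) -> bool:
    """True when the `clean` score is strictly greater than the threshold."""
    return clean_score(results) > threshold


def filter_clean(
    paths: Iterable[str],
    classify: Callable[[str], list[dict]],
    threshold: float = CLEAN_THRESHOLD,
) -> list[str]:
    """Keep only the paths whose classification passes the threshold."""
    return [path for path in paths if is_clean(classify(path), threshold)]


# The example output shown earlier in this section.
example = [
    {"score": 0.9979726672172546, "label": "clean"},
    {"score": 0.002027299487963319, "label": "noisy"},
]
print(is_clean(example))  # True
```

With the pipeline loaded, `filter_clean(list_of_paths, classifier)` would return the subset of files that pass the cutoff, which covers the dataset-filtering and ASR-gating uses listed earlier.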
## Training Data
This model was trained on a custom-built dataset of ~55,000 audio clips, designed to teach the difference between clean speech and speech degraded by environmental noise.
This Whisper model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)