---
base_model: unsloth/whisper-small
tags:
- text-generation-inference
- transformers
- unsloth
- whisper
- trl
- audio
- audio-classification
- speech-processing
- noise-detection
license: apache-2.0
language:
- en
---

# Speech Quality and Environmental Noise Classifier

This is a binary audio classification model that determines whether a speech recording is **clean** or degraded by **environmental noise**. It is trained to distinguish speech free of environmental noise from speech with actual background noise (such as cars, music, or other people talking).

- **LABEL_0: `clean`**: The audio contains speech with no significant environmental noise. This includes high-quality recordings as well as recordings with source artifacts like hiss, clipping, or "bad microphone" quality.
- **LABEL_1: `noisy`**: The audio contains speech that is obscured by external, environmental background noise.

## Intended Uses & Limitations

This model is ideal for:
- Pre-processing a large audio dataset to filter for clean samples.
- Automatically tagging audio clips for quality control.
- Gating input to ASR (Automatic Speech Recognition) systems that perform better on clean audio.

**Limitations:**
- This model is a **classifier**, not a noise-reduction tool. It only tells you *if* environmental noise is present.
- Its definition of "noisy" is based on environmental sounds. Audio that has only source artifacts (like microphone hum or pure static) is classified as `clean`.

## How to Use

The easiest way to use this model is with a `pipeline`.

```bash
pip install transformers torch
```

```python
from transformers import pipeline

classifier = pipeline("audio-classification", model="Etherll/NoisySpeechDetection-v0.2")

# Classify a local audio file (must be a WAV or other supported format).
# The pipeline automatically handles resampling to 16kHz.
results = classifier("path/to/your_audio_file.wav")

# The result is a list of dictionaries, one per label:
# [{'score': 0.9979726672172546, 'label': 'clean'},
#  {'score': 0.002027299487963319, 'label': 'noisy'}]
print(results)
```

> **Note:** The model outputs a confidence score for each label. In my use case, I consider audio to be *clean* if the score for the `clean` label is greater than **0.7**.

## Training Data

This model was trained on a custom-built dataset of ~55,000 audio clips, designed to teach the difference between source artifacts and environmental noise.

This Whisper model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
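
## Example: Applying the Confidence Threshold

The note in How to Use treats a clip as clean when the `clean` score is greater than 0.7. The following is a minimal sketch of that rule, assuming the pipeline returns the list-of-dictionaries format shown in the usage example. The names `CLEAN_THRESHOLD`, `is_clean`, and `filter_clean_files` are hypothetical helpers written for illustration; they are not part of this model or of `transformers`.

```python
from typing import Callable, Dict, Iterable, List

# Threshold taken from the note in "How to Use"; tune it for your own data.
CLEAN_THRESHOLD = 0.7


def is_clean(results: List[Dict], threshold: float = CLEAN_THRESHOLD) -> bool:
    """Return True if the `clean` label's score is strictly greater than `threshold`.

    `results` is assumed to look like the pipeline output shown in the usage
    example: a list of {"label": ..., "score": ...} dictionaries.
    """
    for item in results:
        if item["label"] == "clean":
            return item["score"] > threshold
    # No `clean` entry in the output: treat the clip as not clean.
    return False


def filter_clean_files(
    paths: Iterable[str],
    classify: Callable[[str], List[Dict]],
    threshold: float = CLEAN_THRESHOLD,
) -> List[str]:
    """Keep only the paths whose classification passes the threshold.

    `classify` is any callable that maps a file path to pipeline-style
    results, for example the `classifier` object from the usage example.
    """
    return [path for path in paths if is_clean(classify(path), threshold)]
```

With the `classifier` created in the usage example, calling `filter_clean_files(["a.wav", "b.wav"], classifier)` would return the subset of paths that pass the threshold. Passing the classifier in as a plain callable keeps the filtering logic separate from model loading, so the rule can be checked without downloading the model.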