Etherll/NoisySpeechDetection-v0.2

	---
	base_model: unsloth/whisper-small
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- whisper
	- trl
	- audio
	- audio-classification
	- speech-processing
	- noise-detection
	license: apache-2.0
	language:
	- en
	---


	# Speech Quality and Environmental Noise Classifier

	This is a binary audio classification model that determines whether a speech recording is clean or degraded by environmental noise.

	It is trained to distinguish clean audio from audio with real background noise (such as cars, music, or other people talking), rather than flagging every imperfection in the recording.

	- `LABEL_0` (`clean`): The audio contains speech with no significant environmental noise. This includes high-quality recordings as well as recordings with source artifacts such as hiss, clipping, or "bad microphone" quality.
	- `LABEL_1` (`noisy`): The audio contains speech that is obscured by external, environmental background noise.

	## Intended Uses & Limitations

	This model is ideal for:
	- Pre-processing a large audio dataset to filter for clean samples.
	- Automatically tagging audio clips for quality control.
	- Gating ASR (Automatic Speech Recognition) systems that perform better on clean audio.
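
The dataset-filtering use case can be sketched as follows. This is a minimal illustration, not the author's pipeline: `split_by_label` is a hypothetical helper, and `classify` stands for any callable that returns the list of `{'label', 'score'}` dictionaries that the `audio-classification` pipeline produces for a single file.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Shape of one pipeline result: [{'score': float, 'label': str}, ...]
Prediction = List[Dict[str, Any]]


def split_by_label(
    paths: Iterable[str],
    classify: Callable[[str], Prediction],
) -> Tuple[List[str], List[str]]:
    """Split audio paths into (clean, noisy) by the top-scoring label.

    `classify` is assumed to behave like the transformers
    audio-classification pipeline called on one file path.
    """
    clean: List[str] = []
    noisy: List[str] = []
    for path in paths:
        predictions = classify(path)
        # Pick the label with the highest score rather than relying on order.
        top = max(predictions, key=lambda p: p["score"])
        # Anything that is not labeled `clean` is treated as noisy (a choice
        # made for this sketch).
        (clean if top["label"] == "clean" else noisy).append(path)
    return clean, noisy
```

With the real model, `classify` would be the object returned by `pipeline("audio-classification", model="Etherll/NoisySpeechDetection-v0.2")`. Passing the classifier in as an argument keeps the helper easy to test with a stub.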

	Limitations:
	- This model is a classifier, not a noise-reduction tool. It only tells you whether environmental noise is present.
	- Its definition of "noisy" covers environmental sounds only. Audio whose only defects are source artifacts (such as microphone hum or pure static) is classified as `clean`.

	## How to Use

	The easiest way to use this model is through the Transformers `pipeline` API.

	```bash
	pip install transformers torch
	```

	```python
	from transformers import pipeline

	classifier = pipeline("audio-classification", model="Etherll/NoisySpeechDetection-v0.2")

	# Classify a local audio file (WAV or another supported format).
	# The pipeline resamples the audio to 16 kHz automatically.
	results = classifier("path/to/your_audio_file.wav")

	# The result is a list of dictionaries, one per label, for example:
	# [{'score': 0.9979726672172546, 'label': 'clean'},
	#  {'score': 0.002027299487963319, 'label': 'noisy'}]
	print(results)
	```
	> Note: The model outputs a confidence score for each label. In my use case, I treat audio as clean when the score for the `clean` label is greater than 0.7.
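
The thresholding rule in the note above can be written as a small helper. This is a minimal sketch: `is_clean` is a hypothetical name, and the default of 0.7 simply mirrors the author's stated preference. Treat it as a value to tune on your own data, not a property of the model.

```python
from typing import Any, Dict, List


def is_clean(predictions: List[Dict[str, Any]], threshold: float = 0.7) -> bool:
    """Return True when the `clean` score is strictly above `threshold`."""
    # Map each label to its score.
    scores = {p["label"]: p["score"] for p in predictions}
    # A missing `clean` label is treated as a score of 0.0 (an assumption).
    return scores.get("clean", 0.0) > threshold


# Example input copied from the pipeline output shown earlier.
example = [
    {"score": 0.9979726672172546, "label": "clean"},
    {"score": 0.002027299487963319, "label": "noisy"},
]
print(is_clean(example))  # True
```

Raising the threshold keeps fewer clips but lets fewer noisy ones through; lowering it does the opposite.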
	## Training Data

	This model was trained on a custom-built dataset of roughly 55,000 audio clips, designed to teach the distinction between environmental noise and source artifacts described above.

	This Whisper model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)