Abstract: We present a method for suppressing non-stationary noise in single-channel recordings of speech. The method is based on nonnegative sparse coding and relies on a voice activity detector. In regions classified as non-speech, we learn an overcomplete basis for the noise, which is then used to estimate the speech and the noise from the mixture. We compare the method to the classical approach, where the noise spectrum is estimated as the average of non-speech frames. The proposed method significantly outperforms the classical approach when the noise is highly non-stationary.
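The two approaches contrasted in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the input `V` is a nonnegative magnitude spectrogram (frequency bins by frames) and `is_speech` is a boolean per-frame VAD decision. The function names, the multiplicative update rules with an L1 penalty, the learned speech basis, the Wiener-like mask, and the basis sizes are all illustrative choices that the page does not specify.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def spectral_subtraction(V, is_speech, floor=0.0):
    """Classical baseline: subtract the mean spectrum of non-speech frames."""
    noise = V[:, ~is_speech].mean(axis=1, keepdims=True)
    return np.maximum(V - noise, floor)


def _normalize(W):
    """Scale each basis vector (column) to unit Euclidean norm."""
    return W / (np.linalg.norm(W, axis=0, keepdims=True) + EPS)


def sparse_nmf(V, n_new, W_fixed=None, sparsity=0.1, n_iter=200, seed=0):
    """Approximate V by [W_fixed, W_new] @ H with nonnegative factors.

    Assumed scheme: multiplicative updates for a squared-error cost with an
    L1 penalty on H. Columns of W_fixed are never changed.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    n_fixed = 0 if W_fixed is None else W_fixed.shape[1]
    W_new = _normalize(rng.random((n_freq, n_new)) + EPS)
    W = W_new if W_fixed is None else np.hstack([W_fixed, W_new])
    H = rng.random((n_fixed + n_new, n_frames)) + EPS
    for _ in range(n_iter):
        # Sparse code update: the penalty enters the denominator.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
        # Basis update, applied only to the non-fixed columns.
        W_upd = W * (V @ H.T) / (W @ H @ H.T + EPS)
        W[:, n_fixed:] = _normalize(W_upd[:, n_fixed:])
    return W, H


def suppress_noise(V, is_speech, n_noise=8, n_speech=16, sparsity=0.1):
    """Learn a noise basis on non-speech frames, then split the mixture."""
    # Step 1: noise basis from frames the VAD marks as non-speech.
    W_noise, _ = sparse_nmf(V[:, ~is_speech], n_noise, sparsity=sparsity)
    # Step 2 (assumption): hold the noise basis fixed and fit extra
    # basis vectors to whatever the noise basis cannot explain.
    W, H = sparse_nmf(V, n_speech, W_fixed=W_noise, sparsity=sparsity)
    noise_est = W[:, :n_noise] @ H[:n_noise]
    speech_est = W[:, n_noise:] @ H[n_noise:]
    # Wiener-like mask (assumption) so the two estimates sum to V.
    mask = speech_est / (speech_est + noise_est + EPS)
    return mask * V, (1.0 - mask) * V
```

The baseline assumes one fixed noise spectrum for the whole recording, whereas the dictionary-based variant lets the noise activations in `H` vary from frame to frame, which is why it is better suited to non-stationary noise. To obtain audio, the estimated magnitudes would typically be combined with the phase of the mixture and inverted with an inverse short-time Fourier transform.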
Demonstration: Here are a few examples comparing the non-stationary noise suppression system with classical spectral subtraction. The examples cover four different types of noise, with results obtained using both the Qualcomm-ICSI-OGI voice activity detector (VAD) and an ideal VAD. These examples are not indicative of what is attainable with a state-of-the-art noise-suppression system, since they are computed with a simple suppression rule and no post-processing.
|Noise|Mixture|Ideal VAD: Proposed Method|Ideal VAD: Spectral Subtraction|Qualcomm-ICSI-OGI VAD: Proposed Method|Qualcomm-ICSI-OGI VAD: Spectral Subtraction|
- Mikkel N. Schmidt and Jan Larsen, Reduction of Non-stationary Noise using a Non-negative Latent Variable Decomposition, Machine Learning for Signal Processing, IEEE Workshop on (MLSP), 2008
title = "Reduction of Non-stationary Noise using a Non-negative Latent Variable Decomposition",
author = "Mikkel N. Schmidt and Jan Larsen",
booktitle = "Machine Learning for Signal Processing, IEEE Workshop on (MLSP)",
month = "Oct",
pages = "486--491",
year = "2008"