Abstract: In this work we address the problem of separating multiple speakers from a single-microphone recording. We estimate a real-valued time-frequency representation of the speech sources linearly from features derived from the observed mixture. As features we use sparse, non-negative encodings of the speech mixture in terms of pre-learned, speaker-dependent dictionaries. Compared with direct separation in the feature space, and with linear estimation that uses the mixture itself as the features, the method achieves better separation as measured by the signal-to-error ratio.
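The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation. Everything here is an assumption made for the example: the function names, the multiplicative-update encoder with an L1 penalty, the ridge-regularized least-squares estimator, the random dictionaries `W1` and `W2`, and the definition of the signal-to-error ratio as 10·log10 of signal energy over error energy.

```python
import numpy as np

def sparse_nn_encode(V, W, sparsity=0.1, n_iter=200, eps=1e-12, seed=0):
    """Sparse non-negative code H for V ~ W @ H with the dictionary W held fixed.

    Uses multiplicative updates for 0.5*||V - W H||_F^2 + sparsity*sum(H), H >= 0
    (an assumed encoder, chosen for brevity).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV, WtW = W.T @ V, W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + sparsity + eps)
    return H

def fit_linear_estimator(F, S, ridge=1e-3):
    """Return A minimizing ||S - A F||_F^2 + ridge*||A||_F^2."""
    G = F @ F.T + ridge * np.eye(F.shape[0])
    return np.linalg.solve(G, F @ S.T).T  # A = S F^T (F F^T + ridge*I)^-1

def signal_to_error_ratio_db(s, s_hat):
    """Assumed definition: 10*log10(signal energy / error energy)."""
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum((s - s_hat) ** 2))

# Synthetic stand-ins for pre-learned speaker-dependent dictionaries (hypothetical).
rng = np.random.default_rng(0)
n_freq, n_atoms, n_frames = 20, 5, 300
W1 = rng.random((n_freq, n_atoms))
W2 = rng.random((n_freq, n_atoms))
W = np.hstack([W1, W2])  # concatenated dictionary used to encode the mixture

def make_sources(n):
    # Sparse non-negative activations: roughly 30% of the entries are active.
    H1 = rng.random((n_atoms, n)) * (rng.random((n_atoms, n)) < 0.3)
    H2 = rng.random((n_atoms, n)) * (rng.random((n_atoms, n)) < 0.3)
    return W1 @ H1, W2 @ H2

S1_tr, S2_tr = make_sources(n_frames)  # training sources
S1_te, S2_te = make_sources(n_frames)  # held-out sources

# Features: sparse non-negative codes of the mixture (assumed additive here).
F_tr = sparse_nn_encode(S1_tr + S2_tr, W)
F_te = sparse_nn_encode(S1_te + S2_te, W)

# Linear map from features to the time-frequency representation of speaker 1.
A1 = fit_linear_estimator(F_tr, S1_tr)
ser_train = signal_to_error_ratio_db(S1_tr, A1 @ F_tr)
ser_test = signal_to_error_ratio_db(S1_te, A1 @ F_te)
print(f"SER train: {ser_train:.1f} dB, SER test: {ser_test:.1f} dB")
```

In this sketch the "direct separation in the feature space" baseline would correspond to reconstructing speaker 1 as `W1 @ F[:n_atoms]`, while the linear estimator instead learns the mapping `A1` from all features to the source.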
Demonstration: Here are a few audio samples demonstrating the algorithm described in the paper.
Opposite gender mixtures:
| Speaker | Female (4) | Female (7) | Female (11) | Female (15) |
|---|---|---|---|---|
| Male (1) | Mix = Male + Female | Mix = Male + Female | Mix = Male + Female | |
| Male (2) | Mix = Male + Female | Mix = Male + Female | Mix = Male + Female | Mix = Male + Female |
| Male (3) | Mix = Male + Female | Mix = Male + Female | Mix = Male + Female | Mix = Male + Female |
| Male (5) | Mix = Male + Female | Mix = Male + Female | Mix = Male + Female | Mix = Male + Female |
Male-male mixtures:
| Speaker | Male (2) | Male (3) | Male (5) |
|---|---|---|---|
| Male (1) | Mix = Male + Male | Mix = Male + Male | Mix = Male + Male |
| Male (2) | | Mix = Male + Male | Mix = Male + Male |
| Male (3) | | | Mix = Male + Male |
Female-female mixtures:
| Speaker | Female (7) | Female (11) | Female (15) |
|---|---|---|---|
| Female (4) | Mix = Female + Female | Mix = Female + Female | Mix = Female + Female |
| Female (7) | | Mix = Female + Female | Mix = Female + Female |
| Female (11) | | | Mix = Female + Female |
- Files: imm4996.pdf
- Cite: Mikkel N. Schmidt and Rasmus K. Olsson, Feature Space Reconstruction for Single-Channel Speech Separation, 2007
- BibTeX:
  @techreport{schmidt07a,
    title = "Feature Space Reconstruction for Single-Channel Speech Separation",
    author = "Mikkel N. Schmidt and Rasmus K. Olsson",
    year = "2007"
  }