Mikkel N. Schmidt and Rasmus K. Olsson

Abstract: In this work we address the problem of separating multiple speakers from a single microphone recording. We formulate a linear regression model for estimating each speaker based on features derived from the mixture. The employed feature representation is a sparse, non-negative encoding of the speech mixture in terms of pre-learned speaker-dependent dictionaries. Previous work has shown that this feature representation by itself provides some degree of separation. We show that the performance is significantly improved when regression analysis is performed on the sparse, non-negative features, both compared to linear regression on spectral features and compared to separation based directly on the non-negative sparse features.
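The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the MAP-SNMF algorithm from the paper. It assumes random matrices standing in for the pre-learned speaker dictionaries, a standard multiplicative update for the sparse non-negative code, and a ridge-regularized least-squares map as the regression step. All names and parameter values (`sparse_nn_code`, `fit_linear_map`, `sparsity`, `ridge`, the toy dimensions) are hypothetical.

```python
import numpy as np

def sparse_nn_code(X, D, sparsity=0.1, n_iter=200, eps=1e-9):
    """Non-negative code H for 0.5*||X - D H||^2 + sparsity*sum(H), with D fixed.

    Uses multiplicative updates, which keep H non-negative when X and D are.
    """
    rng = np.random.default_rng(0)
    H = rng.random((D.shape[1], X.shape[1])) + eps
    DtX = D.T @ X
    DtD = D.T @ D
    for _ in range(n_iter):
        H *= DtX / (DtD @ H + sparsity + eps)
    return H

def fit_linear_map(H, S, ridge=1e-3):
    """Ridge least-squares map W such that S is approximated by W @ H."""
    G = H @ H.T + ridge * np.eye(H.shape[0])
    return np.linalg.solve(G, H @ S.T).T

# Toy stand-ins (assumptions): F frequency bins, K atoms per speaker, T frames.
rng = np.random.default_rng(1)
F, K, T = 16, 8, 200
D1 = rng.random((F, K))          # hypothetical dictionary for speaker 1
D2 = rng.random((F, K))          # hypothetical dictionary for speaker 2
D = np.hstack([D1, D2])          # concatenated speaker-dependent dictionaries

def toy_sources(n_frames):
    """Synthetic non-negative 'spectrograms' with sparse activations."""
    A1 = rng.random((K, n_frames)) * (rng.random((K, n_frames)) < 0.3)
    A2 = rng.random((K, n_frames)) * (rng.random((K, n_frames)) < 0.3)
    return D1 @ A1, D2 @ A2

# Training: encode the mixture, then regress each speaker on the sparse features.
S1_train, S2_train = toy_sources(T)
H_train = sparse_nn_code(S1_train + S2_train, D)
W1 = fit_linear_map(H_train, S1_train)
W2 = fit_linear_map(H_train, S2_train)

# Separation: encode a new mixture and apply the learned linear maps.
S1_test, S2_test = toy_sources(T)
H_test = sparse_nn_code(S1_test + S2_test, D)
S1_hat = W1 @ H_test             # regression-based estimate of speaker 1
S2_hat = W2 @ H_test             # regression-based estimate of speaker 2
S1_direct = D1 @ H_test[:K]      # estimate taken directly from the sparse code
```

Here `S1_direct` corresponds to separation based directly on the sparse features, while `S1_hat` adds the regression step on top of the same features; the toy data says nothing about how the two compare on real speech.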

Demonstration: Here are a few audio samples demonstrating the MAP-SNMF-5 algorithm described in the paper.

Mixture | Speaker 1 | Speaker 2
Male-Female
Male-Male
Female-Female

Files:
 imm5272.pdf
 waspaa2007_poster.pdf
Cite:
Mikkel N. Schmidt and Rasmus K. Olsson, Linear Regression on Sparse Features for Single-Channel Speech Separation, Applications of Signal Processing to Audio and Acoustics, IEEE Workshop on (WASPAA), 2007
BibTeX:
@inproceedings{schmidt07waspaa,
   title = "Linear Regression on Sparse Features for Single-Channel Speech Separation",
   author = "Mikkel N. Schmidt and Rasmus K. Olsson",
   booktitle = "Applications of Signal Processing to Audio and Acoustics, IEEE Workshop on (WASPAA)",
   year = "2007"
}
 
 
Mikkel N. Schmidt | Technical University of Denmark | Email: mns(a)imm.dtu.dk