- Research Article
- Open access
A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition
EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 045821 (2006)
Abstract
The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible provided that a low-rank model for speech is assumed and an estimate of the noise correlation matrix is available. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound to the performance that can be achieved by any subspace-based method. Automatic speech recognition experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing essential signal information for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
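To make the idea of subspace filtering on a noisy Hankel matrix concrete, the following is a minimal sketch and not the estimators studied in the article. It assumes white noise of known variance, whereas the article addresses coloured noise through an estimate of the noise correlation matrix. It uses a minimum-variance style weighting of the singular values. The function names, the frame length, and the matrix width `num_cols` are illustrative choices. With `rank=None` no explicit rank reduction is applied, so components are only attenuated, never truncated.

```python
import numpy as np


def hankel_matrix(frame, num_cols):
    """Stack overlapping windows of a frame into a Hankel matrix."""
    frame = np.asarray(frame, dtype=float)
    num_rows = len(frame) - num_cols + 1
    idx = np.arange(num_rows)[:, None] + np.arange(num_cols)[None, :]
    return frame[idx]


def average_antidiagonals(matrix):
    """Map a matrix back to a 1-D signal by averaging each anti-diagonal."""
    num_rows, num_cols = matrix.shape
    signal = np.zeros(num_rows + num_cols - 1)
    counts = np.zeros_like(signal)
    for i in range(num_rows):
        signal[i:i + num_cols] += matrix[i]
        counts[i:i + num_cols] += 1
    return signal / counts


def subspace_enhance(noisy_frame, noise_var, num_cols=20, rank=None):
    """Illustrative SVD-based subspace filter (white-noise assumption)."""
    H = hankel_matrix(noisy_frame, num_cols)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)

    # Assumption: white noise adds roughly num_rows * noise_var to every
    # squared singular value of the clean Hankel matrix.
    noise_level = H.shape[0] * noise_var
    clean_sq = np.maximum(s**2 - noise_level, 0.0)

    # Minimum-variance style gain per singular component, in [0, 1].
    gains = clean_sq / np.maximum(s**2, np.finfo(float).tiny)

    # Optional explicit rank reduction: discard components beyond `rank`.
    if rank is not None:
        gains[rank:] = 0.0

    H_hat = (U * (gains * s)) @ Vt
    # The filtered matrix is no longer Hankel; averaging restores a signal.
    return average_antidiagonals(H_hat)
```

A call such as `subspace_enhance(noisy_frame, noise_var=0.25)` returns a frame of the same length as the input. Passing `rank=p` additionally zeroes all but the `p` strongest components, which is the explicit rank reduction that the abstract reports as detrimental for recognition.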
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Hermus, K., Wambacq, P. & Van hamme, H. A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition. EURASIP J. Adv. Signal Process. 2007, 045821 (2006). https://doi.org/10.1155/2007/45821
DOI: https://doi.org/10.1155/2007/45821