Open Access

A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition

EURASIP Journal on Advances in Signal Processing 2006, 2007:045821

DOI: 10.1155/2007/45821

Received: 24 October 2005

Accepted: 30 April 2006

Published: 13 September 2006

Abstract

The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech, and on the availability of an estimate of the noise correlation matrix. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound to the performance that can be achieved by any subspace-based method. Automatic speech recognition experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing essential signal information for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
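
The decomposition summarised in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes white noise of known variance (`noise_var`), processes a single frame, and uses hypothetical helper names (`hankel_matrix`, `average_antidiagonals`, `subspace_enhance`). The singular values of the noisy Hankel matrix are weighted with a minimum-variance style gain rather than truncated, in line with the observation that explicit rank reduction is not required.

```python
import numpy as np


def hankel_matrix(x, n_cols):
    """Arrange a 1-D frame into a Hankel matrix with n_cols columns."""
    x = np.asarray(x, dtype=float)
    n_rows = len(x) - n_cols + 1
    # Entry (i, j) holds sample x[i + j], so anti-diagonals are constant.
    idx = np.arange(n_rows)[:, None] + np.arange(n_cols)[None, :]
    return x[idx]


def average_antidiagonals(H):
    """Map a matrix back to a 1-D signal by averaging its anti-diagonals."""
    n_rows, n_cols = H.shape
    idx = np.arange(n_rows)[:, None] + np.arange(n_cols)[None, :]
    total = np.zeros(n_rows + n_cols - 1)
    count = np.zeros(n_rows + n_cols - 1)
    np.add.at(total, idx, H)
    np.add.at(count, idx, 1.0)
    return total / count


def subspace_enhance(frame, noise_var, n_cols=20):
    """Weight the singular values of the noisy Hankel matrix (assumed white noise)."""
    H = hankel_matrix(frame, n_cols)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    # For white noise, each squared noise singular value is roughly n_rows * noise_var.
    noise_sv2 = H.shape[0] * noise_var
    # Minimum-variance style gain; directions dominated by noise get gain 0,
    # but no rank is fixed in advance.
    gains = np.maximum(1.0 - noise_sv2 / np.maximum(s ** 2, 1e-12), 0.0)
    H_hat = (U * (gains * s)) @ Vt
    return average_antidiagonals(H_hat)


if __name__ == "__main__":
    # Synthetic low-rank "speech-like" frame: a sum of two sinusoids in white noise.
    rng = np.random.default_rng(0)
    t = np.arange(400)
    clean = np.sin(0.2 * t) + 0.5 * np.sin(0.45 * t)
    noisy = clean + rng.normal(0.0, 0.5, size=t.size)
    enhanced = subspace_enhance(noisy, noise_var=0.25)
    print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
    print("enhanced MSE:", np.mean((enhanced - clean) ** 2))
```

Coloured noise, the case the abstract emphasises, would additionally need an estimate of the noise correlation matrix, for example through prewhitening; the sketch leaves that step out.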

[1-38]

Authors’ Affiliations

(1)
Department of Electrical Engineering - ESAT, Katholieke Universiteit Leuven

References

  1. Tufts DW, Kumaresan R, Kirsteins I: Data adaptive signal estimation by singular value decomposition of a data matrix. Proceedings of the IEEE 1982,70(6):684-685.
  2. Cadzow JA: Signal enhancement-a composite property mapping algorithm. IEEE Transactions on Acoustics, Speech, and Signal Processing 1988,36(1):49-62. 10.1109/29.1488
  3. Dendrinos M, Bakamidis S, Carayannis G: Speech enhancement from noise: a regenerative approach. Speech Communication 1991,10(1):45-57. 10.1016/0167-6393(91)90027-Q
  4. De Moor B: The singular value decomposition and long and short spaces of noisy matrices. IEEE Transactions on Signal Processing 1993,41(9):2826-2838. 10.1109/78.236505
  5. Van Huffel S: Enhanced resolution based on minimum variance estimation and exponential data modeling. Signal Processing 1993,33(3):333-355. 10.1016/0165-1684(93)90130-3
  6. Ephraim Y, Van Trees HL: A signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing 1995,3(4):251-266. 10.1109/89.397090
  7. Hu Y, Loizou P: Perceptual weighting motivated subspace based speech enhancement approach. Proceedings of International Conference on Spoken Language Processing (ICSLP '02), September 2002, Denver, Colo, USA 1797-1800.
  8. Jabloun F, Champagne B: Incorporating the human hearing properties in the signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing 2003,11(6):700-708. 10.1109/TSA.2003.818031
  9. Hu Y, Loizou PC: A perceptually motivated approach for speech enhancement. IEEE Transactions on Speech and Audio Processing 2003,11(5):457-465. 10.1109/TSA.2003.815936
  10. Jensen SH, Hansen PC, Hansen SD, Sørensen JA: Reduction of broad-band noise in speech by truncated QSVD. IEEE Transactions on Speech and Audio Processing 1995,3(6):439-448. 10.1109/89.482211
  11. Rezayee A, Gazor S: An adaptive KLT approach for speech enhancement. IEEE Transactions on Speech and Audio Processing 2001,9(2):87-95. 10.1109/89.902276
  12. Lev-Ari H, Ephraim Y: Extension of the signal subspace speech enhancement approach to colored noise. IEEE Signal Processing Letters 2003,10(4):104-106. 10.1109/LSP.2003.808544
  13. Hansen PSK, Hansen PC, Hansen SD, Sørensen JA: Experimental comparison of signal subspace based noise reduction methods. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), March 1999, Phoenix, Ariz, USA 1: 101-104.
  14. Huang J, Zhao Y: Energy-constrained signal subspace method for speech enhancement and recognition. IEEE Signal Processing Letters 1997,4(10):283-285. 10.1109/97.633769
  15. Hermus K, Verhelst W, Wambacq P: Optimized subspace weighting for robust speech recognition in additive noise environments. Proceedings of 6th International Conference on Spoken Language Processing (ICSLP '00), October 2000, Beijing, China 3: 542-545.
  16. Hermus K, Wambacq P: Assessment of signal subspace based speech enhancement for noise robust speech recognition. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), May 2004, Montreal, Quebec, Canada 1: 945-948.
  17. Dologlou I, Carayannis G: Physical interpretation of signal reconstruction from reduced rank matrices. IEEE Transactions on Signal Processing 1991,39(7):1681-1682. 10.1109/78.134407
  18. Hansen PC, Jensen SH: FIR filter representations of reduced-rank noise reduction. IEEE Transactions on Signal Processing 1998,46(6):1737-1741. 10.1109/78.678511
  19. Ephraim Y, Van Trees HL: A signal subspace approach for speech enhancement. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), April 1993, Minneapolis, Minn, USA 2: 355-358.
  20. Hermus K: Signal subspace decompositions for perceptual speech and audio processing, Ph.D. dissertation.
  21. Doclo S, Moonen M: GSVD-based optimal filtering for single and multimicrophone speech enhancement. IEEE Transactions on Signal Processing 2002,50(9):2230-2244. 10.1109/TSP.2002.801937
  22. Soon IY, Koh SN, Yeo CK: Noisy speech enhancement using discrete cosine transform. Speech Communication 1998,24(3):249-257. 10.1016/S0167-6393(98)00019-3
  23. Rissanen J: Modeling by shortest data description. Automatica 1978,14(5):465-471. 10.1016/0005-1098(78)90005-5
  24. Bakamidis S, Dendrinos M, Carayannis G: SVD analysis by synthesis of harmonic signals. IEEE Transactions on Signal Processing 1991,39(2):472-477. 10.1109/78.80831
  25. Martin R: Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Transactions on Speech and Audio Processing 2001,9(5):504-512. 10.1109/89.928915
  26. Cohen I: Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging. IEEE Transactions on Speech and Audio Processing 2003,11(5):466-475. 10.1109/TSA.2003.811544
  27. Rangachari S, Loizou PC, Hu Y: A noise estimation algorithm with rapid adaptation for highly non-stationary environments. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), May 2004, Montreal, Quebec, Canada 1: 305-308.
  28. Golub G, Van Loan C (Eds): Matrix Computations. Johns Hopkins University Press, Baltimore, Md, USA; 1983.
  29. Hansen PC, Jensen SH: Prewhitening for rank-deficient noise in subspace methods for noise reduction. IEEE Transactions on Signal Processing 2005,53(10):3718-3726.
  30. Mittal U, Phamdo N: Signal/noise KLT based approach for enhancing speech degraded by colored noise. IEEE Transactions on Speech and Audio Processing 2000,8(2):159-167. 10.1109/89.824700
  31. Hu Y, Loizou PC: A subspace approach for enhancing speech corrupted by colored noise. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 1: 573-576.
  32. Hu Y, Loizou PC: A generalized subspace approach for enhancing speech corrupted by colored noise. IEEE Transactions on Speech and Audio Processing 2003,11(4):334-341. 10.1109/TSA.2003.814458
  33. Kang GS, Fransen LJ: Quality improvement of LPC-processed noisy speech by using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing 1989,37(6):939-942. 10.1109/ASSP.1989.28065
  34. Linguistic Data Consortium (LDC) http://www.ldc.upenn.edu
  35. Hirsch H-G, Pearce D: The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions. Proceedings of International Speech Communication Association (ISCA) Workshop: Automatic Speech Recognition: Challenges for the New Millenium (ASR '00), September 2000, Paris, France 181-188.
  36. Demuynck K: Extracting, modelling and combining information in speech recognition, Ph.D. dissertation.
  37. Duchateau J, Demuynck K, Van Compernolle D: Fast and accurate acoustic modelling with semi-continuous HMMs. Speech Communication 1998,24(1):5-17. 10.1016/S0167-6393(98)00002-8
  38. Gong Y: Speech recognition in noisy environments: a survey. Speech Communication 1995,16(3):261-291. 10.1016/0167-6393(94)00059-J

Copyright

© Kris Hermus et al. 2007

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.