Sector-Based Detection for Hands-Free Speech Enhancement in Cars

Lathoud, Guillaume; Bourgeois, Julien; Freudenberger, Jürgen

doi:10.1155/ASP/2006/20683

Research Article
Open access
Published: 01 December 2006

EURASIP Journal on Advances in Signal Processing volume 2006, Article number: 020683 (2006)

Abstract

Adaptation control of beamforming interference cancellation techniques is investigated for in-car speech acquisition. Two efficient adaptation control methods are proposed that avoid target cancellation. The "implicit" method varies the step-size continuously, based on the filtered output signal. The "explicit" method decides in a binary manner whether to adapt or not, based on a novel estimate of target and interference energies. It estimates the average delay-sum power within a volume of space, for the same cost as the classical delay-sum. Experiments on real in-car data validate both methods, including a case with km/h background road noise.
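The difference between the two adaptation control styles named in the abstract can be illustrated with a minimal sketch of a normalized LMS (NLMS) interference canceller. This is not the authors' implementation: the function names, the threshold `ratio`, and the specific step-size formulas are assumptions chosen for illustration, and the sector-based estimation of target and interference energies is not reproduced here (the energies are simply passed in as numbers).

```python
import numpy as np


def explicit_step(target_energy, interference_energy, mu0=0.5, ratio=1.0):
    """Binary control (hypothetical rule): adapt at full step size only
    when the estimated interference energy dominates the target energy."""
    return mu0 if interference_energy > ratio * target_energy else 0.0


def implicit_step(output_power, reference_power, mu0=0.5, eps=1e-8):
    """Continuous control (hypothetical rule): the step size shrinks
    smoothly as the canceller output power grows relative to the
    reference power, so strong target speech slows adaptation."""
    return mu0 * reference_power / (reference_power + output_power + eps)


def cancel(primary, reference, step_fn, n_taps=4, eps=1e-8):
    """NLMS canceller: subtract a filtered copy of `reference` from
    `primary`. `step_fn(t, e, x)` returns the step size for sample t."""
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for t in range(n_taps - 1, len(primary)):
        # Most recent reference samples, newest first.
        x = reference[t - n_taps + 1:t + 1][::-1]
        e = primary[t] - w @ x          # canceller output (error signal)
        out[t] = e
        mu = step_fn(t, e, x)           # adaptation control happens here
        w += mu * e * x / (x @ x + eps)  # normalized LMS update
    return out, w


# Interference-only toy signal: the primary channel is a scaled copy
# of the reference, so the ideal filter is [0.8, 0, 0, 0].
rng = np.random.default_rng(0)
reference = rng.standard_normal(2000)
primary = 0.8 * reference

# Explicit control with interference dominant: the filter adapts.
_, w_adapt = cancel(primary, reference,
                    lambda t, e, x: explicit_step(0.0, 1.0))
# Explicit control with target dominant: adaptation is frozen.
_, w_frozen = cancel(primary, reference,
                     lambda t, e, x: explicit_step(1.0, 0.0))
```

In this toy setup, freezing adaptation when the target dominates is what prevents the filter from learning to cancel the target itself. The continuous rule could be plugged in the same way, for example with `lambda t, e, x: implicit_step(e * e, x @ x / len(x))`.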

Author information

Authors and Affiliations

IDIAP Research Institute, Martigny, 1920, Switzerland
Guillaume Lathoud
École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, 1015, Switzerland
Guillaume Lathoud
DaimlerChrysler Research and Technology, Ulm, 89014, Germany
Julien Bourgeois & Jürgen Freudenberger

Corresponding author

Correspondence to Guillaume Lathoud.

Rights and permissions

Open Access: This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Lathoud, G., Bourgeois, J. & Freudenberger, J. Sector-Based Detection for Hands-Free Speech Enhancement in Cars. EURASIP J. Adv. Signal Process. 2006, 020683 (2006). https://doi.org/10.1155/ASP/2006/20683

Received: 31 January 2005
Revised: 20 July 2005
Accepted: 22 August 2005
Published: 01 December 2006
DOI: https://doi.org/10.1155/ASP/2006/20683
