Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli

Sodoyer, David; Schwartz, Jean-Luc; Girin, Laurent; Klinkisch, Jacob; Jutten, Christian

doi:10.1155/S1110865702207015

Research Article
Published: 28 November 2002

EURASIP Journal on Advances in Signal Processing volume 2002, Article number: 382823 (2002)

Abstract

We present a new approach to the source separation problem in the case of multiple speech signals. The method is based on automatic lipreading: the objective is to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. We consider the case of an additive stationary mixture of decorrelated sources, with no further assumptions on independence or non-Gaussian character. First, we present a theoretical framework showing that it is indeed possible to separate a source when some of its spectral characteristics are provided to the system. Then we address the case of audio-visual sources. We show how, if a statistical model of the joint probability of visual and spectral audio input is learnt to quantify the audio-visual coherence, separation can be achieved by maximizing this probability. Finally, we present a number of separation results on a corpus of vowel-plosive-vowel sequences uttered by a single speaker, embedded in a mixture of other voices. We show that separation can be quite good for mixtures of 2, 3, and 5 sources. These results, while very preliminary, are encouraging, and are discussed with respect to their potential complementarity with traditional pure audio separation or enhancement techniques.
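
The principle stated in the abstract, recovering one source from an additive mixture by maximizing an audio-visual probability, can be illustrated with a toy sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: two synthetic noise sources with random frame-wise gains stand in for speech, the mixing matrix is a fixed 2×2 matrix, the "spectral" audio feature is reduced to a centered log frame energy, the "visual" parameter is a noisy copy of the target's log energy, and the coherence model is a fixed unit-variance Gaussian rather than a model learnt from data. The demixing vector is parameterized by a single angle `theta` and found by grid search.

```python
import math
import random


def frame_stats(x1, x2, frame_len):
    """Per-frame second-order statistics (c11, c12, c22) of the two sensor signals."""
    stats = []
    for start in range(0, len(x1) - frame_len + 1, frame_len):
        f1 = x1[start:start + frame_len]
        f2 = x2[start:start + frame_len]
        c11 = sum(a * a for a in f1) / frame_len
        c12 = sum(a * b for a, b in zip(f1, f2)) / frame_len
        c22 = sum(b * b for b in f2) / frame_len
        stats.append((c11, c12, c22))
    return stats


def centered(values):
    """Subtract the mean so the feature does not depend on overall gain."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def audio_feature(stats, theta):
    """Centered log frame energy of y = cos(theta)*x1 + sin(theta)*x2."""
    b1, b2 = math.cos(theta), math.sin(theta)
    return centered([
        math.log(b1 * b1 * c11 + 2.0 * b1 * b2 * c12 + b2 * b2 * c22 + 1e-12)
        for c11, c12, c22 in stats
    ])


def av_log_likelihood(audio, visual):
    """Assumed coherence model: audio feature ~ N(visual feature, 1), constants dropped."""
    return -0.5 * sum((a - v) ** 2 for a, v in zip(audio, visual))


def estimate_angle(x1, x2, visual, frame_len, n_angles=720):
    """Grid search for the demixing angle that maximizes the audio-visual likelihood."""
    stats = frame_stats(x1, x2, frame_len)
    thetas = [math.pi * k / n_angles for k in range(n_angles)]
    return max(thetas, key=lambda t: av_log_likelihood(audio_feature(stats, t), visual))


# --- Toy data (all values below are hypothetical) ---
rng = random.Random(0)
N_FRAMES, FRAME_LEN = 40, 200


def make_source():
    """White noise with a random gain per frame, a crude stand-in for speech."""
    gains = [math.exp(rng.uniform(-1.5, 1.5)) for _ in range(N_FRAMES)]
    return [g * rng.gauss(0.0, 1.0) for g in gains for _ in range(FRAME_LEN)]


s1, s2 = make_source(), make_source()
# Additive stationary mixture with an assumed mixing matrix [[1, 0.6], [0.5, 1]].
x1 = [a + 0.6 * b for a, b in zip(s1, s2)]
x2 = [0.5 * a + b for a, b in zip(s1, s2)]

# "Visual" parameter: the target's centered log frame energy plus small noise.
s1_log_energy = centered([
    math.log(sum(v * v for v in s1[i:i + FRAME_LEN]) / FRAME_LEN)
    for i in range(0, len(s1), FRAME_LEN)
])
visual = [e + rng.gauss(0.0, 0.05) for e in s1_log_energy]

theta_hat = estimate_angle(x1, x2, visual, FRAME_LEN)
# Angle that exactly cancels s2: cos(theta)*0.6 + sin(theta)*1.0 = 0.
theta_true = math.atan2(-0.6, 1.0) % math.pi

# Quality check: normalized correlation between the output and the target source.
y = [math.cos(theta_hat) * a + math.sin(theta_hat) * b for a, b in zip(x1, x2)]
corr = sum(u * v for u, v in zip(y, s1)) / math.sqrt(
    sum(u * u for u in y) * sum(v * v for v in s1)
)
print("estimated angle:", theta_hat, "cancelling angle:", theta_true)
print("absolute correlation with target:", abs(corr))
```

Centering the log energy removes the gain ambiguity of the demixing vector, which is why a single angle is enough in this two-sensor toy case. With more sources or a richer spectral feature, a grid search would no longer be practical and some other optimizer would be needed.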

Author information

Authors and Affiliations

Institut de la Communication Parlée, Institut National Polytechnique de Grenoble, Université Stendhal, CNRS UMR 5009, ICP, INPG, 46 avenue Félix Viallet, Grenoble Cedex 1, 38031, France
David Sodoyer, Jean-Luc Schwartz, Laurent Girin, Jacob Klinkisch & Christian Jutten

Corresponding author

Correspondence to David Sodoyer.

About this article

Cite this article

Sodoyer, D., Schwartz, JL., Girin, L. et al. Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli. EURASIP J. Adv. Signal Process. 2002, 382823 (2002). https://doi.org/10.1155/S1110865702207015

Received: 19 October 2001
Revised: 07 May 2002
