Open Access

A Tutorial on Text-Independent Speaker Verification

  • Frédéric Bimbot (1; corresponding author),
  • Jean-François Bonastre (2),
  • Corinne Fredouille (2),
  • Guillaume Gravier (1),
  • Ivan Magrin-Chagnolleau (3),
  • Sylvain Meignier (2),
  • Teva Merlin (2),
  • Javier Ortega-García (4),
  • Dijana Petrovska-Delacrétaz (5) and
  • Douglas A. Reynolds (6)
EURASIP Journal on Advances in Signal Processing 2004, 2004:101962

DOI: 10.1155/S1110865704310024

Received: 2 December 2002

Published: 21 April 2004

Abstract

This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly used speech parameterization in speaker verification, namely, cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely, neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step in dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications related to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by giving a few research trends in speaker verification for the next couple of years.

Keywords and phrases

speaker verification; text-independent; cepstral analysis; Gaussian mixture modeling

Authors’ Affiliations

(1) IRISA, INRIA & CNRS
(2) LIA, University of Avignon
(3) Laboratoire Dynamique du Langage, CNRS
(4) ATVS, Universidad Politécnica de Madrid
(5) DIVA Laboratory, Informatics Department, Fribourg University
(6) Lincoln Laboratory, Massachusetts Institute of Technology

Copyright

© Bimbot et al. 2004

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
