
An overview on optimized NLMS algorithms for acoustic echo cancellation

Abstract

Acoustic echo cancellation represents one of the most challenging system identification problems. The most used adaptive filter in this application is the popular normalized least mean square (NLMS) algorithm, which has to address the classical compromise between fast convergence/tracking and low misadjustment. In order to meet these conflicting requirements, the step-size of this algorithm needs to be controlled. Inspired by the pioneering work of Prof. E. Hänsler and his collaborators on this fundamental topic, we present in this paper several solutions to control the adaptation of the NLMS adaptive filter. The developed algorithms are “non-parametric” in nature, i.e., they do not require any additional features to control their behavior. Simulation results indicate the good performance of the proposed solutions and support the practical applicability of these algorithms.

1 Review

1.1 Introduction

Hands-free audio terminals are required in many popular applications, such as mobile telephony and teleconferencing systems. An important issue that has to be addressed when dealing with such devices is the acoustic coupling between the loudspeaker and the microphone. Due to this coupling, the microphone of the device captures a signal coming from its own loudspeaker, known as the acoustic echo. This phenomenon is influenced by the environment’s characteristics, and it can be very disturbing for the users. For example, in a telephone conversation, the user could hear a replica of her/his own voice. Consequently, in order to enhance the overall quality of the communication, there is a need to cancel the unwanted acoustic echo.

In this context, acoustic echo cancellation (AEC) provides one of the best solutions to the control of acoustic echoes generated by hands-free audio terminals. The basic issue in AEC is then to estimate the impulse response between the loudspeaker and the microphone of the device. The most reliable solution to this problem is the use of an adaptive filter that generates at its output a replica of the echo, which is further subtracted from the microphone signal [1–9]. In other words, the adaptive filter has to model an unknown system (i.e., the acoustic echo path between the loudspeaker and the microphone), like in a “system identification” problem [10–12].

Despite the straightforward formulation of the problem, there are several specific features of AEC, which represent a challenge for any adaptive algorithm. First, the acoustic echo paths have excessive lengths in time (up to hundreds of milliseconds), due to the slow speed of sound in the air, together with multiple reflections caused by the environment; consequently, long length adaptive filters are required (hundreds or even thousands of coefficients), thus influencing the convergence rate of the algorithm. Also, the acoustic echo paths are time-variant systems, depending on temperature, pressure, humidity, and movement of objects or bodies; hence, good tracking capabilities are required for the echo canceller. Second, the echo signal is combined with the near-end signal; ideally, the adaptive filter should separate this mixture and provide an estimate of the echo at its output and an estimate of the near-end from the error signal (from this point of view, the adaptive filter works as in an “interference cancelling” configuration [1012]). This is not an easy task, since the near-end signal can contain both the background noise and the near-end speech; the background noise can be non-stationary and strong (it is also amplified by the microphone of the hands-free device), while the near-end speech acts like a large level disturbance. Moreover, the input of the adaptive filter (i.e., the far-end signal) is mainly speech, which is a non-stationary and highly correlated signal that can influence the overall performance of the adaptive algorithm.

In addition, the double-talk case (i.e., the talkers on both sides speak simultaneously) is perhaps the most challenging situation in AEC. The behavior of the adaptive filter can be seriously affected in this case, up to divergence. For this reason, the echo canceller is usually equipped with a double-talk detector (DTD), in order to slow down or completely halt the adaptation process during double-talk periods [6, 7]. Nevertheless, there is some inherent delay in the decision of any DTD; during this small period, a few undetected large amplitude samples can perturb the echo path estimate considerably. Consequently, it is highly desirable to improve the robustness of the adaptive algorithm in order to handle a certain amount of double-talk without diverging.

Many adaptive algorithms were proposed in the context of AEC [1–9, 13], but the workhorse remains the normalized least mean square (NLMS) algorithm [10–12]. The main reasons behind this popularity are its moderate computational complexity, together with its good numerical stability. The performance of the NLMS algorithm is influenced by two important parameters, i.e., the normalized step-size and the regularization term [1, 8, 11]. The first one reflects a trade-off between the convergence rate and the misadjustment of the algorithm. The second parameter is essential in all ill-posed and ill-conditioned problems such as adaptive filters; it depends on the signal-to-noise ratio (SNR) of the system [14]. Both these parameters can be controlled (i.e., made time dependent) in order to address the conflicting requirements of fast convergence and low misadjustment. This was the main motivation behind the development of variable step-size (VSS) and variable regularized (VR) versions of the NLMS algorithm, e.g., [13, 15–25]. Even if they focus on the optimization of different parameters, the VSS-NLMS and VR-NLMS algorithms are basically equivalent in terms of their purpose [1, 19]. In general, most of them require the tuning of some additional parameters that are difficult to control in practice. For real-world AEC applications, it is highly desirable to design “non-parametric” algorithms, which can operate without requiring additional features related to the acoustic environment (e.g., a system change detector).

In this context, the contributions of Prof. E. Hänsler and his collaborators represent real milestones in the field. For example, in [1], Hänsler and Schmidt present a comprehensive and insightful review of the methods and algorithms used for acoustic echo and noise control. In their work, a special interest is given to the performance analysis of the NLMS algorithm (e.g., see Chapters 7 and 13 from [1]), in terms of developing optimal expressions for its control parameters, i.e., the normalized step-size and regularization term. In Section 1.2 of this paper, we summarize their main findings related to the control of the NLMS algorithm. Also, in Section 1.3, we present another benchmark solution, i.e., the non-parametric variable step-size NLMS (NPVSS-NLMS) algorithm [19]. Motivated and inspired by the work of Hänsler and Schmidt [1] (summarized in Section 1.2), we extend their findings in the framework of a state variable model (similar to Kalman filtering) [26]. The joint-optimized NLMS (JO-NLMS) algorithm developed in Section 1.4 brings together three main elements: a time-variant system model, an optimization criterion based on the minimization of the system misalignment, and an iterative procedure for adjusting the system model parameter. Consequently, it achieves a proper compromise between the performance criteria, i.e., fast convergence/tracking and low misadjustment, without requiring any additional features to control its behavior (like stability thresholds or system change detector). Simulations performed in Section 1.5 support the theoretical findings and indicate the good performance of the presented algorithms. Finally, Section 2 concludes this work and outlines several perspectives.

1.2 Control of the NLMS algorithm

Let us consider the framework of a system identification problem (as shown in Fig. 1), like in AEC [1–9]. The far-end (or loudspeaker) signal, x(n), goes through the echo path, h(n), providing the echo signal, y(n), where n is the time index. This signal is added to the near-end signal, v(n) (which can contain both the background noise and the near-end speech), resulting in the microphone signal, d(n). The adaptive filter, defined by the vector \(\widehat {\mathbf {h}}(n)\), aims to produce at its output an estimate of the echo, \(\widehat {y}(n)\), while the error signal, e(n), should contain an estimate of the near-end signal.

Fig. 1

General scheme. Acoustic echo cancellation configuration

Summarizing, the main goal of this application is to model an unknown system using an adaptive filter, both driven by the same zero-mean input signal, x(n). These two systems are assumed to be finite impulse response (FIR) filters of length L, defined by the real-valued vectors:

$$\begin{array}{*{20}l} \mathbf{h}(n) &= \left[\begin{array}{cccc} h_{0}(n) & h_{1}(n) & \cdots & h_{L-1}(n) \end{array} \right]^{T}, \\ \widehat{\mathbf{h}}(n) &= \left[ \begin{array}{cccc} \widehat{h}_{0}(n) & \widehat{h}_{1}(n) & \cdots & \widehat{h}_{L-1}(n) \end{array} \right]^{T}, \end{array} $$

where superscript T denotes transposition. The desired (or microphone) signal for the adaptive filter is

$$\begin{array}{*{20}l} d(n) &= \mathbf{x}^{T}(n) \mathbf{h}(n) + v(n) \\ &= y(n) + v(n), \end{array} $$
((1))

where

$$\begin{array}{@{}rcl@{}} \mathbf{x}(n) = \left[ \begin{array}{cccc} x(n) & x(n-1) & \cdots & x(n-L+1) \end{array} \right]^{T} \end{array} $$

is a real-valued vector containing the L most recent time samples of the input signal, x(n), and v(n) (i.e., the near-end signal) plays the role of the system noise (assumed to be quasi-stationary, zero mean, and independent of x(n)) that corrupts the output of the unknown system.
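For illustration purposes only, the following minimal Python sketch generates synthetic signals that follow the model of (1): a white Gaussian far-end signal, a random stand-in for the echo path, and a noisy microphone signal. All names and values (filter length, number of samples, SNR) are illustrative assumptions, not prescribed by this paper.

```python
# Minimal synthetic setup for the system identification model of Eq. (1).
# All names and values (filter length, number of samples, SNR) are
# illustrative choices, not prescribed by the paper.
import numpy as np

rng = np.random.default_rng(0)

L = 512                       # adaptive filter / echo path length
N = 20000                     # number of time samples
h = rng.standard_normal(L)    # stand-in for the unknown echo path h(n)
h /= np.linalg.norm(h)        # unit-norm impulse response

x = rng.standard_normal(N)            # far-end (input) signal x(n)
y = np.convolve(x, h)[:N]             # echo signal y(n) = x^T(n) h
snr_db = 20.0
sigma_v = np.sqrt(np.var(y) / 10 ** (snr_db / 10))
v = sigma_v * rng.standard_normal(N)  # near-end (system) noise v(n)
d = y + v                             # microphone (desired) signal d(n), Eq. (1)
```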

Using the previous notation, we may define the a priori and a posteriori error signals as

$$\begin{array}{*{20}l} e(n) &= d(n) - \mathbf{x}^{T}(n) \widehat{\mathbf{h}}(n-1) \\ &= \mathbf{x}^{T}(n) \left[ \mathbf{h}(n) - \widehat{\mathbf{h}}(n-1) \right] + v(n), \end{array} $$
((2))
$$\begin{array}{*{20}l} \varepsilon (n) &= d(n) - \mathbf{x}^{T}(n) \widehat{\mathbf{h}}(n) \\ &= \mathbf{x}^{T}(n) \left[ \mathbf{h}(n) - \widehat{\mathbf{h}}(n) \right] + v(n), \end{array} $$
((3))

where the vectors \(\widehat {\mathbf {h}}(n-1)\) and \(\widehat {\mathbf {h}}(n)\) contain the adaptive filter coefficients at time n−1 and n, respectively. The update equation for NLMS-type algorithms is

$$\begin{array}{@{}rcl@{}} \widehat{\mathbf{h}}(n) = \widehat{\mathbf{h}}(n-1) + \mu (n) \mathbf{x}(n) e(n), \end{array} $$
((4))

where μ(n) is a positive factor known as the step-size, which governs the stability, the convergence rate, and the misadjustment of the algorithm. A reasonable way to derive μ(n), taking into account the stability conditions, is to cancel the a posteriori error signal [27]. Replacing (4) in (3) with the requirement ε(n)=0, it results that

$$\begin{array}{@{}rcl@{}} \varepsilon (n) = e(n) \left[ 1 - \mu (n) \mathbf{x}^{T}(n) \mathbf{x}(n) \right] = 0 \end{array} $$
((5))

and assuming that e(n)≠0, we find

$$\begin{array}{@{}rcl@{}} \mu(n) = \frac{1} { \mathbf{x}^{T}(n) \mathbf{x}(n) }. \end{array} $$
((6))

We should note that the above procedure makes sense in the absence of noise [i.e., v(n)=0], where the condition ε(n)=0 implies that \(\mathbf {x}^{T}(n) \left [ \mathbf {h}(n) - \widehat {\mathbf {h}}(n) \right ]=0\). Finding the parameter μ(n) in the presence of noise will introduce noise in \(\widehat {\mathbf {h}}(n)\), since the condition ε(n)=0 leads to \(\mathbf {x}^{T}(n) \left [ \mathbf {h}(n) - \widehat {\mathbf {h}}(n) \right ]=-v (n) \neq 0\). In fact, we would like to have \(\mathbf {x}^{T}(n) \left [ \mathbf {h}(n) - \widehat {\mathbf {h}}(n) \right ]=0\), which implies that ε(n)=v(n).

In practice, a positive constant α (with 0<α<2), known as the normalized step-size, multiplies (6) to achieve a proper compromise between the convergence rate and the misadjustment [1012]; also, a positive constant δ, known as the regularization parameter, is added to the denominator of (6) in order to make the adaptive filter work well in the presence of noise. Consequently, the well-known update equation of the NLMS algorithm becomes

$$\begin{array}{@{}rcl@{}} \widehat{\mathbf{h}}(n) = \widehat{\mathbf{h}}(n-1) + \frac{\alpha\mathbf{x}(n) e(n)} { \mathbf{x}^{T}(n) \mathbf{x}(n) + \delta}. \end{array} $$
((7))
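As a sketch of how (7) translates into an implementation, the following Python loop (building on the synthetic signals generated above) runs the classical NLMS recursion; the default values of alpha and delta are illustrative choices, not recommended settings.

```python
# Sketch of the classical NLMS recursion (7); alpha and delta are the fixed
# control parameters discussed above (their default values are illustrative).
import numpy as np

def nlms(x, d, L, alpha=1.0, delta=1e-2):
    h_hat = np.zeros(L)       # adaptive filter coefficients
    x_vec = np.zeros(L)       # x(n) = [x(n), x(n-1), ..., x(n-L+1)]^T
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_vec = np.roll(x_vec, 1)
        x_vec[0] = x[n]
        e[n] = d[n] - x_vec @ h_hat                                     # a priori error, Eq. (2)
        h_hat = h_hat + alpha * x_vec * e[n] / (x_vec @ x_vec + delta)  # Eq. (7)
    return h_hat, e

# Example: h_hat, e = nlms(x, d, L)  (using the synthetic signals above)
```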

1.2.1 Performance analysis

Both the control parameters, i.e., α and δ, highly influence the overall performance of the NLMS algorithm. An insightful analysis of their influence was developed by Hänsler and Schmidt in [1]. To begin, let us define the a posteriori misalignment (also known as the system mismatch [1]) as

$$\begin{array}{@{}rcl@{}} \mathbf{m}(n) = \mathbf{h}(n) - \widehat{\mathbf{h}}(n). \end{array} $$
((8))

Assuming that the unknown system is time-invariant, i.e.,

$$\begin{array}{@{}rcl@{}} \mathbf{h}(n) = \mathbf{h}(n-1), \end{array} $$
((9))

the update equation (7) of the NLMS algorithm can be used, together with (8), in order to derive an update in terms of the a posteriori misalignment. Consequently, it results in

$$ \mathbf{m}(n) = \mathbf{m}(n-1)-\frac{\alpha\mathbf{x}(n)e(n)} {\mathbf{x}^{T}(n)\mathbf{x}(n)+\delta}. $$
((10))

Taking the \(\ell_{2}\) norm in (10), we obtain

$$\begin{array}{*{20}l} \left\| \mathbf{m}(n) \right\|^{2}_{2} &= \left\| \mathbf{m}(n-1) \right\|^{2}_{2} - 2\frac{\alpha\mathbf{m}^{T}(n-1)\mathbf{x}(n)e(n)} {\left\| \mathbf{x}(n) \right\|^{2}_{2}+\delta} \\ & \quad + \frac{\alpha^{2} \left\| \mathbf{x}(n) \right\|^{2}_{2} e^{2}(n)}{\left[ \left\| \mathbf{x}(n) \right\|^{2}_{2} +\delta \right]^{2}}. \end{array} $$
((11))

Based on (2) and (9), the a priori error signal can be expressed as

$$\begin{array}{@{}rcl@{}} e(n) = \mathbf{x}^{T}(n)\mathbf{m}(n-1) + v(n), \end{array} $$
((12))

so that, using (12) in (11), it results

$${} \begin{aligned} \left\| \mathbf{m}(n) \right\|^{2}_{2} &= \left\| \mathbf{m}(n-1) \right\|^{2}_{2} \\&\quad- 2\frac{\alpha\mathbf{m}^{T}(n-1)\mathbf{x}(n)\left[ \mathbf{x}^{T}(n)\mathbf{m}(n-1) + v(n)\right]} {\left\| \mathbf{x}(n) \right\|^{2}_{2} +\delta} \\ & \quad + \frac{\alpha^{2} \left\| \mathbf{x}(n) \right\|^{2}_{2} \left[ \mathbf{x}^{T}(n)\mathbf{m}(n-1) + v(n)\right]^{2}}{\left[ \left\| \mathbf{x}(n) \right\|^{2}_{2} +\delta \right]^{2}}. \end{aligned} $$
((13))

Next, taking mathematical expectation on both sides of (13) and removing the uncorrelated products (since x(n) and v(n) are assumed to be independent and zero mean), we obtain

$$\begin{aligned} E\left[\left\| \mathbf{m}(n) \right\|^{2}_{2}\right] &= E\left[\left\| \mathbf{m}(n-1) \right\|^{2}_{2}\right] - 2\alpha E\left[\frac{\mathbf{m}^{T}(n-1)\mathbf{x}(n)\mathbf{x}^{T}(n)\mathbf{m}(n-1)} {\left\| \mathbf{x}(n) \right\|^{2}_{2} +\delta}\right] \\ & \quad + \alpha^{2} E\left\{\frac{\mathbf{m}^{T}(n-1)\mathbf{x}(n)\mathbf{x}^{T}(n)\mathbf{m}(n-1)\left\| \mathbf{x}(n) \right\|^{2}_{2}}{\left[ \left\| \mathbf{x}(n) \right\|^{2}_{2} +\delta \right]^{2}}\right\} + \alpha^{2} E\left\{\frac{v^{2}(n)\left\| \mathbf{x}(n) \right\|^{2}_{2}}{\left[ \left\| \mathbf{x}(n) \right\|^{2}_{2} +\delta \right]^{2}}\right\}. \end{aligned} $$
((14))

It is clear that \(E\left [\left \| \mathbf {x}(n) \right \|^{2}_{2}\right ] = L{\sigma _{x}^{2}}\), where \({\sigma _{x}^{2}} = E \left [ x^{2}(n) \right ]\) is the variance of the input signal. For large values of L (i.e., \(L \gg 1\)), it holds that \(\left \| \mathbf {x}(n) \right \|^{2}_{2} \approx L{\sigma _{x}^{2}}\) [1, 19]. Consequently,

$$ \frac{1}{\left\| \mathbf{x}(n) \right\|^{2}_{2} + \delta} \approx \frac{1}{L{\sigma_{x}^{2}} + \delta}, $$
((15))

so that, for a large value of L and a certain stationarity degree of the input signal, we can treat this term as a deterministic quantity at this point [1]. Under these circumstances, (14) becomes

$$\begin{aligned} E\left[\left\| \mathbf{m}(n) \right\|^{2}_{2}\right] &= E\left[\left\| \mathbf{m}(n-1) \right\|^{2}_{2}\right] - \frac{2\alpha}{L{\sigma_{x}^{2}} + \delta} E\left[\mathbf{m}^{T}(n-1)\mathbf{x}(n)\mathbf{x}^{T}(n)\mathbf{m}(n-1)\right] \\ & \quad + \frac{\alpha^{2} L{\sigma_{x}^{2}}}{\left(L{\sigma_{x}^{2}} + \delta \right)^{2}} E\left[\mathbf{m}^{T}(n-1)\mathbf{x}(n)\mathbf{x}^{T}(n)\mathbf{m}(n-1)\right] + \frac{\alpha^{2} L{\sigma_{x}^{2}}}{\left(L{\sigma_{x}^{2}} + \delta \right)^{2}}{\sigma_{v}^{2}}, \end{aligned} $$
((16))

where \({\sigma _{v}^{2}} = E\left [v^{2}(n)\right ]\) is the variance of the system noise. Next, it can be assumed that the input vector, x(n), and the a posteriori misalignment vector, m(n−1), are statistically independent, and x(n) is white. In this case,

$${} E\left[\mathbf{m}^{T}(n-1)\mathbf{x}(n)\mathbf{x}^{T}(n)\mathbf{m}(n-1)\right] = {\sigma_{x}^{2}} E\left[\left\| \mathbf{m}(n-1) \right\|^{2}_{2}\right]. $$
((17))

Summarizing, (16) can be rewritten as

$$ \begin{aligned} E\left[\left\| \mathbf{m}(n) \right\|^{2}_{2}\right] &= A\left(\alpha,\delta,L,{\sigma_{x}^{2}} \right) E\left[\left\| \mathbf{m}(n-1) \right\|^{2}_{2}\right] \\&\quad + B\left(\alpha,\delta,L,{\sigma_{x}^{2}}\right){\sigma_{v}^{2}}, \end{aligned} $$
((18))

where

$$\begin{array}{*{20}l} A\left(\alpha,\delta,L,{\sigma_{x}^{2}} \right) &= 1 - \frac{2{\alpha\sigma_{x}^{2}}}{L{\sigma_{x}^{2}}+\delta} + \frac{\alpha^{2} L{\sigma_{x}^{4}}}{\left(L{\sigma_{x}^{2}}+\delta\right)^{2}}, \end{array} $$
((19))
$$\begin{array}{*{20}l} B\left(\alpha,\delta,L,{\sigma_{x}^{2}}\right) &= \frac{\alpha^{2} L{\sigma_{x}^{2}}}{\left(L{\sigma_{x}^{2}}+\delta\right)^{2}}, \end{array} $$
((20))

represent the so-called contraction and expansion parameters, respectively [1].

Clearly, the contraction parameter, \(A\left (\alpha,\delta,L,{\sigma _{x}^{2}} \right)\), should always be smaller than 1, which is certainly fulfilled for 0<α<2 and δ≥0. The expansion parameter, \(B\left (\alpha,\delta,L,{\sigma _{x}^{2}}\right)\), is related to the influence of the system noise, since it multiplies \({\sigma _{v}^{2}}\). Both terms depend on the control parameters, α and δ, as well as on the filter length, L, and the input signal power, \({\sigma _{x}^{2}}\). However, a compromise should be made when setting the values of the control parameters. For example, if the influence of the system noise should be eliminated completely, i.e., \(B\left (\alpha,\delta,L,{\sigma _{x}^{2}}\right) = 0\), we should set α=0 or \(\delta \rightarrow \infty\), which, on the other hand, leads to \(A\left (\alpha,\delta,L,{\sigma _{x}^{2}} \right) = 1\), i.e., the filter will not be updated. The fastest convergence (FC) mode is achieved when \(A\left (\alpha,\delta,L,{\sigma _{x}^{2}} \right)\) reaches its minimum (e.g., for α=1 and δ=0), which, unfortunately, increases the misadjustment (in terms of the influence of the system noise). For example, taking the normalized step-size as the reference parameter and evaluating

$$ \frac{\partial A\left(\alpha,\delta,L,{\sigma_{x}^{2}} \right)}{\partial \alpha} \bigg|_{\alpha = \alpha_{\text{FC}}} = 0, $$
((21))

it results

$$\begin{array}{@{}rcl@{}} \alpha_{\text{FC}}=1+\frac{\delta}{L{\sigma_{x}^{2}}}. \end{array} $$
((22))

Neglecting the regularization constant (i.e., δ≈0), the fastest convergence mode is achieved for α≈1, which is a well-known result [1, 11, 12]. Also, the stability condition can be found by imposing \(\left |A(\alpha,\delta,L,{\sigma _{x}^{2}})\right | < 1\), which leads to

$$\begin{array}{@{}rcl@{}} 0<\alpha_{\text{stable}}<2 \left(1+\frac{\delta}{L{\sigma_{x}^{2}}} \right) = 2\alpha_{\text{FC}}. \end{array} $$
((23))

Again, taking δ=0 in (23), the standard stability condition of the NLMS algorithm results, i.e., 0<α<2. On the other hand, the lowest misadjustment (LM) is obtained when the term from (20) reaches its minimum. Also, taking the normalized step-size as the reference parameter and evaluating

$$ \frac{\partial B\left(\alpha,\delta,L,{\sigma_{x}^{2}} \right)}{\partial \alpha} \bigg|_{\alpha = \alpha_{\text{LM}}} = 0, $$
((24))

the lowest misadjustment mode requires

$$\begin{array}{@{}rcl@{}} \alpha_{\text{LM}} = 0, \end{array} $$
((25))

which is also a well-known result [1, 11, 12]; unfortunately, the filter is not updated in this case.

Summarizing, the convergence rate of the algorithm is not influenced by the level of the system noise, but the misadjustment increases when the system noise increases. More importantly, it can be noticed that the expansion term from (20) always increases when α increases, which confirms that a higher value of the normalized step-size increases the misadjustment. However, the ideal requirements of the algorithm are both fast convergence and low misadjustment, and it is clear that (22) and (25) “push” the normalized step-size in opposite directions. This aspect represents the motivation behind the VSS approaches, i.e., the normalized step-size needs to be controlled in order to meet these conflicting requirements. The regularization constant also influences the performance of the algorithm, but in a “milder” way: for the usual values of the normalized step-size (i.e., 0<α≤1), increasing the regularization constant pushes the contraction term from (19) closer to 1 (slowing down the convergence), while the expansion term from (20) always decreases when the regularization constant increases (reducing the misadjustment).
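To make this trade-off concrete, the short Python computation below evaluates the contraction and expansion parameters from (19) and (20) for several normalized step-sizes; the values of L, \({\sigma_{x}^{2}}\), and δ are arbitrary examples chosen only for illustration.

```python
# Numeric illustration of the contraction/expansion trade-off in (19)-(20);
# the values of L, sigma_x^2, and delta below are arbitrary examples.
import numpy as np

def contraction_expansion(alpha, delta, L, sigma_x2):
    den = L * sigma_x2 + delta
    A = 1 - 2 * alpha * sigma_x2 / den + alpha**2 * L * sigma_x2**2 / den**2  # Eq. (19)
    B = alpha**2 * L * sigma_x2 / den**2                                      # Eq. (20)
    return A, B

L, sigma_x2, delta = 512, 1.0, 0.0
for alpha in (0.1, 0.5, 1.0, 1.9):
    A, B = contraction_expansion(alpha, delta, L, sigma_x2)
    print(f"alpha={alpha:.1f}  A={A:.6f}  B={B:.3e}")
# alpha_FC = 1 + delta / (L * sigma_x2) minimizes A, cf. Eq. (22);
# alpha_LM = 0 minimizes B, cf. Eq. (25).
```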

1.2.2 Optimal choice of the control parameters

Motivated by these findings, Hänsler and Schmidt proposed in [1] (Chapter 13) an optimal choice for the control parameters of the NLMS algorithm. First, the non-regularized version of the NLMS algorithm is considered (also imposing that the normalized step-size is time dependent), with the update

$$\begin{array}{@{}rcl@{}} \widehat{\mathbf{h}}(n) = \widehat{\mathbf{h}}(n-1) + \frac{\alpha(n)\mathbf{x}(n) e(n)} { \left\| \mathbf{x}(n) \right\|^{2}_{2} }. \end{array} $$
((26))

Next, from (12), it results that

$$\begin{array}{@{}rcl@{}} \mathbf{x}^{T}(n)\mathbf{m}(n-1) = e(n) - v(n) \triangleq e_{\mathrm{u}}(n), \end{array} $$
((27))

where \(e_{\mathrm{u}}(n)\) denotes the so-called undistorted error signal [1], i.e., the part of the error that is not affected by the system noise. Using this notation, (11) can be rewritten as

$${} \left\| \mathbf{m}(n) \right\|^{2}_{2} = \left\| \mathbf{m}(n-1) \right\|^{2}_{2} - 2\frac{\alpha(n) e_{\mathrm{u}}(n)e(n)} {\left\| \mathbf{x}(n) \right\|^{2}_{2}} + \frac{\alpha^{2}(n) e^{2}(n)}{ \left\| \mathbf{x}(n) \right\|^{2}_{2} }, $$
((28))

which implies that

$${} \begin{aligned} E\left[ \left\| \mathbf{m}(n) \right\|^{2}_{2} \right] &= E\left[ \left\| \mathbf{m}(n-1) \right\|^{2}_{2} \right] - 2\alpha(n) E\left[ \frac{e_{\mathrm{u}}(n)e(n)} {\left\| \mathbf{x}(n) \right\|^{2}_{2}} \right] \\ & \quad + \alpha^{2}(n) E\left[ \frac{e^{2}(n)}{ \left\| \mathbf{x}(n) \right\|^{2}_{2}} \right]. \end{aligned} $$
((29))

A natural optimization criterion to follow in any system identification problem is the minimization of system misalignment. Consequently, imposing the condition:

$$ \left. \frac{\partial E\left[ \left\| \mathbf{m}(n) \right\|^{2}_{2} \right]}{\partial \alpha(n)} \right|_{\alpha(n) = \alpha_{\text{opt}}(n)} = 0 $$
((30))

and assuming that the normalized step-sizes at different time instants are uncorrelated, the optimal normalized step-size results as

$$ \alpha_{\text{opt}}(n) = \frac{E\left[ \frac{e_{\mathrm{u}}(n)e(n)} {\left\| \mathbf{x}(n) \right\|^{2}_{2}} \right]}{E\left[ \frac{e^{2}(n)}{ \left\| \mathbf{x}(n) \right\|^{2}_{2}} \right]}. $$
((31))

For large values of L (i.e., \(L \gg 1\)), the assumption \(\left \| \mathbf {x}(n) \right \|^{2}_{2} \approx L{\sigma _{x}^{2}}\) is valid [1, 19]. Also, since the input signal, x(n), and the system noise, v(n), are uncorrelated, the undistorted error signal, \(e_{\mathrm{u}}(n)\), is also uncorrelated with the system noise. Therefore, (31) simplifies to

$$ \alpha_{\text{opt}}(n) = \frac{E\left[ e_{\mathrm{u}}^{2}(n)\right]}{E\left[ e^{2}(n) \right]}. $$
((32))

In the absence of the system noise [i.e., v(n)=0], the a priori error signal, e(n), equals the undistorted error signal, \(e_{\mathrm{u}}(n)\), so that the optimal normalized step-size is equal to 1, which justifies the discussion related to (6). In the presence of the system noise, when the adaptive filter starts to converge, the power of the undistorted error signal, \(e_{\mathrm{u}}(n)\), decreases and, consequently, the normalized step-size decreases, thus leading to low misadjustment.

Unfortunately, the undistorted error signal, \(e_{\mathrm{u}}(n)\), is not available in practice. In order to overcome this issue, several solutions were proposed in [1]. For example, assuming that the excitation, x(n), is white and considering that the input vector, x(n), and the a posteriori misalignment vector, m(n−1), are statistically independent, (32) can be developed based on (17) as

$$\begin{array}{*{20}l} \alpha_{\text{opt}}(n) &= \frac{E\left[ e_{\mathrm{u}}^{2}(n)\right]}{E\left[ e^{2}(n) \right]} \\ &= \frac{E\left[ \mathbf{m}^{T}(n-1)\mathbf{x}(n)\mathbf{x}^{T}(n)\mathbf{m}(n-1)\right]}{E\left[ e^{2}(n) \right]} \\ &= \frac{{\sigma_{x}^{2}} E\left[\left\| \mathbf{m}(n-1) \right\|^{2}_{2}\right]}{E\left[ e^{2}(n) \right]}. \end{array} $$
((33))

Now the problem is reduced to the estimation of \(E\left [ \left \| \mathbf {m}(n-1) \right \|^{2}_{2} \right ]\). A solution to estimate this term is based on the “delay and extrapolation” approach [1]. In other words, if an additional artificial delay is introduced into the unknown system, this delay is also modeled by the adaptive filter. Thus, utilizing the property of adaptive algorithms to spread the filter misalignment evenly over all coefficients, the known part of the (true) system misalignment vector can be extrapolated, thus resulting

$$\begin{array}{@{}rcl@{}} \left\| \mathbf{m}(n-1) \right\|^{2}_{2} \approx \frac{L}{L_{\mathrm{D}}}\sum_{l=0}^{L_{\mathrm{D}}-1}\widehat{h}_{l}^{2}(n), \end{array} $$
((34))

where \(L_{\mathrm{D}}\) denotes the number of coefficients corresponding to the artificial delay. However, this method may freeze the adaptation when the unknown system changes, which would require an additional system change detector [1].
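A minimal Python sketch of this idea is given below; the delay length L_D and the power estimates \({\sigma_{x}^{2}}\) and \(E\left[e^{2}(n)\right]\) are assumed to be supplied by the application, and the sketch simply follows (33) and (34) under the stated white-input assumption.

```python
# Sketch of the "delay and extrapolation" estimate (34) and the resulting
# step-size (33); L_D (the artificially delayed coefficients) and the power
# estimates sigma_x2, sigma_e2 are assumed to be provided by the application.
import numpy as np

def misalignment_estimate(h_hat, L_D):
    """Extrapolate ||m(n-1)||^2 from the first L_D (ideally zero) coefficients, Eq. (34)."""
    L = len(h_hat)
    return (L / L_D) * np.sum(h_hat[:L_D] ** 2)

def alpha_opt_estimate(h_hat, L_D, sigma_x2, sigma_e2):
    """Optimal normalized step-size of Eq. (33) under the white-input assumption."""
    return sigma_x2 * misalignment_estimate(h_hat, L_D) / max(sigma_e2, 1e-12)
```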

The second control parameter of the NLMS algorithm is the regularization term, δ. Using a similar approach as before, the only-regularized version of the NLMS algorithm is considered (also imposing that the regularization parameter is time dependent), with the update

$$\begin{array}{@{}rcl@{}} \widehat{\mathbf{h}}(n) = \widehat{\mathbf{h}}(n-1) + \frac{\mathbf{x}(n) e(n)} { \left\| \mathbf{x}(n) \right\|^{2}_{2} + \delta(n) }. \end{array} $$
((35))

In this case, (11) can be rewritten as

$${} \begin{aligned} \left\| \mathbf{m}(n) \right\|^{2}_{2} = \left\| \mathbf{m}(n-1) \right\|^{2}_{2} &- 2\frac{e_{\mathrm{u}}(n)e(n)} {\left\| \mathbf{x}(n) \right\|^{2}_{2} + \delta(n)} \\&+ \frac{e^{2}(n)\left\| \mathbf{x}(n) \right\|^{2}_{2}}{ \left[\left\| \mathbf{x}(n) \right\|^{2}_{2} + \delta(n) \right]^{2} }, \end{aligned} $$
((36))

so that,

$${} \begin{aligned} E\left[ \left\| \mathbf{m}(n) \right\|^{2}_{2} \right] &= E\left[ \left\| \mathbf{m}(n-1) \right\|^{2}_{2} \right] - 2E\left[ \frac{e_{\mathrm{u}}(n)e(n)} {\left\| \mathbf{x}(n) \right\|^{2}_{2} + \delta(n)} \right] \\ & \quad + E\left\{ \frac{e^{2}(n)\left\| \mathbf{x}(n) \right\|^{2}_{2}}{ \left[\left\| \mathbf{x}(n) \right\|^{2}_{2} + \delta(n) \right]^{2}} \right\}. \end{aligned} $$
((37))

Following the same criterion, i.e., the minimization of the system misalignment, the following condition can be imposed:

$$ \left. \frac{\partial E\left[ \left\| \mathbf{m}(n) \right\|^{2}_{2} \right]}{\partial \delta(n)} \right|_{\delta(n) = \delta_{\text{opt}}(n)} = 0. $$
((38))

Under similar considerations and assumptions as in the case of \(\alpha_{\text{opt}}(n)\), (38) leads to the optimal regularization parameter, which can be further developed as

$$ \begin{aligned} \delta_{\text{opt}}(n) &= \frac{\left\{E\left[ e^{2}(n) \right] - E\left[ e_{\mathrm{u}}^{2}(n) \right]\right\} L{\sigma_{x}^{2}}} {E\left[ e_{\mathrm{u}}^{2}(n) \right]} \\ &= \frac{E\left[ v^{2}(n) \right]{L\sigma_{x}^{2}}} {E\left[ e_{\mathrm{u}}^{2}(n) \right]} \\ &= \frac{L{\sigma_{v}^{2}}}{E\left[ \left\| \mathbf{m}(n-1) \right\|^{2}_{2} \right]}. \end{aligned} $$
((39))

The denominator of (39) can be evaluated based on (34). Also, another important parameter to be found is the noise power, \({\sigma _{v}^{2}}\). There are different methods for estimating this parameter; for example, in echo cancellation, it can be estimated during silences of the near-end talker [19]. Also, other practical methods to estimate \({\sigma _{v}^{2}}\) in AEC can be found in [28, 29] (which are briefly detailed at the end of Section 1.3). However, we should note that other estimators can also be used for the noise power; the analysis of their influence on the algorithms’ performance is beyond the scope of this paper.

In conclusion, both control methods proposed by Hänsler and Schmidt in [1] [i.e., \(\alpha_{\text{opt}}(n)\) and \(\delta_{\text{opt}}(n)\)] are theoretically equivalent and represent valuable benchmarks in the field of VSS/VR-NLMS algorithms. However, in practice, their implementations are usually different. In most cases, the control of the normalized step-size is preferred, mainly due to the limited dynamic range of its values; on the other hand, the regularization control usually requires an upper bound (to avoid overflow in case of very large values).

1.3 NPVSS-NLMS algorithm

In the previous section, the optimization criterion used for adjusting the control parameters was the minimization of the system misalignment. However, in a system identification setup like AEC (as shown in Fig. 1), this is equivalent to recovering the system noise from the error signal of the adaptive filter [1].

Consequently, getting back to the discussion related to (2)–(6), the step-size parameter, μ(n) (which is deterministic in nature), can be found by imposing the condition [19]

$$\begin{array}{@{}rcl@{}} E \left[ \varepsilon^{2}(n) \right] = {\sigma_{v}^{2}}. \end{array} $$
((40))

Following this requirement, we rewrite (5) as

$$\begin{array}{@{}rcl@{}} \varepsilon (n) = e(n) \left[ 1 - \mu (n) \mathbf{x}^{T}(n) \mathbf{x}(n) \right] = v (n). \end{array} $$
((41))

Squaring the previous equation, then taking mathematical expectation on both sides, and using the approximation \(\mathbf {x}^{T}(n) \mathbf {x}(n) \approx LE \left [ x^{2}(n) \right ] = L {\sigma _{x}^{2}}\) (which is valid for \(L \gg 1\)), it results

$$\begin{array}{@{}rcl@{}} \left[ 1 - \mu (n) L {\sigma_{x}^{2}} \right]^{2} {\sigma_{e}^{2}}(n) = {\sigma_{v}^{2}}, \end{array} $$
((42))

where \({\sigma _{e}^{2}}(n) = E \left [ e^{2}(n) \right ]\) is the power of the error signal. Thus, developing (42), we obtain the quadratic equation

$$\begin{array}{@{}rcl@{}} \mu^{2}(n) - \frac{2}{L {\sigma_{x}^{2}}} \mu (n) + \frac{1} {\left(L {\sigma_{x}^{2}} \right)^{2}} \left[1 - \frac{ {\sigma_{v}^{2}}} { {\sigma_{e}^{2}}(n)} \right] = 0, \end{array} $$
((43))

for which the obvious solution is (also using \(L {\sigma _{x}^{2}} \approx \mathbf {x}^{T}(n) \mathbf {x}(n)\))

$$\begin{array}{*{20}l} \mu_{\textrm{NPVSS}}(n) &= \frac{1} { \mathbf{x}^{T}(n) \mathbf{x}(n)} \left[ 1 - \frac{\sigma_{v} }{ \sigma_{e}(n)} \right] \\ &= \frac{\alpha_{\textrm{NPVSS}}(n)}{\mathbf{x}^{T}(n) \mathbf{x}(n)}, \end{array} $$
((44))

where

$$\begin{array}{@{}rcl@{}} \alpha_{\textrm{NPVSS}}(n) = 1 - \frac{\sigma_{v} }{ \sigma_{e}(n) } \end{array} $$
((45))

is the variable normalized step-size. Therefore, the update of the non-parametric variable step-size NLMS (NPVSS-NLMS) algorithm [19] is

$$\begin{array}{@{}rcl@{}} \widehat{\mathbf{h}}(n) = \widehat{\mathbf{h}}(n-1) + \mu_{\textrm{NPVSS}}(n) \mathbf{x}(n) e(n). \end{array} $$
((46))

Let us examine the behavior of the algorithm in terms of its normalized step-size. Looking at (44), it is obvious that before the algorithm converges, \(\sigma_{e}(n)\) is large compared to \(\sigma_{v}\) and, consequently, \(\alpha_{\textrm{NPVSS}}(n) \approx 1\). When the algorithm has converged to the true solution, \(\sigma_{e}(n) \approx \sigma_{v}\) and \(\alpha_{\textrm{NPVSS}}(n) \approx 0\). This is the desired behavior for the adaptive algorithm, leading to both fast convergence and low misadjustment.

We can compare (45) to the optimal step-size parameter from (32), which results in

$$\begin{array}{@{}rcl@{}} \alpha_{\text{opt}}(n) = \alpha_{\textrm{NPVSS}}(n) \left[1 + \frac{\sigma_{v} }{ \sigma_{e}(n) }\right]. \end{array} $$
((47))

It is clear that \(\alpha_{\text{opt}}(n)\) is larger than \(\alpha_{\textrm{NPVSS}}(n)\) by a factor between 1 and 2, but the two variable step-sizes have the same effect in terms of good convergence and low misadjustment.

In order to analyze the convergence of the misalignment for the NPVSS-NLMS algorithm, we suppose that the system is stationary (as in (9)). Using the a posteriori misalignment vector defined in (8), the update equation of the algorithm (46) can be rewritten in terms of the misalignment as

$$\begin{array}{@{}rcl@{}} \mathbf{m}(n) = \mathbf{m}(n-1) - \mu_{\textrm{NPVSS}}(n) \mathbf{x}(n) e(n). \end{array} $$
((48))

Taking the \(\ell_{2}\) norm in (48), then mathematical expectation on both sides, and assuming that

$$\begin{array}{@{}rcl@{}} E \left[ v(n) \mathbf{x}^{T}(n) \mathbf{m}(n-1) \right] =0, \end{array} $$
((49))

which is true if v(n) is white, we obtain

$$ E\left[ \| \mathbf{m}(n) \|_{2}^{2} \right] - E\left[ \| \mathbf{m}(n-1) \|_{2}^{2} \right] = - \mu_{\textrm{NPVSS}}(n) \left[ \sigma_{e}(n) - \sigma_{v} \right] \left[ \sigma_{e}(n) + 2 \sigma_{v} \right] \leq 0. $$
((50))

The previous expression proves that the length of the misalignment vector for the NPVSS-NLMS algorithm is non-increasing, which implies that

$$\begin{array}{@{}rcl@{}} {\lim}_{n \rightarrow \infty} {\sigma_{e}^{2}}(n) = {\sigma_{v}^{2}}. \end{array} $$
((51))

It should be noticed that the previous relation does not imply that \(E \left [ \| \mathbf {m}(\infty) \|_{2}^{2} \right ]=0\). However, under the independence assumption, we can show the equivalence. Indeed, from (12), it can be shown that

$$\begin{array}{@{}rcl@{}} E\left[ e^{2}(n) \right] = {\sigma_{v}^{2}} + \text{tr} \left[ \mathbf{R} \mathbf{K}(n-1) \right] \end{array} $$
((52))

if the samples of x(n) are independent (i.e., the white input assumption), where tr(·) denotes the trace of a matrix, \(\mathbf{R} = E\left[\mathbf{x}(n)\mathbf{x}^{T}(n)\right]\), and \(\mathbf{K}(n-1) = E\left[\mathbf{m}(n-1)\mathbf{m}^{T}(n-1)\right]\). Taking (51) into account, (52) becomes

$$\begin{array}{@{}rcl@{}} \text{tr} \left[ \mathbf{R} \mathbf{K}(\infty) \right] =0. \end{array} $$
((53))

Assuming that \(\mathbf{R} > 0\) (i.e., \(\mathbf{R}\) is a positive definite matrix), it results that \(\mathbf{K}(\infty) = \mathbf{0}\) and, consequently,

$$\begin{array}{@{}rcl@{}} E \left[ \| \mathbf{m}(\infty) \|_{2}^{2} \right]=0. \end{array} $$
((54))

Finally, some practical considerations have to be stated. First, in order for the algorithm to behave properly, a regularization constant, δ, should be added to the denominator of \(\mu_{\textrm{NPVSS}}(n)\). A second consideration is related to the estimation of the parameter \(\sigma_{e}(n)\). In practice, the power of the error signal is estimated as follows:

$$\begin{array}{@{}rcl@{}} \widehat{\sigma}_{e}^{2}(n) = \lambda \widehat{\sigma}_{e}^{2}(n-1) + (1-\lambda) e^{2}(n), \end{array} $$
((55))

where λ is a weighting factor. Its value is chosen as \(\lambda = 1 - 1/(KL)\), where K>1. The initial value for (55) is \(\widehat {\sigma }_{e}^{2}(0)=0\). Theoretically, it is clear that \({\sigma _{e}^{2}}(n) \geq {\sigma _{v}^{2}}\), which implies that \(\mu_{\textrm{NPVSS}}(n) \geq 0\). Nevertheless, the estimation from (55) could result in a lower magnitude than the noise power estimate, which would make \(\mu_{\textrm{NPVSS}}(n)\) negative. In this situation, the problem is solved by setting \(\mu_{\textrm{NPVSS}}(n) = 0\).
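The following Python sketch (illustrative only) implements the NPVSS-NLMS recursion (44)–(46) together with the error-power estimate (55) and the clamping discussed above; the near-end power \({\sigma_{v}^{2}}\) is assumed to be available, and K, δ, and the small safety constants are arbitrary choices.

```python
# Sketch of the NPVSS-NLMS recursion (44)-(46) with the error-power estimate
# (55); sigma_v2 is the (estimated) near-end power, assumed available here,
# and K, delta, and the small safety constant are illustrative choices.
import numpy as np

def npvss_nlms(x, d, L, sigma_v2, K=6, delta=1e-2):
    lam = 1.0 - 1.0 / (K * L)           # weighting factor lambda
    h_hat = np.zeros(L)
    x_vec = np.zeros(L)
    sig_e2 = 0.0                        # recursive estimate of E[e^2(n)], Eq. (55)
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_vec = np.roll(x_vec, 1)
        x_vec[0] = x[n]
        e[n] = d[n] - x_vec @ h_hat
        sig_e2 = lam * sig_e2 + (1 - lam) * e[n] ** 2                # Eq. (55)
        alpha = 1.0 - np.sqrt(sigma_v2) / (np.sqrt(sig_e2) + 1e-12)  # Eq. (45)
        mu = max(alpha, 0.0) / (x_vec @ x_vec + delta)               # clamp + regularization
        h_hat = h_hat + mu * x_vec * e[n]                            # Eq. (46)
    return h_hat, e
```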

The NPVSS-NLMS algorithm is summarized in Table 1. The only parameter that is needed in the step-size formula of the NPVSS-NLMS algorithm is the power estimate of the system noise. In the case of AEC, this system noise is represented by the near-end signal. Nevertheless, the estimation of the near-end signal power is not always straightforward in real-world AEC applications. Some practical solutions to this problem can be found in [28, 29].

Table 1 NPVSS-NLMS algorithm

For example, it was demonstrated in [28] that the power estimate of the near-end signal can be evaluated as

$$\begin{array}{@{}rcl@{}} \widehat{\sigma}_{v}^{2}(n) = \widehat{\sigma}_{e}^{2}(n) - \frac{1} { \widehat{\sigma}_{x}^{2}(n)} \widehat{\mathbf{r}}_{e\mathbf{x}}^{T}(n) \widehat{\mathbf{r}}_{e\mathbf{x}}(n), \end{array} $$
((56))

where the variance of e(n) is estimated based on (55) and the other terms are evaluated in a similar manner, i.e.,

$$\begin{array}{*{20}l} \widehat{\sigma}_{x}^{2}(n) &= \lambda \widehat{\sigma}_{x}^{2}(n-1) + (1-\lambda) x^{2}(n), \end{array} $$
((57))
$$\begin{array}{*{20}l} \widehat{\mathbf{r}}_{e\mathbf{x}}(n) &= \lambda \widehat{\mathbf{r}}_{e\mathbf{x}}(n-1) + (1-\lambda) \mathbf{x}(n) e(n). \end{array} $$
((58))
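A sketch of this estimator is given below (Python, illustrative only); the recursive quantities are updated with the same weighting factor λ as in (55), and the clamp to a non-negative value is a practical safeguard added here, not part of (56).

```python
# Sketch of the near-end power estimator (56)-(58) from [28]; the clamp to a
# non-negative value is a practical safeguard added here, not part of (56).
import numpy as np

def update_nearend_power(e_n, x_vec, sig_e2, sig_x2, r_ex, lam):
    sig_e2 = lam * sig_e2 + (1 - lam) * e_n ** 2          # Eq. (55)
    sig_x2 = lam * sig_x2 + (1 - lam) * x_vec[0] ** 2     # Eq. (57), x_vec[0] = x(n)
    r_ex = lam * r_ex + (1 - lam) * x_vec * e_n           # Eq. (58)
    sig_v2 = max(sig_e2 - (r_ex @ r_ex) / max(sig_x2, 1e-12), 0.0)  # Eq. (56)
    return sig_v2, sig_e2, sig_x2, r_ex
```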

A more practical solution was proposed in [29]. It is known that the desired signal of the adaptive filter is expressed as d(n)=y(n)+v(n). Since the echo signal and the near-end signal can be considered uncorrelated, the previous relation can be rewritten in terms of variances as

$$\begin{array}{@{}rcl@{}} E \left[ d^{2}(n) \right] = E \left[ y^{2}(n) \right] + E \left[ v^{2}(n) \right]. \end{array} $$
((59))

Assuming that the adaptive filter has converged to a certain degree, we can use the approximation

$$\begin{array}{@{}rcl@{}} E \left[ y^{2}(n) \right] \approx E \left[ \widehat{y}^{2}(n) \right]. \end{array} $$
((60))

Consequently, using power estimates, we may compute

$$\begin{array}{@{}rcl@{}} \widehat{\sigma}_{v}^{2}(n) = \left| \widehat{\sigma}_{d}^{2}(n) - \widehat{\sigma}_{\widehat{y}}^{2}(n) \right|, \end{array} $$
((61))

where \(\widehat {\sigma }_{d}^{2}(n)\) and \(\widehat {\sigma }_{\widehat {y}}^{2}(n)\) are the power estimates of d(n) and \(\widehat {y}(n)\), respectively. These parameters can be recursively evaluated similar to (55), i.e.,

$$\begin{array}{*{20}l} \widehat{\sigma}_{d}^{2}(n) &= \lambda \widehat{\sigma}_{d}^{2}(n-1) + (1-\lambda)d^{2}(n), \end{array} $$
((62))
$$\begin{array}{*{20}l} \widehat{\sigma}_{\widehat{y}}^{2}(n) &= \lambda \widehat{\sigma}_{\widehat{y}}^{2}(n-1) + (1-\lambda)\widehat{y}^{2}(n). \end{array} $$
((63))

The absolute value in (61) guards against minor deviations (due to the use of power estimates) from the true values, which could otherwise make the normalized step-size negative or complex.
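A corresponding sketch of the estimator (61)–(63) is shown below (Python, illustrative only); as discussed above, only the microphone signal and the output of the adaptive filter are required.

```python
# Sketch of the simpler estimator (61)-(63) from [29]; only the microphone
# sample d(n) and the filter output y_hat(n) are needed at each iteration.
def update_nearend_power_simple(d_n, y_hat_n, sig_d2, sig_yh2, lam):
    sig_d2 = lam * sig_d2 + (1 - lam) * d_n ** 2            # Eq. (62)
    sig_yh2 = lam * sig_yh2 + (1 - lam) * y_hat_n ** 2      # Eq. (63)
    sig_v2 = abs(sig_d2 - sig_yh2)                          # Eq. (61)
    return sig_v2, sig_d2, sig_yh2
```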

When only the background noise is present, an estimate of its power is obtained using the right-hand term in (61). This expression holds even if the level of the background noise changes, so that there is no need for the estimation of this parameter during silences of the near-end talker. In case of double-talk, when the near-end speech is also present (assuming that it is uncorrelated with the background noise), the right-hand term in (61) still provides a power estimate of the near-end signal. Most importantly, this term depends only on the signals that are available within the AEC application, i.e., the microphone signal, d(n), and the output of the adaptive filter, \(\widehat {y}(n)\). Moreover, as it was demonstrated in [29], the estimation from (61) is also suitable for the under-modeling case, i.e., when the length of \(\widehat {\mathbf {h}}(n)\) is smaller than the length of h(n), so that an under-modeling noise appears (i.e., the residual echo caused by the part of the echo path that is not modeled by the adaptive filter; it can be interpreted as an additional noise that corrupts the near-end signal).

The main drawback of (61) is due to the approximation in (60). This approximation is biased in the initial convergence phase or when there is a change of the echo path. Concerning the first problem, we can use a regular NLMS algorithm in the first steps (e.g., in the first L iterations).

1.4 JO-NLMS algorithm

In both previous sections, the assumption from (9) was used when evaluating the a posteriori misalignment, i.e., the unknown system is time-invariant. However, in AEC and also in many other system identification problems, this assumption is quite strong. In practice, the system to be identified could be variable in time. For example, in AEC, it can be assumed that the impulse response of the echo path is modeled by a time-varying system following a first-order Markov model [9]. Therefore, a more reliable approach could be based on the Kalman filter, since the state variable model fits better in this context [26, 30, 31].

Motivated by the work of Hänsler and Schmidt [1] (summarized in Section 1.2), we extend here their analysis by assuming that h(n) is a zero-mean random vector, which follows a simplified first-order Markov model, i.e.,

$$\begin{array}{@{}rcl@{}} \mathbf{h}(n) = \mathbf{h}(n-1) + \mathbf{w}(n), \end{array} $$
((64))

where w(n) is a zero-mean white Gaussian noise signal vector, which is uncorrelated with h(n−1). The correlation matrix of w(n) is assumed to be \(\mathbf {R}_{\mathbf {w}} = {\sigma _{w}^{2}} \mathbf {I}_{L}\), where I L is the L×L identity matrix. The variance, \({\sigma _{w}^{2}}\), captures the uncertainties in h(n). Equations (1) and (64) define now a state variable model, similar to Kalman filtering setup.

1.4.1 Convergence analysis

In the context of the previously defined model, let us consider the update of the NLMS algorithm from (7). Next, developing (7) in terms of the a posteriori misalignment from (8), also taking (64) into account, we obtain

$$ \mathbf{m}(n) = \mathbf{m}(n-1)+\mathbf{w}(n)-\frac{\alpha\mathbf{x}(n)e(n)} {\mathbf{x}^{T}(n)\mathbf{x}(n)+\delta}. $$
((65))

For large values of L (i.e., \(L \gg 1\)), it holds that \(\mathbf {x}^{T}(n)\mathbf {x}(n) \approx L{\sigma _{x}^{2}}\) [1, 19]. Consequently,

$$ \frac{\alpha}{\mathbf{x}^{T}(n)\mathbf{x}(n)+\delta} \approx \frac{\alpha}{L{\sigma_{x}^{2}} + \delta}. $$
((66))

This term contains both the control parameters, i.e., α and δ, and also the statistical information on the input signal. However, for a large value of L and a certain stationarity degree of the input signal, we can treat this term as a deterministic quantity [1].

Under these circumstances, taking the \(\ell_{2}\) norm in (65), then mathematical expectation on both sides (also using (66)), and removing the uncorrelated products, we obtain

$${} \begin{aligned} E\left[\left\| \mathbf{m}(n) \right\|^{2}_{2}\right] &= E\left[\left\| \mathbf{m}(n-1) \right\|^{2}_{2}\right] \\& \quad + L{\sigma_{w}^{2}} - \frac{2\alpha}{L{\sigma_{x}^{2}}+\delta} E\left[ \mathbf{x}^{T}(n)\mathbf{m}(n-1)e(n)\right] \\ & \quad - \frac{2\alpha}{L{\sigma_{x}^{2}}+\delta} E\left[ \mathbf{x}^{T}(n)\mathbf{w}(n)e(n)\right] \\ & \quad + \frac{\alpha^{2}}{\left(L{\sigma_{x}^{2}}+\delta\right)^{2}} E\left[e^{2}(n)\mathbf{x}^{T}(n)\mathbf{x}(n)\right]. \end{aligned} $$
((67))

In order to further process (67), let us focus on its last three cross-correlation terms. Based on (1), (8), and (64), the a priori error signal from (2) can be rewritten as

$$ e(n) = \mathbf{x}^{T}(n)\mathbf{m}(n-1)+\mathbf{x}^{T}(n)\mathbf{w}(n)+v(n). $$
((68))

Therefore, taking (68) into account within the first cross-correlation term from (67) (also removing the uncorrelated products), it results in

$$\begin{aligned} E\left[\mathbf{m}^{T}(n-1)\mathbf{x}(n)e(n)\right] &\approx E\left[\mathbf{m}^{T}(n-1)\mathbf{x}(n)\mathbf{x}^{T}(n)\mathbf{m}(n-1)\right] \\ &= E\left\{\text{tr}\left[\mathbf{m}(n-1)\mathbf{m}^{T}(n-1)\mathbf{x}(n)\mathbf{x}^{T}(n)\right]\right\}. \end{aligned} $$
((69))

Next, the following assumptions can be considered: (i) the a posteriori misalignment at time index n−1 is uncorrelated with the input vector at time index n and (ii) the correlation matrix of the input is close to a diagonal one, i.e., \(E\left [\mathbf {x}(n)\mathbf {x}^{T}(n)\right ] \approx {\sigma _{x}^{2}}\mathbf {I}_{L}\) (this is a fairly restrictive assumption, however, it has been widely used to simplify the analysis [16]). Consequently, (69) becomes

$${} \begin{aligned} E\left[\mathbf{m}^{T}(n-1)\mathbf{x}(n)e(n)\right] &\approx \text{tr}\left\{E\left[\mathbf{m}(n-1)\mathbf{m}^{T}(n-1)\right]\right. \\&\quad\; \left. E\left[ \mathbf{x}(n)\mathbf{x}^{T}(n)\right] \right\} \\ &= {\sigma_{x}^{2}} E\left[\left\| \mathbf{m}(n-1) \right\|^{2}_{2} \right]. \end{aligned} $$
((70))

The second cross-correlation term from (67) can be also developed based on (68). Taking into account that the correlation matrix of w(n) is assumed to be diagonal and removing the uncorrelated products, it results in

$${} \begin{aligned} E\left[\mathbf{w}^{T}(n)\mathbf{x}(n)e(n)\right] &= E\left[\mathbf{w}^{T}(n)\mathbf{x}(n)\mathbf{x}^{T}(n)\mathbf{w}(n)\right] \\ &= E\left\{\text{tr}\left[\mathbf{w}(n)\mathbf{w}^{T}(n)\mathbf{x}(n)\mathbf{x}^{T}(n)\right]\right\} \\ &\approx \text{tr}\left\{E\left[\mathbf{w}(n)\mathbf{w}^{T}(n)\right] E\left[ \mathbf{x}(n)\mathbf{x}^{T}(n)\right] \right\} \\&= L {\sigma_{x}^{2}} {\sigma_{w}^{2}}. \end{aligned} $$
((71))

The last expectation term from (67) can be also expressed taking (68) into account. Using a similar approach, it results in

$$\begin{aligned} E\left[e^{2}(n)\mathbf{x}^{T}(n)\mathbf{x}(n)\right] &= \text{tr}\left\{ E\left[e^{2}(n)\mathbf{x}(n)\mathbf{x}^{T}(n)\right] \right\} \\ &= \text{tr}\left\{E\left[v^{2}(n)\mathbf{x}(n)\mathbf{x}^{T}(n)\right] \right\} + \text{tr}\left\{ E\left\{\left[\mathbf{r}^{T}(n)\mathbf{x}(n)\right]^{2}\mathbf{x}(n)\mathbf{x}^{T}(n)\right\}\right\}, \end{aligned} $$
((72))

with the notation

$$\begin{array}{*{20}l} \mathbf{r}(n) &= \mathbf{m}(n-1) + \mathbf{w}(n) \\ &= \left[ \begin{array}{cccc} r_{0}(n) & r_{1}(n) & \cdots & r_{L-1}(n) \end{array} \right]^{T}. \end{array} $$

Next, let us focus on the last expectation term in (72); since the correlation matrix of the input was assumed to be diagonal, this term can be developed based on the Gaussian moment factoring theorem [32] (also known as the Isserlis’ theorem) and results in

$$\begin{aligned} &E\left[\mathbf{r}^{T}(n)\mathbf{x}(n)\mathbf{r}^{T}(n)\mathbf{x}(n)\mathbf{x}(n)\mathbf{x}^{T}(n)\right]\\ &\quad= {\sigma_{x}^{4}} \mathbf{I}_{L} \sum\limits_{i=0}^{L-1}E\left[{r_{i}^{2}}(n) \right]+ 2{\sigma_{x}^{4}} E\left[\mathbf{r}(n)\mathbf{r}^{T}(n) \right]\\ &\quad= {\sigma_{x}^{4}} \mathbf{I}_{L} \left\{L{\sigma_{w}^{2}} + E\left[\left\| \mathbf{m}(n-1) \right\|^{2}_{2}\right]\right\} + 2{\sigma_{x}^{4}}\left\{{\sigma_{w}^{2}}\mathbf{I}_{L} + E\left[\mathbf{m}(n-1)\mathbf{m}^{T}(n-1)\right]\right\}. \end{aligned} $$
((73))

Therefore, using the result from (73), (72) becomes

$$\begin{aligned} E\left[e^{2}(n)\mathbf{x}^{T}(n)\mathbf{x}(n)\right] &= L{\sigma_{x}^{2}} {\sigma_{v}^{2}} + \text{tr}\left\{{\sigma_{x}^{4}}\left\{L{\sigma_{w}^{2}}+ E\left[\left\| \mathbf{m}(n-1) \right\|^{2}_{2}\right]\right\}\mathbf{I}_{L}\right\} +\text{tr}\left\{2{\sigma_{x}^{4}}\left\{{\sigma_{w}^{2}}\mathbf{I}_{L}+ E\left[\mathbf{m}(n-1)\mathbf{m}^{T}(n-1)\right]\right\}\right\} \\ &= L{\sigma_{x}^{2}} {\sigma_{v}^{2}} + (L+2){\sigma_{x}^{4}} \left\{ E\left[\left\| \mathbf{m}(n-1) \right\|^{2}_{2}\right] + L{\sigma_{w}^{2}} \right\}. \end{aligned} $$
((74))

Having all these terms, we can introduce (70), (71), and (74) in (67), also denoting \(m(n) = E\left [\left \| \mathbf {m}(n) \right \|^{2}_{2} \right ]\), to obtain

$$ m(n) = \widetilde{A}\left(\alpha,\delta,L,{\sigma_{x}^{2}} \right)m(n-1) + \widetilde{B}\left(\alpha,\delta,L,{\sigma_{x}^{2}},{\sigma_{v}^{2}},{\sigma_{w}^{2}} \right), $$
((75))

where

$$ \begin{aligned} \widetilde{A}\left(\alpha,\delta,L,{\sigma_{x}^{2}} \right) &= 1 - \frac{2{\sigma_{x}^{2}}}{L{\sigma_{x}^{2}}+\delta}\alpha + \frac{(L+2){\sigma_{x}^{4}}}{\left(L{\sigma_{x}^{2}}+\delta\right)^{2}}\alpha^{2}, \end{aligned} $$
((76))
$$ \begin{aligned} \widetilde{B}\left(\alpha,\delta,L,{\sigma_{x}^{2}},{\sigma_{v}^{2}},{\sigma_{w}^{2}} \right) &= \frac{\alpha^{2} L{\sigma_{x}^{2}} \left[{\sigma_{v}^{2}} + (L+2){\sigma_{x}^{2}}{\sigma_{w}^{2}} \right]}{\left(L{\sigma_{x}^{2}}+\delta\right)^{2}}\\ &\quad- \frac{2\alpha L{\sigma_{x}^{2}}{\sigma_{w}^{2}}}{L{\sigma_{x}^{2}}+\delta} + L{\sigma_{w}^{2}}. \end{aligned} $$
((77))

The result from (75) illustrates a “separation” between the convergence and misadjustment components, similar to (18)–(20) from Section 1.2. Therefore, the term \(\widetilde {A}\left (\alpha,\delta,L,{\sigma _{x}^{2}}\right)\) influences the convergence rate of the algorithm. As expected, it depends on the normalized step-size value, the regularization constant, the filter length, and the input signal power. It is interesting to notice that it does not depend on the system noise power, \({\sigma _{v}^{2}}\), or the uncertainties, \({\sigma _{w}^{2}}\); in other words, the convergence rate should not be influenced by these two terms. In fact, the term from (76) is very similar to the contraction parameter, \(A\left (\alpha,\delta,L,{\sigma _{x}^{2}} \right)\), from (19). Similarly, it can be noticed that the fastest convergence mode is obtained when the function from (76) reaches its minimum. Taking the normalized step-size as the reference parameter, we obtain

$$\begin{array}{@{}rcl@{}} \widetilde{\alpha}_{\text{FC}}=\frac{\delta + L{\sigma_{x}^{2}}}{(L+2){\sigma_{x}^{2}}}, \end{array} $$
((78))

which is similar to the result obtained in (22). For example, neglecting the regularization constant (i.e., δ≈0) and assuming that \(L \gg 2\), the fastest convergence mode is achieved for α≈1, which is the same conclusion as for (22). Also, similar to (23), the stability condition can be found by imposing \(\left |\widetilde {A}(\alpha,\delta,L,{\sigma _{x}^{2}})\right | < 1\), which leads to

$$\begin{array}{@{}rcl@{}} 0<\widetilde{\alpha}_{\text{stable}}<2\frac{\delta + L{\sigma_{x}^{2}}}{(L+2){\sigma_{x}^{2}}}=2\widetilde{\alpha}_{\text{FC}}. \end{array} $$
((79))

For example, taking δ=0 and \(L \gg 2\) in (79), the standard stability condition of the NLMS algorithm results, i.e., 0<α<2.

The term \(\widetilde {B}(\alpha,\delta,L,{\sigma _{x}^{2}},{\sigma _{v}^{2}},{\sigma _{w}^{2}})\) influences the misadjustment of the algorithm and it depends on both \({\sigma _{v}^{2}}\) and \({\sigma _{w}^{2}}\) (clearly, the misadjustment increases when these two factors increase). As we can notice, it is very similar to the expansion parameter, \(B\left (\alpha,\delta,L,{\sigma _{x}^{2}} \right)\), from (20), except that the term from (77) now includes the contributions of \({\sigma _{v}^{2}}\) and \({\sigma _{w}^{2}}\). However, the lowest misadjustment is obtained in a similar way, i.e., when the function from (77) reaches its minimum. Thus, taking the normalized step-size as the reference parameter, the lowest misadjustment is achieved for

$$\begin{array}{@{}rcl@{}} \widetilde{\alpha}_{\text{LM}}= \frac{{\sigma_{w}^{2}}\left(L{\sigma_{x}^{2}}+\delta\right)}{{\sigma_{v}^{2}}+(L+2){\sigma_{x}^{2}}{\sigma_{w}^{2}}}. \end{array} $$
((80))

In order to compare this result with (25), let us assume that the system is time-invariant, i.e., \({\sigma _{w}^{2}} \approx 0\). Consequently, (80) leads to α≈0 (i.e., the lowest misadjustment is obtained for a normalized step-size close to zero), which is the same result obtained in (25).
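As a quick numeric illustration (with arbitrary example values, not taken from the paper), the following Python lines evaluate (78) and (80) and show how far apart the fastest-convergence and lowest-misadjustment step-sizes can be for the time-varying model.

```python
# Quick numeric check of (78) and (80) for arbitrary example values,
# showing how far apart the fastest-convergence and lowest-misadjustment
# step-sizes can be for the time-varying model.
L, sigma_x2, delta = 512, 1.0, 0.0
sigma_v2, sigma_w2 = 1e-2, 1e-6

alpha_fc = (delta + L * sigma_x2) / ((L + 2) * sigma_x2)        # Eq. (78), close to 1
alpha_lm = (sigma_w2 * (L * sigma_x2 + delta)
            / (sigma_v2 + (L + 2) * sigma_x2 * sigma_w2))       # Eq. (80), much smaller
print(alpha_fc, alpha_lm)   # approximately 0.996 and 0.049 for these values
```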

1.4.2 Derivation of the algorithm

It is known that the ideal requirements of any adaptive algorithm are for both fast convergence and low misadjustment. In our framework, there are two important issues to be considered: (1) we have two main parameters to control, α and δ, which influence the overall performance of the NLMS algorithm and (2) in the context of system identification, it is reasonable to follow a minimization problem in terms of the system misalignment, as outlined by Hänsler and Schmidt in [1].

Thus, following (75) and considering that the two important parameters are time dependent, it can be imposed

$$\begin{array}{*{20}l} \frac{\partial m(n)}{\partial \alpha(n)} &= 0, \end{array} $$
((81))
$$\begin{array}{*{20}l} \frac{\partial m(n)}{\partial \delta(n)} &= 0. \end{array} $$
((82))

After straightforward computations, both equations lead to the same result, i.e.,

$$\begin{array}{@{}rcl@{}} \frac{\alpha(n)}{L{\sigma_{x}^{2}} + \delta(n)} = \frac{m(n-1) + L{\sigma_{w}^{2}}}{L{\sigma_{v}^{2}} + (L+2){\sigma_{x}^{2}}\left[ m(n-1) + L{\sigma_{w}^{2}} \right]}, \end{array} $$
((83))

which suggests a joint optimization process. With a proper estimation of its parameters (as discussed at the end of this section), the term on the right-hand side of (83) acts like a variable step-size. At this point, we can introduce (83) in (7), thus obtaining

$${} \widehat{\mathbf{h}}(n) = \widehat{\mathbf{h}}(n-1) + \frac{\left[m(n-1) + L{\sigma_{w}^{2}}\right]\mathbf{x}(n)e(n)}{L{\sigma_{v}^{2}} + (L+2){\sigma_{x}^{2}}\left[ m(n-1) + L{\sigma_{w}^{2}} \right]}. $$
((84))

Next, there is a need to update the parameter m(n) in (84). Using (83) in (75), followed by several straightforward computations, it results in

$$ \begin{aligned} m(n) &= \left\{1 - \frac{{\sigma_{x}^{2}}\left[m(n-1) + L{\sigma_{w}^{2}}\right]}{L{\sigma_{v}^{2}} + (L+2){\sigma_{x}^{2}}\left[ m(n-1) + L{\sigma_{w}^{2}} \right]} \right\}\\ &\quad\times\left[m(n-1) + L{\sigma_{w}^{2}}\right]. \end{aligned} $$
((85))

Consequently, the resulting JO-NLMS algorithm is defined by the relations (2), (84), and (85).

Finally, there are some practical considerations that should be outlined. The JO-NLMS algorithm requires the estimation of three main parameters, i.e., \({\sigma _{x}^{2}}\), \({\sigma _{v}^{2}}\), and \({\sigma _{w}^{2}}\). The first one can be easily evaluated as in the NLMS algorithm, i.e., \(\widehat {\sigma }_{x}^{2}(n) = \frac {1}{L}\mathbf {x}^{T}(n)\mathbf {x}(n)\). The second parameter (i.e., \({\sigma _{v}^{2}}\)) appears in many VSS and VR versions of the NLMS algorithm. Different methods can be used to estimate it, e.g., [19, 28, 29], as mentioned at the end of Sections 1.2 and 1.3. Perhaps the most important parameter to be found is \({\sigma _{w}^{2}}\). Small values of \({\sigma _{w}^{2}}\) (i.e., the system is assumed to be time-invariant) imply a good misadjustment but a poor tracking; on the other hand, large values of \({\sigma _{w}^{2}}\) (i.e., assuming that the uncertainties in the system are high) imply a good tracking but a high misadjustment. In practice, we propose to estimate \({\sigma _{w}^{2}}\) by taking the \(\ell_{2}\) norm on both sides of (64) and replacing h(n) by its estimate \(\widehat {\mathbf {h}}(n)\), thus resulting in

$$ \widehat{\sigma}_{w}^{2}(n)=\frac{1}{L}\left\| \widehat{\mathbf{h}}(n) - \widehat{\mathbf{h}}(n-1) \right\|_{2}^{2}. $$
((86))

According to (86), the parameter \(\widehat {\sigma }_{w}^{2}(n)\) takes large values in the beginning of adaptation (or when there is an abrupt change of the system), thus providing fast convergence and tracking. On the other hand, when the algorithm starts to converge, the value of \(\widehat {\sigma }_{w}^{2}(n)\) reduces, which leads to low misadjustment. In this way, the algorithm achieves a proper compromise between the performance criteria. In finite precision implementations, in order to avoid any risk of freezing in (86), it is recommended to set a lower bound for \(\widehat {\sigma }_{w}^{2}(n)\) (e.g., the smallest positive number available).
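For illustration, the Python sketch below implements the JO-NLMS recursion defined by (2), (84), and (85), together with the estimate (86). The near-end power \({\sigma_{v}^{2}}\) is assumed to be available (e.g., via (61)); initializing m(0) to a positive value and flooring \(\widehat{\sigma}_{w}^{2}(n)\) are practical choices made here for the sketch, following the recommendation above, and are not prescribed values.

```python
# Sketch of the JO-NLMS recursion (2), (84), (85), with sigma_x^2 estimated
# from the input vector and sigma_w^2 from (86); sigma_v2 is assumed to be
# available (e.g., via (61)). Initializing m(0) = 1 (a unit-norm unknown
# system) and flooring sigma_w^2 are illustrative practical choices.
import numpy as np

def jo_nlms(x, d, L, sigma_v2):
    h_hat = np.zeros(L)
    x_vec = np.zeros(L)
    m = 1.0                 # m(n) = E[||m(n)||_2^2], illustrative initialization
    sigma_w2 = 0.0
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_vec = np.roll(x_vec, 1)
        x_vec[0] = x[n]
        e[n] = d[n] - x_vec @ h_hat                       # Eq. (2)
        sigma_x2 = (x_vec @ x_vec) / L
        p = m + L * sigma_w2
        den = L * sigma_v2 + (L + 2) * sigma_x2 * p + 1e-12
        h_old = h_hat.copy()
        h_hat = h_hat + p * x_vec * e[n] / den            # Eq. (84)
        m = (1.0 - sigma_x2 * p / den) * p                # Eq. (85)
        sigma_w2 = max(np.sum((h_hat - h_old) ** 2) / L,  # Eq. (86)
                       np.finfo(float).tiny)              # lower bound against freezing
    return h_hat, e
```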

The JO-NLMS algorithm is summarized in Table 2, in such a way that its implementation is facilitated. This algorithm is similar to the simplified Kalman filter presented in [31]. However, contrary to this one (which was obtained as an approximation of the general Kalman filter), the JO-NLMS algorithm was derived in a different manner following a specific optimization criterion. In fact, this is an alternative way to obtain the same results as with the Kalman filter.

Table 2 JO-NLMS algorithm

1.5 Simulation results

Simulations were performed in an AEC configuration, as shown in Fig. 1. The measured acoustic impulse response was truncated to 512 coefficients (Fig. 2), and the same length was used for the adaptive filter, i.e., L=512; the sampling rate is 8 kHz. We should note that in many real-world AEC scenarios, the adaptive filter works most likely in an under-modeling situation, i.e., its length is smaller than the length of the acoustic impulse response. Hence, the residual echo caused by the part of the system that cannot be modeled acts like an additional noise (that corrupts the near-end signal) and disturbs the overall performance. However, for experimental purposes, we set the same length for both the unknown system (i.e., the acoustic echo path) and the adaptive filter.

Fig. 2 Echo path. Acoustic impulse response used in simulations

The input signal, x(n), is either a white Gaussian noise, an AR(1) process generated by filtering a white Gaussian noise through the first-order system \(1/\left(1 - 0.8z^{-1}\right)\), or a speech sequence. An independent white Gaussian noise v(n) is added to the echo signal y(n), with SNR = 20 dB (except in the last experiment, where the SNR is variable and the near-end speech is also present). In most of the experiments (except the last one), we assume that \({\sigma _{v}^{2}}\) is known; in practice, this variance can be estimated as in [19, 28, 29] (as presented at the end of Section 1.3). The tracking capability of the algorithm is an important issue in AEC, where the acoustic impulse response may rapidly change at any time during the connection. Consequently, an echo path change scenario is simulated in most of the experiments, by shifting the impulse response to the right by 12 samples in the middle of the experiment. The measure of performance is the normalized misalignment (in dB), defined as

$$ \overline{m}(n) = 20\log_{10} \left[ \left\| \mathbf{h}(n) - \widehat{\mathbf{h}}(n) \right\|_{2}/ \left\| \mathbf{h}(n) \right\|_{2} \right]. $$
(87)
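For reproducibility, a minimal sketch of this performance measure and of the simulated scenario (the AR(1) input and the echo path shift by 12 samples) is given below; the function names and the random-seed choice are illustrative assumptions.

```python
import numpy as np

def normalized_misalignment_db(h_true, h_hat):
    """Normalized misalignment in dB, as defined in (87)."""
    return 20.0 * np.log10(np.linalg.norm(h_true - h_hat)
                           / np.linalg.norm(h_true))

def shift_echo_path(h, shift=12):
    """Simulate an abrupt echo path change by shifting the impulse
    response to the right by `shift` samples (zeros enter on the left)."""
    return np.concatenate((np.zeros(shift), h[:-shift]))

def ar1_input(num_samples, a=0.8, seed=0):
    """AR(1) far-end signal: white Gaussian noise filtered through 1/(1 - a z^-1)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(num_samples)
    x = np.empty(num_samples)
    x[0] = w[0]
    for n in range(1, num_samples):
        x[n] = a * x[n - 1] + w[n]
    return x
```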

In the first set of experiments, we evaluate the performance of the optimal control parameters proposed by Hänsler and Schmidt in [1] (also summarized in Section 1.2), in order to set the benchmark for further comparisons. In this context, we consider the ideal estimation of these parameters [i.e., \(\alpha_{\mathrm{opt}}(n)\) and \(\delta_{\mathrm{opt}}(n)\) from Section 1.2], assuming that the undistorted error signal \(e_{\mathrm{u}}(n)\) from (27) is available and its power, \(E\left [ e_{\mathrm {u}}^{2}(n) \right ] = \sigma _{e_{\mathrm {u}}}^{2}(n)\), can be evaluated similarly to (55), i.e.,

$$\begin{array}{*{20}l} \widehat{\sigma}_{e_{\mathrm{u}}}^{2}(n) &= \lambda \widehat{\sigma}_{e_{\mathrm{u}}}^{2}(n-1) + (1-\lambda)e_{\mathrm{u}}^{2}(n) \\ &= \lambda \widehat{\sigma}_{e_{\mathrm{u}}}^{2}(n-1) + (1-\lambda) \left[e(n) - v(n)\right]^{2}, \end{array} $$
(88)

where λ is a weighting factor, λ=1−1/(KL), with K>1. Of course, in practice, the near-end signal v(n) is not available; however, for comparison purposes, we consider that it is available in (88).
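A minimal sketch of the recursive estimate in (88) is given below; the value K=2 and the function interface are illustrative assumptions.

```python
def undistorted_error_power(e, v, L, K=2.0):
    """Recursive power estimate of the undistorted error, as in (88).
    The near-end signal v(n) is assumed available (ideal benchmark only)."""
    lam = 1.0 - 1.0 / (K * L)          # weighting factor, lambda = 1 - 1/(K L)
    sigma2 = 0.0
    out = []
    for en, vn in zip(e, v):
        e_u = en - vn                  # undistorted error e_u(n) = e(n) - v(n)
        sigma2 = lam * sigma2 + (1.0 - lam) * e_u ** 2
        out.append(sigma2)
    return out
```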

In the first simulation, we evaluate the performance of the NLMS algorithm using \(\alpha_{\mathrm{opt}}(n)\) and \(\delta_{\mathrm{opt}}(n)\), respectively. Since the estimation from (88) is used for both these parameters, we deal with the ideal behavior of the algorithms. Consequently, we will refer to these algorithms as the ideal optimal step-size NLMS (OSS-NLMS-id) and the ideal optimal regularized NLMS (OR-NLMS-id), respectively. In Fig. 3, these ideal benchmarks are compared to the NLMS algorithm using different constant values of the normalized step-size, α, and regularization parameter, δ; the input signal is a white Gaussian noise. First, it can be noticed that the performance of the regular NLMS algorithm can be controlled in terms of both parameters, α and δ, either by setting the fastest convergence mode (i.e., α=1) and adjusting the value of δ, or by neglecting the regularization constant (i.e., δ=0) and tuning the value of α. On the other hand, in case of the optimal control parameters, the OSS-NLMS-id and OR-NLMS-id algorithms achieve both fast convergence/tracking and low misalignment, outperforming the NLMS algorithms that use constant values for α and δ. Besides, it should be noted that the OSS-NLMS-id and OR-NLMS-id algorithms are equivalent in terms of their performance (their misalignment curves overlap), which confirms the findings from Section 1.2. For this experiment, the evolution of \(\alpha_{\mathrm{opt}}(n)\) and \(\delta_{\mathrm{opt}}(n)\) is depicted in Fig. 4, also supporting the expected behavior of these parameters.

Fig. 3 Performance of the optimal algorithms for white Gaussian input. Misalignment of the NLMS (for different values of α and δ), OSS-NLMS-id, and OR-NLMS-id algorithms. The input signal is white Gaussian, echo path changes at time 10 s, L=512, and SNR = 20 dB

Fig. 4 Optimal control parameters for white Gaussian input. Evolution of the optimal control parameters: (a) \(\alpha_{\mathrm{opt}}(n)\) of the OSS-NLMS-id algorithm and (b) \(\delta_{\mathrm{opt}}(n)\) of the OR-NLMS-id algorithm. Other conditions are the same as in Fig. 3

Next, the same experiment is repeated using an AR(1) process as input; the results are presented in Figs. 5 and 6. As expected, the convergence rate of the algorithms is reduced in this case, due to the correlated input. Also, as we can notice from Fig. 5, the OSS-NLMS-id and OR-NLMS-id algorithms (which behave the same) still outperform their classical counterparts. Based on the evolution of \(\alpha_{\mathrm{opt}}(n)\) and \(\delta_{\mathrm{opt}}(n)\) depicted in Fig. 6, we can revisit the discussion from the end of Section 1.2, related to the dynamic range of these parameters. In practice, it is usually more convenient to control the performance of the algorithm in terms of the normalized step-size, since its values are confined to a specific interval. On the other hand, it could be more difficult to control the adaptation in terms of the regularization term, since its values keep increasing and could lead to overflows. An upper bound on the regularization parameter could be imposed, but this would introduce an extra tuning parameter in the algorithm. Due to these aspects, only the OSS-NLMS-id algorithm will be considered as a benchmark in the following experiments.

Fig. 5 Performance of the optimal algorithms for AR(1) input. Misalignment of the NLMS (for different values of α and δ), OSS-NLMS-id, and OR-NLMS-id algorithms. The input signal is an AR(1) process, echo path changes at time 40 s, L=512, and SNR = 20 dB

Fig. 6 Optimal control parameters for AR(1) input. Evolution of the optimal control parameters: (a) \(\alpha_{\mathrm{opt}}(n)\) of the OSS-NLMS-id algorithm and (b) \(\delta_{\mathrm{opt}}(n)\) of the OR-NLMS-id algorithm. Other conditions are the same as in Fig. 5

Nevertheless, the OSS-NLMS-id algorithm still requires a constant regularization parameter, especially in case of non-stationary inputs like speech. This is also the case of the NPVSS-NLMS algorithm presented in Section 1.3. While in the previous experiments this regularization constant was neglected (due to the stationary nature of the input signals), the next simulation shows the importance of this parameter in practice. For this purpose, in Fig. 7, a speech sequence is considered at the far-end. The NLMS, NPVSS-NLMS, and OSS-NLMS-id algorithms are compared when using two different values of the regularization constant, i.e., \(\delta = {\sigma _{x}^{2}}\) and \(\delta = 20{\sigma _{x}^{2}}\). As expected, the small regularization is not suitable in this case, leading to large misalignment. On the other hand, the rule of thumb \(\delta = 20{\sigma _{x}^{2}}\) (used in many echo cancellation scenarios [6–8]) is more appropriate here. Thus, the regularization parameter is essential in this case. In fact, the regularization parameter is required in all ill-posed and ill-conditioned problems, such as AEC; some insights for choosing this parameter in practice can be found in [14]. Consequently, in all the following experiments, we will consider a constant regularization \(\delta = 20{\sigma _{x}^{2}}\) for the NLMS, NPVSS-NLMS, and OSS-NLMS-id algorithms. As shown in [14], the regularization parameter of the NLMS algorithm is related to the value of the SNR and the filter’s length L. For our experimental setup, i.e., L=512 and SNR = 20 dB, the value \(\delta = 20{\sigma _{x}^{2}}\) fits well; however, this value should be increased for larger values of L or lower SNRs [14]. To conclude this experiment, the influence of the regularization parameter can also be noticed in Fig. 8, where the control parameters of the NPVSS-NLMS and OSS-NLMS-id algorithms are depicted, i.e., \(\alpha_{\mathrm{NPVSS}}(n)\) and \(\alpha_{\mathrm{opt}}(n)\), respectively. Clearly, their behavior is strongly biased in case of the small regularization parameter, while they perform similarly in case of a proper regularization.
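To make the role of the regularization constant explicit, the following sketch shows the classical regularized NLMS update with \(\delta = 20\widehat{\sigma}_{x}^{2}\), i.e., the baseline used in these comparisons; the long-term input power estimate and the function interface are assumptions made for the example, not part of the original setup.

```python
import numpy as np

def regularized_nlms(x, d, L, alpha=1.0, delta_factor=20.0):
    """Classical NLMS with a constant regularization delta = 20 * sigma_x^2
    (rule of thumb suitable for L = 512 and SNR = 20 dB); larger L or
    lower SNRs call for a larger delta [14]."""
    h_hat = np.zeros(L)
    sigma_x2 = np.var(x)                     # long-term input power estimate
    delta = delta_factor * sigma_x2
    e = np.zeros(len(d))
    for n in range(L - 1, len(d)):
        x_n = x[n - L + 1:n + 1][::-1]       # input vector x(n)
        e[n] = d[n] - np.dot(h_hat, x_n)     # error signal
        h_hat += (alpha * e[n] / (np.dot(x_n, x_n) + delta)) * x_n
    return h_hat, e
```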

Fig. 7 Regularization influence on the algorithms’ performance. Misalignment of the NLMS (with α=1), NPVSS-NLMS, and OSS-NLMS-id algorithms for different values of δ. The input signal is speech, L=512, and SNR = 20 dB

Fig. 8 Regularization influence on the control parameters. Evolution of the normalized step-sizes of the NPVSS-NLMS algorithm [\(\alpha_{\mathrm{NPVSS}}(n)\)] and OSS-NLMS-id algorithm [\(\alpha_{\mathrm{opt}}(n)\)] for different values of δ: (a) \(\delta = {\sigma_{x}^{2}}\) and (b) \(\delta = 20{\sigma_{x}^{2}}\). Other conditions are the same as in Fig. 7

Next, the JO-NLMS algorithm (presented in Section 1.4) is also included in the remaining experiments. As compared to its counterparts, this algorithm does not require an explicit regularization term; its global step-size from (83) results from the joint optimization of both the normalized step-size and the regularization parameter. In Figs. 9 and 10, the NLMS algorithm (for different values of α) is compared with the NPVSS-NLMS, JO-NLMS, and OSS-NLMS-id algorithms, when the far-end signal is an AR(1) process or a speech sequence, respectively. According to these results, it can be noticed that the NLMS algorithm is clearly outperformed by the other algorithms, in terms of convergence rate, tracking, and misalignment. Also, the NPVSS-NLMS and JO-NLMS algorithms perform in a similar manner (with a slight advantage for the JO-NLMS algorithm); besides, they are close to the performance of the OSS-NLMS-id algorithm, which represents the ideal benchmark.

Fig. 9 Performance of the algorithms for AR(1) input. Misalignment of the NLMS (for different values of α), NPVSS-NLMS, JO-NLMS, and OSS-NLMS-id algorithms. The input signal is an AR(1) process; echo path changes at time 20 s, L=512, and SNR = 20 dB

Fig. 10 Performance of the algorithms for speech input. Misalignment of the NLMS (for different values of α), NPVSS-NLMS, JO-NLMS, and OSS-NLMS-id algorithms. The input signal is speech; echo path changes at time 20 s, L=512, and SNR = 20 dB

In all the previous experiments involving the NPVSS-NLMS and JO-NLMS algorithms, it was assumed that the power of the system noise, \({\sigma _{v}^{2}}\), is available. However, in practice, it also has to be estimated. Moreover, in AEC, the signal v(n) represents the near-end signal, which can contain both the background noise and the near-end speech; since both these signals can be non-stationary, the estimation of \({\sigma _{v}^{2}}\) becomes more difficult. There are different methods for estimating this parameter; for example, in a single-talk scenario, it can be estimated during silences of the near-end talker [19]. Also, other practical methods to estimate \({\sigma _{v}^{2}}\) can be found in [28, 29], as shown at the end of Section 1.3. In the last experiment, the estimation from (61) is used within the NPVSS-NLMS and JO-NLMS algorithms. Two challenging scenarios are considered in Fig. 11, where the far-end signal is a speech sequence. First, a variation of the background noise is simulated by decreasing the SNR from 20 to 10 dB between times 10 and 20 s; second, the near-end speech appears between times 25 and 30 s (i.e., a double-talk case), without using any DTD. The results from Fig. 11 indicate that the NLMS algorithm fails in this case, especially during double-talk. The NPVSS-NLMS and JO-NLMS algorithms show good robustness in both situations (with an advantage for the JO-NLMS algorithm during double-talk). In terms of robustness, the JO-NLMS algorithm performs similarly to the ideal case represented by the OSS-NLMS-id algorithm. Finally, it should be noted that neither the NPVSS-NLMS nor the JO-NLMS algorithm requires any additional features to control its behavior, which makes them reliable candidates for AEC applications.
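For illustration, the near-end scenario of this last experiment could be constructed as in the following sketch; scaling the background noise against the echo power to reach the target SNR, as well as the function interface and the random seed, are assumptions made for the example.

```python
import numpy as np

def near_end_signal(y, near_end_speech=None, fs=8000, seed=0):
    """Illustrative construction of the near-end signal v(n): background
    noise at SNR = 20 dB, lowered to 10 dB between 10 s and 20 s, plus a
    near-end speech segment starting at 25 s (double-talk, no DTD)."""
    rng = np.random.default_rng(seed)
    n_samples = len(y)
    sigma_y2 = np.mean(y ** 2)                     # echo power
    snr_db = np.full(n_samples, 20.0)
    snr_db[10 * fs:20 * fs] = 10.0                 # SNR drop: 20 dB -> 10 dB
    noise_std = np.sqrt(sigma_y2 / (10.0 ** (snr_db / 10.0)))
    v = noise_std * rng.standard_normal(n_samples)
    if near_end_speech is not None:                # e.g., a 5-s speech segment
        start = 25 * fs
        v[start:start + len(near_end_speech)] += near_end_speech
    return v
```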

Fig. 11 Performance of the algorithms during near-end variations. Misalignment of the NLMS (for different values of α), NPVSS-NLMS, JO-NLMS, and OSS-NLMS-id algorithms. The NPVSS-NLMS and JO-NLMS algorithms use the estimated \(\widehat {\sigma }_{v}^{2}(n)\) from (61). The input signal is speech, L=512, and SNR = 20 dB. The SNR decreases from 20 to 10 dB between times 10 and 20 s and the near-end speech appears between times 25 and 30 s (without using a DTD)

2 Conclusions

In this paper, we have presented several NLMS-based algorithms suitable for AEC applications. These algorithms are based on different control strategies for adjusting their main parameters, i.e., the normalized step-size and the regularization term, in order to achieve a proper compromise between the performance criteria (i.e., fast convergence/tracking and low misadjustment). The main motivation behind this approach was the reference work of Hänsler and Schmidt from [1]. Following their ideas, we presented here two related solutions, i.e., the NPVSS-NLMS and JO-NLMS algorithms. The first one (originally proposed in [19]) represents a simple and efficient method to control the normalized step-size; due to its non-parametric nature, it is a reliable choice in many practical applications. The second one is developed in the context of a state-variable model and follows an optimization criterion based on the minimization of the system misalignment. It is also a non-parametric algorithm, which does not require any additional control features (e.g., system change detector, stability thresholds, etc.). It also provides good robustness against double-talk, which is one of the most challenging situations in AEC. Consequently, it could be an appealing candidate for real-world applications.

There are several perspectives that could follow the ideas presented in this paper. First, the extension to the affine projection algorithm represents a straightforward approach. Second, it would be highly interesting to further develop these solutions in the context of proportionate-type algorithms, which are also attractive choices for sparse system identification.

To conclude, although the NLMS algorithm has long been the workhorse of AEC and of many other applications, it is still intensively studied and very often represents the algorithm of choice in practice. Therefore, let us end this paper with a neat remark of Hänsler and Schmidt from [4], which fits best in this context: “The NLMS algorithm has often been declared to be dead. According to a popular saying, this is an infallible sign of a very long life.”

References

1. E Hänsler, G Schmidt, Acoustic Echo and Noise Control—A Practical Approach (Wiley, Hoboken, NJ, 2004).
2. E Hänsler, The hands-free telephone problem—an annotated bibliography. Signal Process. 27(3), 259–271 (1992).
3. E Hänsler, G Schmidt, Hands-free telephones—joint control of echo cancellation and post filtering. Signal Process. 80(11), 2295–2305 (2000).
4. E Hänsler, G Schmidt, in Least-Mean-Square Adaptive Filters, ed. by S Haykin, B Widrow. Control of LMS-type adaptive filters (Wiley, New York, NY, 2003), pp. 175–240.
5. E Hänsler, in Encyclopedia of Telecommunications, ed. by J Proakis. Acoustic echo cancellation (Wiley, New York, NY, 2003), pp. 1–15.
6. SL Gay, J Benesty (eds.), Acoustic Signal Processing for Telecommunication (Kluwer Academic Publisher, Boston, MA, 2000).
7. J Benesty, T Gänsler, DR Morgan, MM Sondhi, SL Gay, Advances in Network and Acoustic Echo Cancellation (Springer-Verlag, Berlin, Germany, 2001).
8. J Benesty, Y Huang (eds.), Adaptive Signal Processing—Applications to Real-World Problems (Springer-Verlag, Berlin, Germany, 2003).
9. G Enzner, H Buchner, A Favrot, F Kuech, in Academic Press Library in Signal Processing, vol. 4, ed. by R Chellappa, S Theodoridis. Acoustic echo control (Academic Press, Chennai, 2014), pp. 807–877.
10. B Widrow, SD Stearns, Adaptive Signal Processing (Prentice Hall, Englewood Cliffs, NJ, 1985).
11. S Haykin, Adaptive Filter Theory, 4th edn. (Prentice-Hall, Upper Saddle River, NJ, 2002).
12. AH Sayed, Adaptive Filters (Wiley, New York, NY, 2008).
13. C Breining, P Dreiseitel, E Hänsler, A Mader, B Nitsch, H Puder, T Schertler, G Schmidt, J Tilp, Acoustic echo control—an application of very-high-order adaptive filters. IEEE Signal Processing Mag. 16(4), 42–69 (1999).
14. J Benesty, C Paleologu, S Ciochină, On regularization in adaptive filtering. IEEE Trans. Audio, Speech, Language Processing. 19(6), 1734–1742 (2011).
15. A Mader, H Puder, GU Schmidt, Step-size control for acoustic echo cancellation filters—an overview. Signal Process. 80(9), 1697–1719 (2000).
16. AI Sulyman, A Zerguine, Convergence and steady-state analysis of a variable step-size NLMS algorithm. Signal Process. 83(6), 1255–1273 (2003).
17. H-C Shin, AH Sayed, W-J Song, Variable step-size NLMS and affine projection algorithms. IEEE Signal Processing Lett. 11(2), 132–135 (2004).
18. DP Mandic, A generalized normalized gradient descent algorithm. IEEE Signal Processing Lett. 11(2), 115–118 (2004).
19. J Benesty, H Rey, L Rey Vega, S Tressens, A nonparametric VSS-NLMS algorithm. IEEE Signal Processing Lett. 13(10), 581–584 (2006).
20. Y-S Choi, H-C Shin, W-J Song, Robust regularization for normalized LMS algorithms. IEEE Trans. Circuits and Systems II: Express Briefs. 53(8), 627–631 (2006).
21. H Rey, L Rey Vega, S Tressens, J Benesty, Variable explicit regularization in affine projection algorithm: robustness issues and optimal choice. IEEE Trans. Signal Process. 55(5), 2096–2108 (2007).
22. P Park, M Chang, N Kong, Scheduled-stepsize NLMS algorithm. IEEE Signal Processing Lett. 16(12), 1055–1058 (2009).
23. H-C Huang, J Lee, A new variable step-size NLMS algorithm and its performance analysis. IEEE Trans. Signal Process. 60(4), 2055–2060 (2012).
24. H-C Huang, J Lee, in Proc. IEEE Asilomar. A variable regularization control method for NLMS algorithm (Pacific Grove, CA, 2012), pp. 396–400.
25. I Song, P Park, A normalized least-mean-square algorithm based on variable-step-size recursion with innovative input data. IEEE Signal Processing Lett. 19(12), 817–820 (2012).
26. S Ciochină, C Paleologu, J Benesty, An optimized NLMS algorithm for system identification. Signal Process. 118(1), 115–121 (2016).
27. DR Morgan, SG Kratzer, On a class of computationally efficient, rapidly converging, generalized NLMS algorithms. IEEE Signal Processing Lett. 3(8), 245–247 (1996).
28. MA Iqbal, SL Grant, in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Novel variable step size NLMS algorithms for echo cancellation (Las Vegas, NV, 2008), pp. 241–244.
29. C Paleologu, S Ciochină, J Benesty, Variable step-size NLMS algorithm for under-modeling acoustic echo cancellation. IEEE Signal Processing Lett. 15, 5–8 (2008).
30. G Enzner, P Vary, Frequency-domain adaptive Kalman filter for acoustic echo control in hands-free telephones. Signal Process. 86(6), 1140–1156 (2006).
31. C Paleologu, J Benesty, S Ciochină, Study of the general Kalman filter for echo cancellation. IEEE Trans. Audio, Speech, Language Processing. 21(8), 1539–1549 (2013).
32. L Isserlis, On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika. 12(1/2), 134–139 (1918).


Acknowledgements

This work was supported by the UEFISCDI Romania under Grant PN-II-RU-TE-2014-4-1880.

Author information


Corresponding author

Correspondence to Constantin Paleologu.


Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Paleologu, C., Ciochină, S., Benesty, J. et al. An overview on optimized NLMS algorithms for acoustic echo cancellation. EURASIP J. Adv. Signal Process. 2015, 97 (2015). https://doi.org/10.1186/s13634-015-0283-1


Keywords