The crux of block-coordinate descent is to iteratively minimize the function of interest over one group of variables at a time, while keeping the rest fixed. Consider the objective function

$J(\mathbf{d}) := \frac{1}{2}\|\mathbf{y} - \mathbf{X}\mathbf{d}\|_2^2 + \lambda \sum_{n=1}^{N} \|\mathbf{d}_n\|_2$

(13)
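To make (13) concrete, the cost can be evaluated numerically as in the following NumPy sketch; the function name and toy dimensions are our own, and, as in (13), the ℓ2 penalty runs over blocks *n* = 1, ..., *N* only (the block d₀ is unpenalized).

```python
import numpy as np

def group_lasso_objective(y, X, d_blocks, lam):
    """Evaluate J(d) = 0.5*||y - X d||_2^2 + lam * sum_{n=1}^{N} ||d_n||_2.

    d_blocks is a list [d_0, d_1, ..., d_N] of L-dimensional blocks; note
    that the l2 penalty is applied to blocks 1..N only (d_0 is unpenalized).
    """
    d = np.concatenate(d_blocks)
    fit = 0.5 * np.sum((y - X @ d) ** 2)
    penalty = lam * sum(np.linalg.norm(dn) for dn in d_blocks[1:])
    return fit + penalty

# Toy usage: N = 3 penalized blocks of size L = 2, plus the unpenalized d_0.
rng = np.random.default_rng(0)
L, N = 2, 3
X = rng.standard_normal((10, (N + 1) * L))
y = rng.standard_normal(10)
d_blocks = [rng.standard_normal(L) for _ in range(N + 1)]
J = group_lasso_objective(y, X, d_blocks, lam=0.1)
```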

and let $\mathbf{d}^{(i-1)} := \left[\mathbf{d}_0^{(i-1)T}, \mathbf{d}_1^{(i-1)T}, \dots, \mathbf{d}_N^{(i-1)T}\right]^T$ denote the provisional solution at iteration *i* - 1. The *n*th step of the *i*th block-coordinate descent iteration entails minimization of *J*(**d**) *only* with respect to $\mathbf{d}_n$, while retaining the provisional estimates at iteration *i* - 1, namely $\left\{\mathbf{d}_{n'}^{(i-1)}\right\}_{n'=n+1}^{N}$, and the newly updated blocks at iteration *i*, namely $\left\{\mathbf{d}_{n'}^{(i)}\right\}_{n'=0}^{n-1}$. Thus, block-coordinate descent at the *n*th step of the *i*th iteration yields

$\mathbf{d}_n^{(i)} = \arg\min_{\mathbf{d}_n} J\left(\left[\mathbf{d}_0^{(i)}, \dots, \mathbf{d}_{n-1}^{(i)}, \mathbf{d}_n, \mathbf{d}_{n+1}^{(i-1)}, \dots, \mathbf{d}_N^{(i-1)}\right]\right)$

(14)

for *n* = 0, 1, ..., *N*, and *i* > 0. Skipping constant terms, *J*(**d**) in (13) can be rewritten as

$J(\mathbf{d}) = \frac{1}{2}\mathbf{d}^T\mathbf{X}^T\mathbf{X}\mathbf{d} - \mathbf{d}^T\mathbf{X}^T\mathbf{y} + \lambda \sum_{n=1}^{N}\|\mathbf{d}_n\|_2 = \frac{1}{2}\mathbf{d}^T\mathbf{R}\mathbf{d} - \mathbf{d}^T\mathbf{r} + \lambda \sum_{n=1}^{N}\|\mathbf{d}_n\|_2$

(15)
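The equivalence between (13) and (15), up to the dropped constant $\frac{1}{2}\|\mathbf{y}\|_2^2$, can be sanity-checked numerically; a small sketch (dimensions and seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
L, N, M = 2, 3, 12
X = rng.standard_normal((M, (N + 1) * L))
y = rng.standard_normal(M)
d = rng.standard_normal((N + 1) * L)
lam = 0.3

R = X.T @ X            # R := X^T X
r = X.T @ y            # r := X^T y
penalty = lam * sum(np.linalg.norm(d[n*L:(n+1)*L]) for n in range(1, N + 1))

J_13 = 0.5 * np.sum((y - X @ d) ** 2) + penalty     # original cost (13)
J_15 = 0.5 * d @ R @ d - d @ r + penalty            # quadratic form (15)
gap = J_13 - J_15                                   # the dropped constant
```

The gap equals $\frac{1}{2}\|\mathbf{y}\|_2^2$, which does not depend on **d** and hence does not affect the minimizer.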

where $\mathbf{R} := \mathbf{X}^T\mathbf{X}$ and $\mathbf{r} := \mathbf{X}^T\mathbf{y}$. Upon defining $\mathbf{R}_{n:n'} := \sum_{m=n}^{n'} \mathbf{h}_m \mathbf{h}_m^T$ and $\mathbf{r}_{n:n'} := \sum_{m=n}^{n'} \mathbf{h}_m y_m$ for *n'* ≥ *n*, it holds that

$\mathbf{R} = \begin{bmatrix} \mathbf{R}_{0:N} & \mathbf{R}_{1:N} & \cdots & \mathbf{R}_{N-1:N} & \mathbf{R}_{N:N} \\ \mathbf{R}_{1:N} & \mathbf{R}_{1:N} & \cdots & \mathbf{R}_{N-1:N} & \mathbf{R}_{N:N} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \mathbf{R}_{N-1:N} & \mathbf{R}_{N-1:N} & \cdots & \mathbf{R}_{N-1:N} & \mathbf{R}_{N:N} \\ \mathbf{R}_{N:N} & \mathbf{R}_{N:N} & \cdots & \mathbf{R}_{N:N} & \mathbf{R}_{N:N} \end{bmatrix}$

(16)

and

$\mathbf{r} = \begin{bmatrix} \mathbf{r}_{0:N} \\ \mathbf{r}_{1:N} \\ \vdots \\ \mathbf{r}_{N-1:N} \\ \mathbf{r}_{N:N} \end{bmatrix}.$

(17)
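All suffix quantities $\{\mathbf{R}_{n:N}, \mathbf{r}_{n:N}\}_{n=0}^{N}$ can be accumulated in a single backward pass, and (16)-(17) can then be checked against $\mathbf{X}^T\mathbf{X}$ and $\mathbf{X}^T\mathbf{y}$. The cumulative design **X** used below, whose *m*th block row repeats $\mathbf{h}_m^T$ in blocks 0 through *m*, is our reconstruction inferred from (16)-(17) rather than stated explicitly here:

```python
import numpy as np

rng = np.random.default_rng(2)
L, N = 2, 4
H = rng.standard_normal((N + 1, L))     # rows h_0^T, ..., h_N^T
yv = rng.standard_normal(N + 1)         # y_0, ..., y_N

# One backward pass accumulates all suffix sums
# R_{n:N} = sum_{m=n}^N h_m h_m^T and r_{n:N} = sum_{m=n}^N h_m y_m.
R_suf = np.zeros((N + 1, L, L))
r_suf = np.zeros((N + 1, L))
R_acc, r_acc = np.zeros((L, L)), np.zeros(L)
for n in range(N, -1, -1):
    R_acc = R_acc + np.outer(H[n], H[n])
    r_acc = r_acc + H[n] * yv[n]
    R_suf[n], r_suf[n] = R_acc, r_acc

# Assemble R and r per (16)-(17): block (n, k) of R equals R_{max(n,k):N}.
R = np.block([[R_suf[max(n, k)] for k in range(N + 1)] for n in range(N + 1)])
r = r_suf.reshape(-1)

# Cross-check against X^T X and X^T y for the cumulative design whose m-th
# block row repeats h_m^T in blocks 0..m (hypothetical reconstruction of X).
X = np.zeros((N + 1, (N + 1) * L))
for m in range(N + 1):
    for n in range(m + 1):
        X[m, n*L:(n+1)*L] = H[m]
```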

While for *n* = 0 (14) reduces to a least-squares (LS) problem, for *n* > 0, omitting again irrelevant terms, it can be rewritten as

$\mathbf{d}_n^{(i)} = \arg\min_{\mathbf{d}_n \in \mathbb{R}^L} \left[\frac{1}{2}\mathbf{d}_n^T \mathbf{R}_{n:N}\mathbf{d}_n + \mathbf{d}_n^T\mathbf{g}_n^{(i)} + \lambda\|\mathbf{d}_n\|_2\right]$

(18)

with

$\mathbf{g}_n^{(i)} := \mathbf{R}_{n:N}\left(\sum_{n'=0}^{n-1}\mathbf{d}_{n'}^{(i)}\right) + \sum_{n'=n+1}^{N}\mathbf{R}_{n':N}\mathbf{d}_{n'}^{(i-1)} - \mathbf{r}_{n:N}.$

(19)

The problem in (18) is a convex second-order cone program (SOCP). Typically, *L* ≪ *N*, and (18) can be solved with fast optimization solvers based on interior-point methods [24], at worst-case complexity $\mathcal{O}(L^{3.5})$. Recently, it has been shown that the solution of (18) can be obtained as a function of the solution of the following *scalar* problem

$\gamma_n^{(i)} := \arg\min_{\gamma \ge 0}\left[\gamma\left(1 - \frac{1}{2}\mathbf{g}_n^{(i)T}\left(\gamma\mathbf{R}_{n:N} + \frac{\lambda^2}{2}\mathbf{I}_L\right)^{-1}\mathbf{g}_n^{(i)}\right)\right]$

(20)

whose solution is given by [25]

$\gamma_n^{(i)} = \begin{cases} 0, & \text{if } \|\mathbf{g}_n^{(i)}\|_2 \le \lambda \\ \gamma > 0: \ \left\|\frac{\lambda}{2}\left(\gamma\mathbf{R}_{n:N} + \frac{\lambda^2}{2}\mathbf{I}_L\right)^{-1}\mathbf{g}_n^{(i)}\right\|_2^2 = 1, & \text{otherwise.} \end{cases}$

(21)

Finally, $\mathbf{d}_n^{(i)}$ in (18) can be obtained from $\gamma_n^{(i)}$ in (21) as

$\mathbf{d}_n^{(i)} = \begin{cases} \mathbf{0}_L, & \text{if } \|\mathbf{g}_n^{(i)}\|_2 \le \lambda \\ -\gamma_n^{(i)}\left(\gamma_n^{(i)}\mathbf{R}_{n:N} + \frac{\lambda^2}{2}\mathbf{I}_L\right)^{-1}\mathbf{g}_n^{(i)}, & \text{otherwise.} \end{cases}$

(22)
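A sketch of the resulting block solver: the left-hand side of the norm condition in (21) is strictly decreasing in γ, exceeds 1 at γ = 0 whenever $\|\mathbf{g}_n^{(i)}\|_2 > \lambda$, and tends to 0, so the root can be bracketed and bisected, after which (22) gives the block in closed form. The function name is our own, and $\mathbf{R}_{n:N}$ is assumed positive definite.

```python
import numpy as np

def solve_block(R, g, lam, iters=100):
    """Solve min_d 0.5 d^T R d + d^T g + lam * ||d||_2 via (21)-(22).

    R plays the role of R_{n:N} (assumed positive definite), g of g_n^{(i)}.
    """
    L = R.shape[0]
    if np.linalg.norm(g) <= lam:
        return np.zeros(L)               # zero branch of (22)

    # phi(gamma) = ||(lam/2)(gamma R + (lam^2/2) I)^{-1} g||^2 - 1 is strictly
    # decreasing, positive at 0 when ||g|| > lam, and tends to -1 as gamma
    # grows, so the root in (21) is unique and bisection applies.
    def phi(gamma):
        v = np.linalg.solve(gamma * R + 0.5 * lam**2 * np.eye(L), g)
        return (0.5 * lam) ** 2 * np.dot(v, v) - 1.0

    lo, hi = 0.0, 1.0
    while phi(hi) > 0:                   # grow the bracket until phi changes sign
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) > 0 else (lo, mid)
    gamma = 0.5 * (lo + hi)
    return -gamma * np.linalg.solve(gamma * R + 0.5 * lam**2 * np.eye(L), g)
```

A quick correctness check: a nonzero minimizer of (18) must satisfy the stationarity condition $\mathbf{R}_{n:N}\mathbf{d}_n + \mathbf{g}_n^{(i)} + \lambda \mathbf{d}_n / \|\mathbf{d}_n\|_2 = \mathbf{0}$.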

Notice that if $\|\mathbf{g}_n^{(i)}\|_2 \le \lambda$, the solution of (18) is $\mathbf{d}_n^{(i)} = \mathbf{0}_L$. Since the solution of (12) is expected to be sparse, solving (18) is trivial most of the time. If $\|\mathbf{g}_n^{(i)}\|_2 > \lambda$, $\mathbf{d}_n^{(i)}$ can be obtained via interior-point methods or by (numerically) solving the scalar equation in (21), which admits fast solvers via, e.g., Newton-Raphson iterations, as in [25].

Although block-coordinate descent is typically adopted for large-size sparse linear regression, what makes it particularly appealing for catching change-points is that the vector $\mathbf{g}_n^{(i)}$ can be updated recursively in *n*, owing to the special structure of **R** in (16). Upon defining

$\mathbf{c}_n^{(i)} := \sum_{n'=0}^{n-1}\mathbf{d}_{n'}^{(i)}$

(23)

$\mathbf{s}_n^{(i)} := \sum_{n'=n+1}^{N}\mathbf{R}_{n':N}\mathbf{d}_{n'}^{(i-1)}$

(24)

it follows from (19) that

$\mathbf{g}_n^{(i)} = \mathbf{R}_{n:N}\mathbf{c}_n^{(i)} + \mathbf{s}_n^{(i)} - \mathbf{r}_{n:N}$

(25)

which shows that evaluating $\mathbf{g}_n^{(i)}$ requires the vectors $\mathbf{c}_n^{(i)}$ and $\mathbf{s}_n^{(i)}$. Given $\left\{\mathbf{d}_n^{(i-1)}\right\}_{n=0}^{N}$ from the (*i* - 1)st iteration, and initializing $\mathbf{c}_n^{(i)}$ and $\mathbf{s}_n^{(i)}$ at *n* = 0 as $\mathbf{c}_0^{(i)} = \mathbf{0}_L$ and $\mathbf{s}_0^{(i)} = \sum_{n=1}^{N}\mathbf{R}_{n:N}\mathbf{d}_n^{(i-1)}$, it is possible to evaluate $\mathbf{c}_n^{(i)}$ and $\mathbf{s}_n^{(i)}$ recursively from $\mathbf{c}_{n-1}^{(i)}$, $\mathbf{s}_{n-1}^{(i)}$, and $\mathbf{d}_{n-1}^{(i)}$ at step *n* - 1, for *n* > 0, as

$\mathbf{c}_n^{(i)} = \mathbf{c}_{n-1}^{(i)} + \mathbf{d}_{n-1}^{(i)}$

(26)

$\mathbf{s}_n^{(i)} = \mathbf{s}_{n-1}^{(i)} - \mathbf{R}_{n:N}\mathbf{d}_n^{(i-1)}.$

(27)
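The recursions (26)-(27) can be sanity-checked against the direct sums in (23)-(24) on random data; a small sketch (matrices and blocks arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
L, N = 2, 5
# Stand-ins for R_{n:N}; any L x L matrices suffice for checking the algebra.
R_suf = [rng.standard_normal((L, L)) for _ in range(N + 1)]
d_prev = [rng.standard_normal(L) for _ in range(N + 1)]   # blocks of d^{(i-1)}
d_new = [rng.standard_normal(L) for _ in range(N + 1)]    # blocks of d^{(i)}

# Seed at n = 0 as in the text, then apply (26)-(27) for n = 1, ..., N.
c = np.zeros(L)
s = sum((R_suf[k] @ d_prev[k] for k in range(1, N + 1)), np.zeros(L))
ok = True
for n in range(1, N + 1):
    c = c + d_new[n - 1]                  # eq. (26)
    s = s - R_suf[n] @ d_prev[n]          # eq. (27)
    c_direct = sum((d_new[k] for k in range(n)), np.zeros(L))          # (23)
    s_direct = sum((R_suf[k] @ d_prev[k] for k in range(n + 1, N + 1)),
                   np.zeros(L))                                        # (24)
    ok = ok and np.allclose(c, c_direct) and np.allclose(s, s_direct)
```

Each recursive step costs one matrix-vector product instead of the $\mathcal{O}(N)$ terms of the direct sums, which is what makes the overall iteration linear in *N*.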

The block-coordinate descent algorithm is summarized in Algorithm 1. Interestingly, matrix $\mathbf{X} \in \mathbb{R}^{(N+1)\times(N+1)L}$ in (12) does not have to be stored, since $\{\mathbf{R}_{n:N}\}_{n=0}^{N}$ and $\{\mathbf{r}_{n:N}\}_{n=0}^{N}$ suffice to implement Algorithm 1. Thus, the memory storage and the complexity of one block-coordinate descent iteration grow linearly with *N*. This attribute renders block-coordinate descent appealing, especially for large-size problems where DP approaches tend to be too expensive.

Regarding convergence, the ensuing assertion is a direct consequence of the results in [26].

**Proposition 1**. *The iterates* $\mathbf{d}^{(i)} := \left[\mathbf{d}_0^{(i)T}, \mathbf{d}_1^{(i)T}, \dots, \mathbf{d}_N^{(i)T}\right]^T$ *obtained by* Algorithm 1 *converge to the global minimum of* (12)*; that is*, $\lim_{i\to\infty}\mathbf{d}^{(i)} = \widehat{\mathbf{d}}$.

Block-coordinate descent will also be the basic building block for solving the non-convex problem introduced in Section 6 to improve the retrieval of change-points. But first, it is useful to consider two issues of the group Lasso change detector for TV-AR models.

Given $\{\mathbf{R}_{n:N}, \mathbf{r}_{n:N}\}_{n=0}^{N}$

Initialize with $\mathbf{d}_n^{(0)} = \mathbf{0}_L$ for *n* = 1, ..., *N*

**for** *i* > 0 **do**

**for** *n* = 0, 1, ..., *N* **do**

**if** *n* = 0 **then**

$\mathbf{c}_0^{(i)} = \mathbf{0}_L$

$\mathbf{s}_0^{(i)} = \sum_{n'=1}^{N}\mathbf{R}_{n':N}\mathbf{d}_{n'}^{(i-1)}$

$\mathbf{g}_0^{(i)} = \mathbf{s}_0^{(i)} - \mathbf{r}_{0:N}$

$\mathbf{d}_0^{(i)} = -\mathbf{R}_{0:N}^{-1}\mathbf{g}_0^{(i)}$

**else**

$\mathbf{c}_n^{(i)} = \mathbf{c}_{n-1}^{(i)} + \mathbf{d}_{n-1}^{(i)}$

$\mathbf{s}_n^{(i)} = \mathbf{s}_{n-1}^{(i)} - \mathbf{R}_{n:N}\mathbf{d}_n^{(i-1)}$

$\mathbf{g}_n^{(i)} = \mathbf{R}_{n:N}\mathbf{c}_n^{(i)} + \mathbf{s}_n^{(i)} - \mathbf{r}_{n:N}$

**if** $\|\mathbf{g}_n^{(i)}\|_2 \le \lambda$ **then**

$\mathbf{d}_n^{(i)} = \mathbf{0}_L$

**else**

$\mathbf{d}_n^{(i)} = \arg\min_{\mathbf{d}_n \in \mathbb{R}^L}\left[\frac{1}{2}\mathbf{d}_n^T\mathbf{R}_{n:N}\mathbf{d}_n + \mathbf{d}_n^T\mathbf{g}_n^{(i)} + \lambda\|\mathbf{d}_n\|_2\right]$

Algorithm 1: **Block-coordinate descent algorithm**
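Putting the pieces together, Algorithm 1 can be sketched compactly in NumPy. The function name is our own; the inner γ search solves (21) by bisection, the suffix quantities $\{\mathbf{R}_{n:N}, \mathbf{r}_{n:N}\}$ are the only stored statistics (consistent with the storage remark above), and $\mathbf{R}_{0:N}$ is assumed invertible.

```python
import numpy as np

def bcd_group_lasso(H, y, lam, n_iters=300):
    """Block-coordinate descent (Algorithm 1 sketch) for the cost in (13),
    storing only the suffix statistics {R_{n:N}, r_{n:N}} (O(N L^2) memory).
    H has rows h_0^T, ..., h_N^T; y collects y_0, ..., y_N."""
    Np1, L = H.shape
    N = Np1 - 1
    # Backward pass: R_{n:N} and r_{n:N} for all n.
    R_suf = np.zeros((Np1, L, L)); r_suf = np.zeros((Np1, L))
    Racc, racc = np.zeros((L, L)), np.zeros(L)
    for n in range(N, -1, -1):
        Racc = Racc + np.outer(H[n], H[n]); racc = racc + H[n] * y[n]
        R_suf[n], r_suf[n] = Racc, racc
    d = np.zeros((Np1, L))                       # d_n^{(0)} = 0_L
    I = np.eye(L)
    for _ in range(n_iters):
        d_prev = d.copy()                        # blocks from iteration i-1
        c = np.zeros(L)
        s = sum(R_suf[k] @ d_prev[k] for k in range(1, N + 1))
        for n in range(Np1):
            if n > 0:
                c = c + d[n - 1]                 # eq. (26)
                s = s - R_suf[n] @ d_prev[n]     # eq. (27)
            g = R_suf[n] @ c + s - r_suf[n]      # eq. (25)
            if n == 0:
                d[0] = -np.linalg.solve(R_suf[0], g)   # unpenalized LS block
            elif np.linalg.norm(g) <= lam:
                d[n] = 0.0                       # zero branch of (22)
            else:                                # gamma search per (21)
                def phi(t):
                    v = np.linalg.solve(t * R_suf[n] + 0.5 * lam**2 * I, g)
                    return (0.5 * lam) ** 2 * np.dot(v, v) - 1.0
                lo, hi = 0.0, 1.0
                while phi(hi) > 0:
                    hi *= 2.0
                for _ in range(100):
                    mid = 0.5 * (lo + hi)
                    lo, hi = (mid, hi) if phi(mid) > 0 else (lo, mid)
                t = 0.5 * (lo + hi)
                d[n] = -t * np.linalg.solve(t * R_suf[n] + 0.5 * lam**2 * I, g)
    return d
```

Since each block update is an exact minimization, the objective is non-increasing across iterations, which gives a simple way to monitor convergence in practice.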