| % When using TeXShop on the Mac, let it know the root document. The following must be one of the first 20 lines. |
| % !TEX root = ../design.tex |
| |
| \chapter[Singular Value Decomposition]{Singular Value Decomposition} |
| |
| \begin{moduleinfo} |
| \item[Author] \href{mailto:riyer@gopivotal.com}{Rahul Iyer} |
| \item[History] |
| \begin{modulehistory} |
| \item[v0.1] Initial version |
| \end{modulehistory} |
| \end{moduleinfo} |
| |
| \newcommand{\vectornorm}[1]{\left|\left|#1\right|\right|} |
| \def\Rset{\mathbb{R}} |
| % Abstract. What is the problem we want to solve? |
| In linear algebra, the singular value decomposition (SVD) is a factorization of |
| a real or complex matrix, with many useful applications in signal processing and |
| statistics. |
| |
| Let $A$ be an $m \times n$ matrix, where $m \ge n$. Then $A$ can be decomposed as follows: |
| \begin{gather*} |
| A = U \Sigma V^T, |
| \end{gather*} |
where $U$ is an $m \times n$ matrix with orthonormal columns, $\Sigma$ is an
$n\times n$ diagonal matrix, and $V$ is an $n\times n$ orthogonal matrix. The
diagonal elements of $\Sigma$ are called the \emph{singular values}; by
convention they are nonnegative and arranged in non-increasing order.
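
As a quick numerical illustration (not part of the derivation), the
factorization can be checked with NumPy; \texttt{full\_matrices=False} requests
exactly the $m \times n$ form of $U$ used above.
\begin{verbatim}
import numpy as np

A = np.random.randn(6, 4)                          # m = 6, n = 4, m >= n
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: U is 6 x 4
assert np.allclose(A, U @ np.diag(s) @ Vt)         # A = U Sigma V^T
assert np.allclose(U.T @ U, np.eye(4))             # columns of U orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(4))           # V orthogonal
\end{verbatim}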
| |
| It is possible to formulate the problem of computing the singular triplets |
| ($\sigma_i, u_i, v_i$) of $A$ as an eigenvalue problem involving a Hermitian |
| matrix related to $A$. There are two possible ways of achieving this: |
| |
| \begin{figure}[tb] |
| \begin{center} |
| \includegraphics[width=\textwidth]{figures/svd_figure} |
| \end{center} |
| \caption{Singular Value Decomposition} |
| \label{fig:svd} |
| \end{figure} |
| |
| \begin{enumerate} |
\item With the cross product matrices $A^TA$ and $AA^T$
| \item With the cyclic matrix $H(A) = |
| \begin{bmatrix} |
| 0 & A\\ |
| A^* & 0 |
| \end{bmatrix}$ |
| \end{enumerate} |
| |
The singular values are the nonnegative square roots of the eigenvalues of the
cross product matrix. Because forming the cross product squares the condition
number of the problem, this approach may imply a severe loss of accuracy in the
smallest singular values. The cyclic matrix approach is an alternative that
avoids this problem, but at the expense of significantly increasing the cost of
the computation.
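
Both reductions are easy to check numerically. The following illustrative NumPy
sketch confirms that the eigenvalues of the cross product matrix are the
squared singular values, and that the spectrum of $H(A)$ consists of
$\pm\sigma_i$ together with $|m-n|$ zeros.
\begin{verbatim}
import numpy as np

m, n = 6, 4
A = np.random.randn(m, n)
s = np.linalg.svd(A, compute_uv=False)    # descending singular values

# cross product matrix: eigenvalues are the squared singular values
ev = np.linalg.eigvalsh(A.T @ A)          # ascending eigenvalues
assert np.allclose(np.sqrt(ev[::-1]), s)

# cyclic matrix H(A): eigenvalues are +/- sigma_i plus |m - n| zeros
H = np.block([[np.zeros((m, m)), A], [A.T, np.zeros((n, n))]])
ev_h = np.linalg.eigvalsh(H)
assert np.allclose(ev_h[-n:][::-1], s)    # the n largest are the sigma_i
\end{verbatim}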
| |
| Computing the cross product matrix explicitly is not recommended, especially in |
| the case of sparse A. Bidiagonalization was proposed by Golub and |
| Kahan~\cite{golub1965} as a way of tridiagonalizing the cross product matrix |
| without forming it explicitly. |
| |
| Consider the following decomposition |
| \begin{gather*} |
| A = P B Q^T, |
\end{gather*}
where $P$ and $Q$ are unitary matrices and $B$ is an $m\times n$ upper
bidiagonal matrix. Then the tridiagonal matrix $BB^T$ is unitarily similar to
$AA^T$.
Additionally, specific methods exist that compute the singular values of $B$
without forming $BB^T$. Therefore, after computing the SVD of $B$,
| \begin{gather*} |
| B = X \Sigma Y^T, |
| \end{gather*} |
the SVD of the original matrix is recovered by setting $U = PX$ and $V = QY$.
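
To make the pipeline concrete, here is a dense Householder-based sketch of the
reduction (a simplified stand-in for the Golub-Kahan procedure; the helper
names are ours and no attempt is made to exploit sparsity), followed by the
recovery $U = PX$ and $V = QY$.
\begin{verbatim}
import numpy as np

def house(x):
    """Unit Householder vector v with (I - 2 v v^T) x = -sign(x_0) ||x|| e_1."""
    v = x.astype(float).copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0])
    nv = np.linalg.norm(v)
    return v / nv if nv > 0 else v

def bidiagonalize(A):
    """Dense reduction A = P B Q^T with B upper bidiagonal (assumes m >= n)."""
    m, n = A.shape
    B = A.astype(float).copy()
    P, Q = np.eye(m), np.eye(n)
    for j in range(n):
        v = house(B[j:, j])              # zero column j below the diagonal
        B[j:, :] -= 2.0 * np.outer(v, v @ B[j:, :])
        P[:, j:] -= 2.0 * np.outer(P[:, j:] @ v, v)
        if j < n - 2:
            v = house(B[j, j + 1:])      # zero row j right of the superdiagonal
            B[:, j + 1:] -= 2.0 * np.outer(B[:, j + 1:] @ v, v)
            Q[:, j + 1:] -= 2.0 * np.outer(Q[:, j + 1:] @ v, v)
    return P, B, Q

A = np.random.randn(6, 4)
P, B, Q = bidiagonalize(A)               # A = P @ B @ Q.T
X, s, Yt = np.linalg.svd(B)              # SVD of the bidiagonal factor
U, V = P @ X, Q @ Yt.T                   # lift back: U = PX, V = QY
assert np.allclose(np.linalg.svd(A, compute_uv=False), s)
assert np.allclose(A, U[:, :4] @ np.diag(s) @ V.T)
\end{verbatim}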
| |
| \section{Lanczos Bidiagonalization} % (fold) |
| \label{sec:lanczos_bidiagonalization} |
| |
| The Lanczos algorithm is an iterative algorithm devised by Cornelius Lanczos |
| that is an adaptation of power methods to find eigenvalues and eigenvectors of a |
| square matrix or the singular value decomposition of a rectangular matrix. It is |
| particularly useful for finding decompositions of very large sparse matrices. |
| |
| For a rectangular matrix $A$, the Lanczos bidiagonalization method computes a |
| sequence of Lanczos vectors $p_j \in \mathbb{R}^m$ and $q_j \in \mathbb{R}^n$ |
| and scalars $\alpha_j$ and $\beta_j$ for $j = 1, 2, \ldots, k$ as follows: |
| |
| \begin{algorithm}[Lanczos Bidiagonalization Algorithm] |
| % \label{alg:lanczos-bidiagonalization} |
| \begin{algorithmic}[1] |
| \State Choose a unit-norm vector $q_1$ and let $\beta_0 = 0$ and $p_0 = 0$ |
| \For{$j = 1, 2, \ldots, k$} |
| \State $p_j \set Aq_j - \beta_{j-1} p_{j-1}$ |
| \State $\alpha_j \set \vectornorm{p_j}_2$ |
| \State $p_j \set p_j/\alpha_j$ |
| \State $q_{j+1} \set A^Tp_j - \alpha_j q_j$ |
| \State $\beta_j \set \vectornorm{q_{j+1}}_2$ |
| \State $q_{j+1} \set q_{j+1}/\beta_j$ |
| \EndFor |
| \end{algorithmic} |
| \end{algorithm} |
| |
| |
After $k$ steps, we can generate the bidiagonal matrix $B_k$ as follows:
\[
B_k =
\begin{bmatrix}
\alpha_1 & \beta_1 & & & & \\
& \alpha_2 & \beta_2 & & & \\
& & \alpha_3 & \beta_3 & & \\
& & & \ddots & \ddots & \\
& & & & \alpha_{k-1} & \beta_{k-1} \\
& & & & & \alpha_k
\end{bmatrix}
\]
In exact arithmetic the Lanczos vectors are orthonormal:
\begin{gather*}
P_{k} = (p_1, p_2, \dotsc, p_{k}) \in \mathbb{R}^{m \times k},
\qquad P_{k}^T P_{k} = I_{k}, \\
Q_{k+1} = (q_1, q_2, \dotsc, q_{k+1}) \in \mathbb{R}^{n \times (k+1)},
\qquad Q_{k+1}^T Q_{k+1} = I_{k+1}.
\end{gather*}
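
A direct transcription of the recurrence (an illustrative NumPy sketch with no
safeguard against breakdown, i.e.\ $\alpha_j = 0$ or $\beta_j = 0$; the
function name is ours) shows how $P_k$, $Q_{k+1}$, and $B_k$ fit together.
\begin{verbatim}
import numpy as np

def lanczos_bidiag(A, k, q1):
    """k steps of Lanczos bidiagonalization, no reorthogonalization."""
    m, n = A.shape
    P = np.zeros((m, k))
    Q = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    p_prev, beta_prev = np.zeros(m), 0.0
    for j in range(k):
        p = A @ Q[:, j] - beta_prev * p_prev
        alpha[j] = np.linalg.norm(p)
        p /= alpha[j]
        q = A.T @ p - alpha[j] * Q[:, j]
        beta[j] = np.linalg.norm(q)
        Q[:, j + 1] = q / beta[j]
        P[:, j] = p
        p_prev, beta_prev = p, beta[j]
    # upper bidiagonal B_k: alpha_j on the diagonal, beta_j above it
    B = np.diag(alpha) + np.diag(beta[:-1], 1)
    return P, Q, B

A = np.random.randn(50, 20)
P, Q, B = lanczos_bidiag(A, 8, np.random.randn(20))
# in exact arithmetic P^T P = I; in floating point this degrades as k
# grows, which motivates the reorthogonalization discussed below
print(np.linalg.norm(P.T @ P - np.eye(8)))
\end{verbatim}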
| |
| \begin{figure}[tb] |
| \centering |
| \includegraphics[width=\textwidth]{figures/lanczos_bidiag_segment} |
| \caption{Lanczos bidiagonalization of $A$} |
| \label{fig:lanczos} |
| \end{figure} |
After $k$ Lanczos steps, the Ritz values $\tilde{\sigma}_i$ (the approximate
singular values of $A$) are equal to the computed singular values of $B_k$, and
the Ritz vectors are
\begin{gather*}
\tilde{u}_i = P_k x_i, \qquad \tilde{v}_i = Q_k y_i,
\end{gather*}
where $x_i$ and $y_i$ are the left and right singular vectors of $B_k$.
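
Continuing the sketch above, the Ritz approximations are obtained from the SVD
of $B_k$ followed by a change of basis.
\begin{verbatim}
X, s_ritz, Yt = np.linalg.svd(B)      # B_k = X Sigma Y^T
U_ritz = P @ X                        # approximate left singular vectors of A
V_ritz = Q[:, :8] @ Yt.T              # approximate right singular vectors of A
# the largest Ritz values converge first to the largest singular values of A
print(s_ritz[:3], np.linalg.svd(A, compute_uv=False)[:3])
\end{verbatim}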
| % subsubsection lanczos_bidiagonalization (end) |
| |
| \section{Dealing with Loss of Orthogonality} % (fold) |
| \label{sec:dealing_with_loss_of_orthogonality} |
| |
After a sufficient number of steps, the Lanczos vectors start to lose their
mutual orthogonality, and this happens together with the appearance of repeated
and spurious Ritz values in the spectrum of $B_j$. The simplest cure for this
loss of orthogonality is full reorthogonalization: in Lanczos
bidiagonalization, two sets of Lanczos vectors are computed, so full
reorthogonalization amounts to orthogonalizing vector $p_j$ explicitly with
respect to all the previously computed left Lanczos vectors, and orthogonalizing
vector $q_{j+1}$ explicitly with respect to all the previously computed right
Lanczos vectors.
| |
| \begin{algorithm}[Lanczos Bidiagonalization with Partial Reorthogonalization] |
| % \label{alg:one-sided_lanczos} |
| \begin{algorithmic}[1] |
| \State Choose a unit-norm vector $q_1$ and let $\beta_0 = 0$ and $p_0 = 0$ |
| \For{$j = 1, 2, \ldots, k$} |
| \State $p_j \set Aq_j - \beta_{j-1} p_{j-1}$ |
| \State $\alpha_j \set \vectornorm{p_j}_2$ |
| \State $p_j \set p_j/\alpha_j$ |
| \State $q_{j+1} \set A^Tp_j - \alpha_j q_j$ |
| \State Orthogonalize $q_{j+1}$ with respect to $Q_j$ |
| \State $\beta_j \set \vectornorm{q_{j+1}}_2$ |
| \State $q_{j+1} \set q_{j+1}/\beta_j$ |
| \EndFor |
| \end{algorithmic} |
| \end{algorithm} |
| |
There is a variation of this orthogonalization that maintains the effectiveness
of full reorthogonalization but at a considerably reduced cost. This technique
was proposed by Simon and Zha~\cite{simon2000}. The idea comes from the
observation that, in the Lanczos bidiagonalization procedure without
reorthogonalization, the levels of orthogonality of the left and right Lanczos
vectors go hand in hand. This observation led Simon and Zha to propose what
they called the one-sided version shown in
Algorithm~2. %\ref{alg:one-sided_lanczos}
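
In terms of the earlier \texttt{lanczos\_bidiag} sketch, the one-sided variant
amounts to a single extra step before $q_{j+1}$ is normalized; the two-pass
Gram-Schmidt below is one common way to implement the orthogonalization.
\begin{verbatim}
def reorthogonalize(q, Qj):
    """Remove the components of q along the columns of Qj (the right
    Lanczos vectors computed so far); the second pass guards against
    cancellation in floating point."""
    for _ in range(2):
        q = q - Qj @ (Qj.T @ q)
    return q

# in lanczos_bidiag, insert  q = reorthogonalize(q, Q[:, :j + 1])
# right after q is computed and before beta[j] = np.linalg.norm(q)
\end{verbatim}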
| % section dealing_with_loss_of_orthogonality (end) |
| |
\section{Enhancements for Distributed Efficiency} % (fold)
\label{sec:parallelizing}

When $A$ is distributed across parallel segments (by rows, so that each segment
owns a block of rows of $A$ and the corresponding entries of $p_j$), every norm
computation in Algorithm~2 forces a global synchronization. The variant below
reduces the number of synchronization points per iteration: the normalization
of $p_j$ is delayed, with $\alpha_j$ first accumulated as a squared norm, so
that each segment completes the entire transition step in a single pass over
its rows before one merge step combines the results.
| \begin{algorithm}[Distributed version of Lanczos BPRO]% \label{alg:one-sided_lanczos} |
| \begin{algorithmic}[1] |
| \State Choose a unit-norm vector $q_1$ and let $\beta_0 = 0$ and $p_0 = 0$ |
| \For{$j = 1, 2, \ldots, k$} |
| \Statex {\footnotesize Transition step} |
| \State $p_j \set Aq_j - \beta_{j-1} p_{j-1}$ |
| \State $\alpha_j \set \vectornorm{p_j}_2^2$ \Comment Delayed normalization |
| \State $q_{j+1} \set A^Tp_j - \alpha_j q_j$ |
| \Statex {\footnotesize Merge step} |
| \State Concatenate $p_j$ across parallel segments |
| \State Sum $q_{j+1}$ across parallel segments |
| \Statex {\footnotesize Final Step} |
| \State $\alpha_j \set \sqrt{\alpha_j}$ |
| \State $p_j \set p_j/\alpha_j$ |
\State $q_{j+1} \set q_{j+1}/\alpha_j$
| \State Orthogonalize $q_{j+1}$ with respect to $Q_j$ |
| \State $\beta_j \set \vectornorm{q_{j+1}}_2$ |
| \State $q_{j+1} \set q_{j+1}/\beta_j$ |
| \EndFor |
| \end{algorithmic} |
| \end{algorithm} |
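
The sketch below simulates one iteration on row-partitioned data, with plain
NumPy arrays standing in for database segments (all names are ours). It also
shows why the delayed normalization is valid: with the unnormalized $p_j$ and
$\alpha_j^2$ in place of $\alpha_j$, the transition step produces
$\alpha_j(A^Tp_j - \alpha_j q_j)$, so a single division by $\alpha_j$ after the
merge restores the quantities of Algorithm~2.
\begin{verbatim}
import numpy as np

def distributed_lanczos_step(blocks, q, p_parts, beta_prev):
    """One iteration of the distributed sketch; blocks are row blocks of A."""
    # Transition step: each segment touches only its own rows
    p_parts = [Ai @ q - beta_prev * pi for Ai, pi in zip(blocks, p_parts)]
    sq_parts = [float(pi @ pi) for pi in p_parts]      # pieces of ||p_j||^2
    qn_parts = [Ai.T @ pi for Ai, pi in zip(blocks, p_parts)]
    # Merge step: one global reduction / concatenation
    alpha_sq = sum(sq_parts)
    q_next = sum(qn_parts) - alpha_sq * q              # = alpha_j * qhat_{j+1}
    # Final step: take the square root and undo the delayed normalization
    alpha = np.sqrt(alpha_sq)
    p_parts = [pi / alpha for pi in p_parts]
    q_next = q_next / alpha                # now equals A^T p_j - alpha_j q_j
    # the caller then orthogonalizes q_next against Q_j, sets beta_j to
    # its norm, and normalizes, exactly as in Algorithm 2
    return p_parts, q_next, alpha

blocks = np.array_split(np.random.randn(40, 10), 4)   # four "segments"
q1 = np.random.randn(10); q1 /= np.linalg.norm(q1)
p0 = [np.zeros(b.shape[0]) for b in blocks]
p_parts, q_next, alpha = distributed_lanczos_step(blocks, q1, p0, 0.0)
\end{verbatim}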
| |
| % section parallelizing (end) |