layout: doc-page title: Distributed Cholesky QR


Intro

Mahout has a distributed implementation of QR decomposition for tall thin matrices[1].

Algorithm

For the classic QR decomposition of the form $$ \mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times n} $$ a distributed version is fairly easily achieved if $$ \mathbf{A} $$ is tall and thin such that $$ \mathbf{A}^{\top}\mathbf{A} $$ fits in memory, i.e. m is large but n < ~5000 Under such circumstances, only $$ \mathbf{A} $$ and $$ \mathbf{Q} $$ are distributed matrices and $$ \mathbf{A^{\top}A} $$ and $$ \mathbf{R} $$ are in-core products. We just compute the in-core version of the Cholesky decomposition in the form of $$ \mathbf{LL}^{\top}= \mathbf{A}^{\top}\mathbf{A}$$. After that we take $$ \mathbf{R}= \mathbf{L}^{\top} $$ and $$ \mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1} $$. The latter is easily achieved by multiplying each vertical block of $$ \mathbf{A} $$ by $$ \left(\mathbf{L}^{\top}\right)^{-1} $$. (There is no actual matrix inversion happening).

Implementation

Mahout dqrThin(...) is implemented in the mahout math-scala algebraic optimizer which translates Mahout's R-like linear algebra operators into a physical plan for both Spark and H2O distributed engines.

def dqrThin[K: ClassTag](A: DrmLike[K], checkRankDeficiency: Boolean = true): (DrmLike[K], Matrix) = {        
    if (drmA.ncol > 5000)
        log.warn("A is too fat. A'A must fit in memory and easily broadcasted.")
    implicit val ctx = drmA.context
    val AtA = (drmA.t %*% drmA).checkpoint()
    val inCoreAtA = AtA.collect
    val ch = chol(inCoreAtA)
    val inCoreR = (ch.getL cloned) t
    if (checkRankDeficiency && !ch.isPositiveDefinite)
        throw new IllegalArgumentException("R is rank-deficient.")
    val bcastAtA = sc.broadcast(inCoreAtA)
    val Q = A.mapBlock() {
        case (keys, block) => keys -> chol(bcastAtA).solveRight(block)
    }
    Q -> inCoreR
}

Usage

The scala dqrThin(...) method can easily be called in any Spark or H2O application built with the math-scala library and the corresponding Spark or H2O engine module as follows:

import org.apache.mahout.math._
import decompositions._
import drm._

val(drmQ, inCoreR) = dqrThin(drma)

References

[1]: Mahout Scala and Mahout Spark Bindings for Linear Algebra Subroutines

[2]: Mahout Spark and Scala Bindings