| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| --- |
| layout: doc-page |
| title: Distributed Stochastic Singular Value Decomposition |
| |
| |
| --- |
| |
| ## Intro |
| |
| Mahout has a distributed implementation of Stochastic Singular Value Decomposition [1] using the parallelization strategy comprehensively defined in Nathan Halko's dissertation ["Randomized methods for computing low-rank approximations of matrices"](http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf) [2]. |
| |
| ## Modified SSVD Algorithm |
| |
| Given an <foo>\(m\times n\)</foo> |
| matrix <foo>\(\mathbf{A}\)</foo>, a target rank <foo>\(k\in\mathbb{N}_{1}\)</foo> |
| , an oversampling parameter <foo>\(p\in\mathbb{N}_{1}\)</foo>, |
| and the number of additional power iterations <foo>\(q\in\mathbb{N}_{0}\)</foo>, |
| this procedure computes an <foo>\(m\times\left(k+p\right)\)</foo> |
| SVD <foo>\(\mathbf{A\approx U}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)</foo>: |
| |
| 1. Create seed for random <foo>\(n\times\left(k+p\right)\)</foo> |
| matrix <foo>\(\boldsymbol{\Omega}\)</foo>. The seed defines matrix <foo>\(\mathbf{\Omega}\)</foo> |
| using Gaussian unit vectors per one of suggestions in [Halko, Martinsson, Tropp]. |
| |
| 2. <foo>\(\mathbf{Y=A\boldsymbol{\Omega}},\,\mathbf{Y}\in\mathbb{R}^{m\times\left(k+p\right)}\)</foo> |
| |
| 3. Column-orthonormalize <foo>\(\mathbf{Y}\rightarrow\mathbf{Q}\)</foo> |
| by computing thin decomposition <foo>\(\mathbf{Y}=\mathbf{Q}\mathbf{R}\)</foo>. |
| Also, <foo>\(\mathbf{Q}\in\mathbb{R}^{m\times\left(k+p\right)},\,\mathbf{R}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)</foo>; denoted as <foo>\(\mathbf{Q}=\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}\)</foo> |
| |
| 4. <foo>\(\mathbf{B}_{0}=\mathbf{Q}^{\top}\mathbf{A}:\,\,\mathbf{B}\in\mathbb{R}^{\left(k+p\right)\times n}\)</foo>. |
| |
| 5. If <foo>\(q>0\)</foo> |
| repeat: for <foo>\(i=1..q\)</foo>: |
| <foo>\(\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}_{i-1}^{\top}\right).\mathbf{Q}\)</foo> |
| (power iterations step). |
| |
| 6. Compute Eigensolution of a small Hermitian <foo>\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}=\mathbf{\hat{U}}\boldsymbol{\Lambda}\mathbf{\hat{U}}^{\top}\)</foo>, |
| <foo>\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)</foo>. |
| |
| 7. Singular values <foo>\(\mathbf{\boldsymbol{\Sigma}}=\boldsymbol{\Lambda}^{0.5}\)</foo>, |
| or, in other words, <foo>\(s_{i}=\sqrt{\sigma_{i}}\)</foo>. |
| |
| 8. If needed, compute <foo>\(\mathbf{U}=\mathbf{Q}\hat{\mathbf{U}}\)</foo>. |
| |
| 9. If needed, compute <foo>\(\mathbf{V}=\mathbf{B}_{q}^{\top}\hat{\mathbf{U}}\boldsymbol{\Sigma}^{-1}\)</foo>. |
| Another way is <foo>\(\mathbf{V}=\mathbf{A}^{\top}\mathbf{U}\boldsymbol{\Sigma}^{-1}\)</foo>. |
| |
| |
| |
| |
| ## Implementation |
| |
| Mahout `dssvd(...)` is implemented in the mahout `math-scala` algebraic optimizer which translates Mahout's R-like linear algebra operators into a physical plan for both Spark and H2O distributed engines. |
| |
| def dssvd[K: ClassTag](drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0): |
| (DrmLike[K], DrmLike[Int], Vector) = { |
| |
| val drmAcp = drmA.checkpoint() |
| |
| val m = drmAcp.nrow |
| val n = drmAcp.ncol |
| assert(k <= (m min n), "k cannot be greater than smaller of m, n.") |
| val pfxed = safeToNonNegInt((m min n) - k min p) |
| |
| // Actual decomposition rank |
| val r = k + pfxed |
| |
| // We represent Omega by its seed. |
| val omegaSeed = RandomUtils.getRandom().nextInt() |
| |
| // Compute Y = A*Omega. |
| var drmY = drmAcp.mapBlock(ncol = r) { |
| case (keys, blockA) => |
| val blockY = blockA %*% Matrices.symmetricUniformView(n, r, omegaSeed) |
| keys -> blockY |
| } |
| |
| var drmQ = dqrThin(drmY.checkpoint())._1 |
| |
| // Checkpoint Q if last iteration |
| if (q == 0) drmQ = drmQ.checkpoint() |
| |
| var drmBt = drmAcp.t %*% drmQ |
| |
| // Checkpoint B' if last iteration |
| if (q == 0) drmBt = drmBt.checkpoint() |
| |
| for (i <- 0 until q) { |
| drmY = drmAcp %*% drmBt |
| drmQ = dqrThin(drmY.checkpoint())._1 |
| |
| // Checkpoint Q if last iteration |
| if (i == q - 1) drmQ = drmQ.checkpoint() |
| |
| drmBt = drmAcp.t %*% drmQ |
| |
| // Checkpoint B' if last iteration |
| if (i == q - 1) drmBt = drmBt.checkpoint() |
| } |
| |
| val (inCoreUHat, d) = eigen(drmBt.t %*% drmBt) |
| val s = d.sqrt |
| |
| // Since neither drmU nor drmV are actually computed until actually used |
| // we don't need the flags instructing compute (or not compute) either of the U,V outputs |
| val drmU = drmQ %*% inCoreUHat |
| val drmV = drmBt %*% (inCoreUHat %*%: diagv(1 /: s)) |
| |
| (drmU(::, 0 until k), drmV(::, 0 until k), s(0 until k)) |
| } |
| |
| Note: As a side effect of checkpointing, U and V values are returned as logical operators (i.e. they are neither checkpointed nor computed). Therefore there is no physical work actually done to compute <foo>\(\mathbf{U}\)</foo> or <foo>\(\mathbf{V}\)</foo> until they are used in a subsequent expression. |
| |
| |
| ## Usage |
| |
| The scala `dssvd(...)` method can easily be called in any Spark or H2O application built with the `math-scala` library and the corresponding `Spark` or `H2O` engine module as follows: |
| |
| import org.apache.mahout.math._ |
| import decompositions._ |
| import drm._ |
| |
| |
| val(drmU, drmV, s) = dssvd(drma, k = 40, q = 1) |
| |
| |
| ## References |
| |
| [1]: [Mahout Scala and Mahout Spark Bindings for Linear Algebra Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf) |
| |
| [2]: [Randomized methods for computing low-rank |
| approximations of matrices](http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf) |
| |
| [2]: [Halko, Martinsson, Tropp](http://arxiv.org/abs/0909.4061) |
| |
| [3]: [Mahout Spark and Scala Bindings](http://mahout.apache.org/users/sparkbindings/home.html) |
| |
| |
| |