| <!DOCTYPE html><html><head><title>R: Alternating Least Squares (ALS) for Collaborative Filtering</title> |
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" /> |
| <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.css"> |
| <script type="text/javascript"> |
| const macros = { "\\R": "\\textsf{R}", "\\code": "\\texttt"}; |
| function processMathHTML() { |
| var l = document.getElementsByClassName('reqn'); |
| for (let e of l) { katex.render(e.textContent, e, { throwOnError: false, macros }); } |
| return; |
| }</script> |
| <script defer src="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.js" |
| onload="processMathHTML();"></script> |
| <link rel="stylesheet" type="text/css" href="R.css" /> |
| |
| <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> |
| <script>hljs.initHighlightingOnLoad();</script> |
| </head><body><div class="container"> |
| |
| <table style="width: 100%;"><tr><td>spark.als {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> |
| |
| <h2>Alternating Least Squares (ALS) for Collaborative Filtering</h2> |
| |
| <h3>Description</h3> |
| |
| <p><code>spark.als</code> learns latent factors in collaborative filtering via alternating least |
| squares. Users can call <code>summary</code> to obtain fitted latent factors, <code>predict</code> |
| to make predictions on new data, and <code>write.ml</code>/<code>read.ml</code> to save/load fitted models. |
| </p> |
| |
| |
| <h3>Usage</h3> |
| |
| <pre><code class='language-R'>spark.als(data, ...) |
| |
| ## S4 method for signature 'SparkDataFrame' |
| spark.als( |
| data, |
| ratingCol = "rating", |
| userCol = "user", |
| itemCol = "item", |
| rank = 10, |
| regParam = 0.1, |
| maxIter = 10, |
| nonnegative = FALSE, |
| implicitPrefs = FALSE, |
| alpha = 1, |
| numUserBlocks = 10, |
| numItemBlocks = 10, |
| checkpointInterval = 10, |
| seed = 0 |
| ) |
| |
| ## S4 method for signature 'ALSModel' |
| summary(object) |
| |
| ## S4 method for signature 'ALSModel' |
| predict(object, newData) |
| |
| ## S4 method for signature 'ALSModel,character' |
| write.ml(object, path, overwrite = FALSE) |
| </code></pre> |
| |
| |
| <h3>Arguments</h3> |
| |
| <table> |
| <tr style="vertical-align: top;"><td><code>data</code></td> |
| <td> |
| <p>a SparkDataFrame for training.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>...</code></td> |
| <td> |
| <p>additional argument(s) passed to the method.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>ratingCol</code></td> |
| <td> |
| <p>column name for ratings.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>userCol</code></td> |
| <td> |
| <p>column name for user ids. Ids must be (or can be coerced into) integers.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>itemCol</code></td> |
| <td> |
| <p>column name for item ids. Ids must be (or can be coerced into) integers.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>rank</code></td> |
| <td> |
| <p>rank of the matrix factorization (> 0).</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>regParam</code></td> |
| <td> |
| <p>regularization parameter (>= 0).</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>maxIter</code></td> |
| <td> |
| <p>maximum number of iterations (>= 0).</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>nonnegative</code></td> |
| <td> |
| <p>logical value indicating whether to apply nonnegativity constraints.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>implicitPrefs</code></td> |
| <td> |
| <p>logical value indicating whether to use implicit preference.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>alpha</code></td> |
| <td> |
| <p>alpha parameter in the implicit preference formulation (>= 0).</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>numUserBlocks</code></td> |
| <td> |
| <p>number of user blocks used to parallelize computation (> 0).</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>numItemBlocks</code></td> |
| <td> |
| <p>number of item blocks used to parallelize computation (> 0).</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>checkpointInterval</code></td> |
| <td> |
| <p>checkpoint interval in iterations (>= 1), or -1 to disable checkpointing. |
| Note: this setting is ignored if the checkpoint directory is not |
| set.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>seed</code></td> |
| <td> |
| <p>integer seed for random number generation.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>object</code></td> |
| <td> |
| <p>a fitted ALS model.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>newData</code></td> |
| <td> |
| <p>a SparkDataFrame for testing.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>path</code></td> |
| <td> |
| <p>the directory where the model is saved.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>overwrite</code></td> |
| <td> |
| <p>logical value indicating whether to overwrite the output path if it |
| already exists. The default is FALSE, which throws an exception |
| if the output path exists.</p> |
| </td></tr> |
| </table> |
| |
| |
| <h3>Details</h3> |
| |
| <p>For more details, see |
| <a href="https://spark.apache.org/docs/latest/ml-collaborative-filtering.html">MLlib: |
| Collaborative Filtering</a>. |
| </p> |
| |
| |
| <h3>Value</h3> |
| |
| <p><code>spark.als</code> returns a fitted ALS model. |
| </p> |
| <p><code>summary</code> returns summary information of the fitted model as a list. |
| The list includes <code>user</code> (the name of the user column), |
| <code>item</code> (the name of the item column), <code>rating</code> (the name of the rating column), <code>userFactors</code> |
| (the estimated user factors), <code>itemFactors</code> (the estimated item factors), |
| and <code>rank</code> (the rank of the matrix factorization model). |
| </p> |
| <p><code>predict</code> returns a SparkDataFrame containing predicted values. |
| </p> |
| |
| |
| <h3>Note</h3> |
| |
| <p>spark.als since 2.1.0 |
| </p> |
| <p>The input rating dataframe to the ALS implementation should be deterministic. |
| Nondeterministic data can cause failures while fitting the ALS model. For example, |
| an order-sensitive operation such as sampling after a repartition makes the dataframe output |
| nondeterministic, as in <code>sample(repartition(df, 2L), FALSE, 0.5, 1618L)</code>. |
| Checkpointing the sampled dataframe or adding a sort before sampling can help make the |
| dataframe deterministic. |
| </p> |
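| <p>The two workarounds named above might look like the following minimal sketch. It assumes |
| an active SparkR session and a ratings SparkDataFrame <code>df</code> with <code>user</code>, |
| <code>item</code>, and <code>rating</code> columns; the checkpoint directory is a hypothetical |
| placeholder, and the sort columns are only one possible choice. |
| </p> |
| <pre><code class="r"># Assumption: `df` is a ratings SparkDataFrame with columns user, item, rating. |
| |
| # Option 1: sort after the repartition so that sampling sees a fixed row order. |
| sorted <- arrange(repartition(df, 2L), "user", "item") |
| sampledSorted <- sample(sorted, FALSE, 0.5, 1618L) |
| |
| # Option 2: checkpoint the sampled dataframe so it is materialized once. |
| # "/tmp/spark-checkpoints" is a hypothetical directory; use one suited to your cluster. |
| setCheckpointDir("/tmp/spark-checkpoints") |
| sampledCheckpointed <- checkpoint(sample(repartition(df, 2L), FALSE, 0.5, 1618L)) |
| |
| # Either result can then be passed to spark.als. |
| model <- spark.als(sampledSorted, "rating", "user", "item") |
| </code></pre> |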
| <p>summary(ALSModel) since 2.1.0 |
| </p> |
| <p>predict(ALSModel) since 2.1.0 |
| </p> |
| <p>write.ml(ALSModel, character) since 2.1.0 |
| </p> |
| |
| |
| <h3>See Also</h3> |
| |
| <p><a href="../../SparkR/help/read.ml.html">read.ml</a> |
| </p> |
| |
| |
| <h3>Examples</h3> |
| |
| <pre><code class="r">## Not run: |
| ##D ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), |
| ##D list(2, 1, 1.0), list(2, 2, 5.0)) |
| ##D df <- createDataFrame(ratings, c("user", "item", "rating")) |
| ##D model <- spark.als(df, "rating", "user", "item") |
| ##D |
| ##D # extract latent factors |
| ##D stats <- summary(model) |
| ##D userFactors <- stats$userFactors |
| ##D itemFactors <- stats$itemFactors |
| ##D |
| ##D # make predictions |
| ##D predicted <- predict(model, df) |
| ##D showDF(predicted) |
| ##D |
| ##D # save and load the model |
| ##D path <- "path/to/model" |
| ##D write.ml(model, path) |
| ##D savedModel <- read.ml(path) |
| ##D summary(savedModel) |
| ##D |
| ##D # set other arguments |
| ##D modelS <- spark.als(df, "rating", "user", "item", rank = 20, |
| ##D regParam = 0.1, nonnegative = TRUE) |
| ##D statsS <- summary(modelS) |
| ## End(Not run) |
| </code></pre> |
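| |
| <p>The <code>implicitPrefs</code> and <code>alpha</code> arguments are not exercised above. A minimal |
| sketch, assuming the same <code>df</code> and an active SparkR session, and treating the |
| <code>rating</code> column as implicit feedback purely for illustration: |
| </p> |
| <pre><code class="r"># Assumption: `df` is the SparkDataFrame created in the example above. |
| # Fit with the implicit preference formulation; alpha must be >= 0. |
| modelI <- spark.als(df, "rating", "user", "item", |
| implicitPrefs = TRUE, alpha = 1.0) |
| |
| # The summary list has the same fields as for an explicit-feedback model. |
| statsI <- summary(modelI) |
| statsI$rank |
| </code></pre> |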
| |
| |
| <hr /><div style="text-align: center;">[Package <em>SparkR</em> version 3.2.2 <a href="00Index.html">Index</a>]</div> |
| </div> |
| </body></html> |