| <!DOCTYPE html><html><head><title>R: Window functions for Column operations</title> |
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" /> |
| <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.css"> |
| <script type="text/javascript"> |
| const macros = { "\\R": "\\textsf{R}", "\\code": "\\texttt"}; |
| function processMathHTML() { |
| var l = document.getElementsByClassName('reqn'); |
| for (let e of l) { katex.render(e.textContent, e, { throwOnError: false, macros }); } |
| return; |
| }</script> |
| <script defer src="https://cdn.jsdelivr.net/npm/katex@0.15.3/dist/katex.min.js" |
| onload="processMathHTML();"></script> |
| <link rel="stylesheet" type="text/css" href="R.css" /> |
| |
| <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> |
| <script>hljs.initHighlightingOnLoad();</script> |
| </head><body><div class="container"> |
| |
| <table style="width: 100%;"><tr><td>column_window_functions {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> |
| |
| <h2>Window functions for Column operations</h2> |
| |
| <h3>Description</h3> |
| |
| <p>Window functions defined for <code>Column</code>. |
| </p> |
| |
| |
| <h3>Usage</h3> |
| |
| <pre><code class='language-R'>cume_dist(x = "missing") |
| |
| dense_rank(x = "missing") |
| |
| lag(x, ...) |
| |
| lead(x, offset, defaultValue = NULL) |
| |
| nth_value(x, offset, ...) |
| |
| ntile(x) |
| |
| percent_rank(x = "missing") |
| |
| rank(x, ...) |
| |
| row_number(x = "missing") |
| |
| ## S4 method for signature 'missing' |
| cume_dist() |
| |
| ## S4 method for signature 'missing' |
| dense_rank() |
| |
| ## S4 method for signature 'characterOrColumn' |
| lag(x, offset = 1, defaultValue = NULL) |
| |
| ## S4 method for signature 'characterOrColumn,numeric' |
| lead(x, offset = 1, defaultValue = NULL) |
| |
| ## S4 method for signature 'characterOrColumn,numeric' |
| nth_value(x, offset, na.rm = FALSE) |
| |
| ## S4 method for signature 'numeric' |
| ntile(x) |
| |
| ## S4 method for signature 'missing' |
| percent_rank() |
| |
| ## S4 method for signature 'missing' |
| rank() |
| |
| ## S4 method for signature 'ANY' |
| rank(x, ...) |
| |
| ## S4 method for signature 'missing' |
| row_number() |
| </code></pre> |
| |
| |
| <h3>Arguments</h3> |
| |
| <table> |
| <tr style="vertical-align: top;"><td><code>x</code></td> |
| <td> |
| <p>In <code>lag</code> and <code>lead</code>, it is the column as a character string or a Column |
| to compute on. In <code>ntile</code>, it is the number of ntile groups.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>...</code></td> |
| <td> |
| <p>additional argument(s).</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>offset</code></td> |
| <td> |
| <p>a numeric indicating number of row to use as the value</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>defaultValue</code></td> |
| <td> |
| <p>(optional) default to use when the offset row does not exist.</p> |
| </td></tr> |
| <tr style="vertical-align: top;"><td><code>na.rm</code></td> |
| <td> |
| <p>a logical which indicates that the Nth value should skip null in the |
| determination of which row to use</p> |
| </td></tr> |
| </table> |
| |
| |
| <h3>Details</h3> |
| |
| <p><code>cume_dist</code>: Returns the cumulative distribution of values within a window partition, |
| i.e. the fraction of rows that are below the current row: |
| (number of values before and including x) / (total number of rows in the partition). |
| This is equivalent to the <code>CUME_DIST</code> function in SQL. |
| The method should be used with no argument. |
| </p> |
| <p><code>dense_rank</code>: Returns the rank of rows within a window partition, without any gaps. |
| The difference between rank and dense_rank is that dense_rank leaves no gaps in ranking |
| sequence when there are ties. That is, if you were ranking a competition using dense_rank |
| and had three people tie for second place, you would say that all three were in second |
| place and that the next person came in third. Rank would give me sequential numbers, making |
| the person that came in third place (after the ties) would register as coming in fifth. |
| This is equivalent to the <code>DENSE_RANK</code> function in SQL. |
| The method should be used with no argument. |
| </p> |
| <p><code>lag</code>: Returns the value that is <code>offset</code> rows before the current row, and |
| <code>defaultValue</code> if there is less than <code>offset</code> rows before the current row. For example, |
| an <code>offset</code> of one will return the previous row at any given point in the window partition. |
| This is equivalent to the <code>LAG</code> function in SQL. |
| </p> |
| <p><code>lead</code>: Returns the value that is <code>offset</code> rows after the current row, and |
| <code>defaultValue</code> if there is less than <code>offset</code> rows after the current row. |
| For example, an <code>offset</code> of one will return the next row at any given point |
| in the window partition. |
| This is equivalent to the <code>LEAD</code> function in SQL. |
| </p> |
| <p><code>nth_value</code>: Window function: returns the value that is the <code>offset</code>th |
| row of the window frame# (counting from 1), and <code>null</code> if the size of window |
| frame is less than <code>offset</code> rows. |
| </p> |
| <p><code>ntile</code>: Returns the ntile group id (from 1 to n inclusive) in an ordered window |
| partition. For example, if n is 4, the first quarter of the rows will get value 1, the second |
| quarter will get 2, the third quarter will get 3, and the last quarter will get 4. |
| This is equivalent to the <code>NTILE</code> function in SQL. |
| </p> |
| <p><code>percent_rank</code>: Returns the relative rank (i.e. percentile) of rows within a window |
| partition. |
| This is computed by: (rank of row in its partition - 1) / (number of rows in the partition - 1). |
| This is equivalent to the <code>PERCENT_RANK</code> function in SQL. |
| The method should be used with no argument. |
| </p> |
| <p><code>rank</code>: Returns the rank of rows within a window partition. |
| The difference between rank and dense_rank is that dense_rank leaves no gaps in ranking |
| sequence when there are ties. That is, if you were ranking a competition using dense_rank |
| and had three people tie for second place, you would say that all three were in second |
| place and that the next person came in third. Rank would give me sequential numbers, making |
| the person that came in third place (after the ties) would register as coming in fifth. |
| This is equivalent to the <code>RANK</code> function in SQL. |
| The method should be used with no argument. |
| </p> |
| <p><code>row_number</code>: Returns a sequential number starting at 1 within a window partition. |
| This is equivalent to the <code>ROW_NUMBER</code> function in SQL. |
| The method should be used with no argument. |
| </p> |
| |
| |
| <h3>Note</h3> |
| |
| <p>cume_dist since 1.6.0 |
| </p> |
| <p>dense_rank since 1.6.0 |
| </p> |
| <p>lag since 1.6.0 |
| </p> |
| <p>lead since 1.6.0 |
| </p> |
| <p>nth_value since 3.1.0 |
| </p> |
| <p>ntile since 1.6.0 |
| </p> |
| <p>percent_rank since 1.6.0 |
| </p> |
| <p>rank since 1.6.0 |
| </p> |
| <p>row_number since 1.6.0 |
| </p> |
| |
| |
| <h3>Examples</h3> |
| |
| <pre><code class="r">## Not run: |
| ##D # Dataframe used throughout this doc |
| ##D df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) |
| ##D ws <- orderBy(windowPartitionBy("am"), "hp") |
| ##D tmp <- mutate(df, dist = over(cume_dist(), ws), dense_rank = over(dense_rank(), ws), |
| ##D lag = over(lag(df$mpg), ws), lead = over(lead(df$mpg, 1), ws), |
| ##D percent_rank = over(percent_rank(), ws), |
| ##D rank = over(rank(), ws), row_number = over(row_number(), ws), |
| ##D nth_value = over(nth_value(df$mpg, 3), ws)) |
| ##D # Get ntile group id (1-4) for hp |
| ##D tmp <- mutate(tmp, ntile = over(ntile(4), ws)) |
| ##D head(tmp) |
| ## End(Not run) |
| </code></pre> |
| |
| |
| <hr /><div style="text-align: center;">[Package <em>SparkR</em> version 3.2.2 <a href="00Index.html">Index</a>]</div> |
| </div> |
| </body></html> |