blob: e94acb384b4c77217e249ec144c29ce20915e9c0 [file] [log] [blame]
.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.
.. SCOPE OF THIS SECTION
.. Concise tutorial on making a PR for a simple feature.
.. _r-tutorial:
***********
R tutorials
***********
Writing Bindings Walkthrough
============================
The first R package tutorial to be included in the New Contributor's
guide is a **Walkthrough** added in the **Writing Bindings**
vignette. With time we will try to include additional tutorials
directly into this guide.
This tutorial will show how to do a binding of a C++ function
`starts_with() <https://arrow.apache.org/docs/cpp/compute.html#containment-tests>`_
to the (base) R function ``startsWith()``.
To view the tutorial follow the
`Walkthrough section of the Writing Bindings article <https://arrow.apache.org/docs/r/articles/developers/bindings.html#walkthrough>`_.
R tutorial on adding a lubridate binding
========================================
In this tutorial, we will document the contribution of a binding
to Arrow R package following the steps specified by the
:ref:`quick-ref-guide` section of the guide and a more detailed
:ref:`step_by_step` section. Navigate there whenever there is
some information you may find is missing here.
The binding will be added to the ``expression.R`` file in the
R package. But you can also follow these steps in case you are
adding a binding that will live somewhere else.
.. seealso::
To read more about the philosophy behind R bindings, refer to the
`Writing Bindings article <https://arrow.apache.org/docs/r/articles/developers/bindings.html>`_.
This tutorial is different from the :ref:`step_by_step` as we
will be working on a specific case. This tutorial is not meant
as a step-by-step guide.
**Let's start!**
Set up
------
Let's set up the Arrow repository. We presume here that Git is
already installed. Otherwise please see the :ref:`set-up` section.
Once the `Apache Arrow repository <https://github.com/apache/arrow>`_
is forked (see :ref:`fork-repo-guide`) we will clone it and add the
link of the main repository to our upstream.
.. code:: console
$ git clone https://github.com/<your username>/arrow.git
$ cd arrow
$ git remote add upstream https://github.com/apache/arrow
Building R package
------------------
The steps to follow for for building the R package differs depending on the operating
system you are using. For this reason we will only refer to
the instructions for the building process in this tutorial.
.. seealso::
For the **introduction** to the building process refer to the
:ref:`build-arrow-guide` section.
For the **instructions** on how to build the R package refer to the
`R developer docs <https://arrow.apache.org/docs/r/articles/developing.html>`_.
The issue
---------
In this tutorial we will be tackling an issue for implementing
a simple binding for ``mday()`` function that will match that of the
existing R function from ``lubridate``.
.. note::
If you do not have an issue and you need help finding one please refer
to the :ref:`finding-issues` part of the guide.
Once you have an issue picked out and assigned to yourself, you can
proceed to the next step.
Start the work on a new branch
------------------------------
Before we start working on adding the binding we should
create a new branch from the updated main.
.. code:: console
$ git checkout main
$ git fetch upstream
$ git pull --ff-only upstream main
$ git checkout -b ARROW-14816
Now we can start with researching the R function and the C++ Arrow
compute function we want to expose or connect to.
**Examine the lubridate mday() function**
Going through the `lubridate documentation <https://lubridate.tidyverse.org/reference/day.html>`_
we can see that ``mday()`` takes a date object
and returns the day of the month as a numeric object.
We can run some examples in the R console to help us understand
the function better:
.. code-block:: R
> library(lubridate)
> mday(as.Date("2000-12-31"))
[1] 31
> mday(ymd(080306))
[1] 6
**Examine the Arrow C++ day() function**
From the `compute function documentation <https://arrow.apache.org/docs/cpp/compute.html#containment-tests>`_
we can see that ``day`` is a unary function, which means that it takes
a single data input. The data input must be a ``Temporal class`` and
the returned value is an ``Integer/numeric`` type.
The ``Temporal class`` is specified as: Date types (Date32, Date64),
Time types (Time32, Time64), Timestamp, Duration, Interval.
We can call an Arrow C++ function from an R console using ``call_function``
to see how it works:
.. code-block:: R
> call_function("day", Scalar$create(lubridate::ymd("2000-12-31")))
Scalar
31
We can see that lubridate and Arrow functions operate on and return
equivalent data types. lubridate's ``mday()`` function has no additional
arguments and there are also no option classes associated with Arrow C++
function ``day()``.
.. note::
To see what to do if there is an option class associated with the
function you are binding, refer to
`Examining the C++ function <https://arrow.apache.org/docs/r/articles/developers/bindings.html#examining-the-c-function>`_ from the Writing Bindings
article.
Looking at the code in ``expressions.R`` we can see the day function
is already specified/mapped on the R package side:
`<https://github.com/apache/arrow/blob/658bec37aa5cbdd53b5e4cdc81b8ba3962e67f11/r/R/expression.R#L63-L64>`_
We only need to add ``mday()`` to the list of expressions connecting
it to the C++ ``day`` function.
.. code-block:: R
# second is defined in dplyr-functions.R
# wday is defined in dplyr-functions.R
"mday" = "day",
"yday" = "day_of_year",
"year" = "year",
Adding a test
-------------
Now we need to add a test that checks if everything works well.
If there are additional options or edge cases, we would have to
add more. Looking at tests for similar functions (for example
``yday()`` or ``day())`` we can see that a good place to add two
tests we have is in ``test-dplyr-funcs-datetime.R``:
.. code-block:: R
test_that("extract mday from timestamp", {
compare_dplyr_binding(
.input %>%
mutate(x = mday(datetime)) %>%
collect(),
test_df
)
})
And
.. code-block:: R
test_that("extract mday from date", {
compare_dplyr_binding(
.input %>%
mutate(x = mday(date)) %>%
collect(),
test_df
)
})
Now we need to see if the tests are passing or we need to do some
more research and code corrections.
.. code-block:: R
devtools::test(filter="datetime")
> devtools::test(filter="datetime")
ℹ Loading arrow
See arrow_info() for available features
ℹ Testing arrow
See arrow_info() for available features
✔ | F W S OK | Context
✖ | 1 230 | dplyr-funcs-datetime [1.4s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Failure (test-dplyr-funcs-datetime.R:187:3): strftime
``%>%`(...)` did not throw the expected error.
Backtrace:
1. testthat::expect_error(...) test-dplyr-funcs-datetime.R:187:2
2. testthat:::expect_condition_matching(...)
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
══ Results ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 1.4 s
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 230 ]
There is a failure we get for the ``strftime`` function but looking
at the code we see is not connected to our work. We can move on and
maybe ask others if they are getting similar fail when running the tests.
It could be we only need to rebuild the library.
Check styling
-------------
We should also run linters to check that the styling of the code
follows the `tidyverse style <https://style.tidyverse.org/>`_. To
do that we run the following command in the shell:
.. code:: console
$ make style
R -s -e 'setwd(".."); if (requireNamespace("styler")) styler::style_file(setdiff(system("git diff --name-only | grep r/.*R$", intern = TRUE), file.path("r", source("r/.styler_excludes.R")$value)))'
Loading required namespace: styler
Styling 2 files:
r/R/expression.R ✔
r/tests/testthat/test-dplyr-funcs-datetime.R ℹ
────────────────────────────────────────────
Status Count Legend
✔ 1 File unchanged.
ℹ 1 File changed.
✖ 0 Styling threw an error.
────────────────────────────────────────────
Please review the changes carefully!
Creating a Pull Request
-----------------------
First let’s review our changes in the shell using ``git status`` to see
which files have been changed and to commit only the ones we are working on.
.. code:: console
$ git status
On branch ARROW-14816
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: R/expression.R
modified: tests/testthat/test-dplyr-funcs-datetime.R
And ``git diff`` to see the changes in the files in order to spot any error we might have made.
.. code:: console
$ git diff
diff --git a/r/R/expression.R b/r/R/expression.R
index 37fc21c25..0e71803ec 100644
--- a/r/R/expression.R
+++ b/r/R/expression.R
@@ -70,6 +70,7 @@
"quarter" = "quarter",
# second is defined in dplyr-functions.R
# wday is defined in dplyr-functions.R
+ "mday" = "day",
"yday" = "day_of_year",
"year" = "year",
diff --git a/r/tests/testthat/test-dplyr-funcs-datetime.R b/r/tests/testthat/test-dplyr-funcs-datetime.R
index 359a5403a..228eca56a 100644
--- a/r/tests/testthat/test-dplyr-funcs-datetime.R
+++ b/r/tests/testthat/test-dplyr-funcs-datetime.R
@@ -444,6 +444,15 @@ test_that("extract wday from timestamp", {
)
})
+test_that("extract mday from timestamp", {
+ compare_dplyr_binding(
+ .input %>%
+ mutate(x = mday(datetime)) %>%
+ collect(),
+ test_df
+ )
+})
+
test_that("extract yday from timestamp", {
compare_dplyr_binding(
.input %>%
@@ -626,6 +635,15 @@ test_that("extract wday from date", {
)
})
+test_that("extract mday from date", {
+ compare_dplyr_binding(
+ .input %>%
+ mutate(x = mday(date)) %>%
+ collect(),
+ test_df
+ )
+})
+
test_that("extract yday from date", {
compare_dplyr_binding(
.input %>%
Everything looks OK. Now we can make the commit
(save our changes to the branch history):
.. code:: console
$ git commit -am "Adding a binding and a test for mday() lubridate"
[ARROW-14816 ed37d3a3b] Adding a binding and a test for mday() lubridate
2 files changed, 19 insertions(+)
We can use ``git log`` to check the history of commits:
.. code:: console
$ git log
commit ed37d3a3b3eef76b696532f10562fea85f809fab (HEAD -> ARROW-14816)
Author: Alenka Frim <frim.alenka@gmail.com>
Date: Fri Jan 21 09:15:31 2022 +0100
Adding a binding and a test for mday() lubridate
commit c5358787ee8f7b80f067292f49e5f032854041b9 (upstream/main, upstream/HEAD, main, ARROW-15346, ARROW-10643)
Author: Krisztián Szűcs <szucs.krisztian@gmail.com>
Date: Thu Jan 20 09:45:59 2022 +0900
ARROW-15372: [C++][Gandiva] Gandiva now depends on boost/crc.hpp which is missing from the trimmed boost archive
See build error https://github.com/ursacomputing/crossbow/runs/4871392838?check_suite_focus=true#step:5:11762
Closes #12190 from kszucs/ARROW-15372
Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
If we started the branch some time ago, we may need to rebase
to upstream main to make sure there are no merge conflicts:
.. code:: console
$ git pull upstream main --rebase
And now we can push our work to the forked Arrow repository
on GitHub called origin.
.. code:: console
$ git push origin ARROW-14816
Enumerating objects: 233, done.
Counting objects: 100% (233/233), done.
Delta compression using up to 8 threads
Compressing objects: 100% (130/130), done.
Writing objects: 100% (151/151), 35.78 KiB | 8.95 MiB/s, done.
Total 151 (delta 129), reused 33 (delta 20), pack-reused 0
remote: Resolving deltas: 100% (129/129), completed with 80 local objects.
remote:
remote: Create a pull request for 'ARROW-14816' on GitHub by visiting:
remote: https://github.com/AlenkaF/arrow/pull/new/ARROW-14816
remote:
To https://github.com/AlenkaF/arrow.git
* [new branch] ARROW-14816 -> ARROW-14816
Now we have to go to the `Arrow repository on GitHub <https://github.com/apache/arrow>`_
to create a Pull Request. On the GitHub Arrow
page (main or forked) we will see a yellow notice
bar with a note that we made recent pushes to the branch
ARROW-14816. That’s great, now we can make the Pull Request
by clicking on **Compare & pull request**.
.. figure:: /developers/images/R_tutorial_create_pr_notice.jpeg
:scale: 60 %
:alt: GitHub page of the Apache Arrow repository showing a notice bar
indicating change has been made in our branch and a Pull Request
can be created.
Notice bar on the Apache Arrow repository.
First we need to change the Title to **ARROW-14816: [R] Implement
bindings for lubridate::mday()** in order to match it with the
issue. Note a punctuation mark was added!
*Extra note: when this tutorial was created, we had been using the Jira issue
tracker. As we are currently using GitHub issues, the title would be prefixed
with GH-14816: [R] Implement bindings for lubridate::mday()*.
We will also add a description to make it clear to others what we are trying to do.
.. figure:: /developers/images/R_tutorial_pr_descr.jpeg
:scale: 50 %
:alt: GitHub page of the Pull Request showing the editor for the
title and a description.
Editing the title and the description of our Pull Request.
Once we click **Create pull request** our code can be reviewed as
a Pull Request in the Apache Arrow repository.
.. figure:: /developers/images/R_tutorial_pr.jpeg
:scale: 50 %
:alt: GitHub page of the Pull Request showing the title and a
description.
Here it is, our Pull Request!
The pull request gets connected to the issue and the CI is running.
After some time passes and we get a review we can correct the code,
comment, resolve conversations and so on.
.. seealso::
For more information about Pull Request workflow see :ref:`pr_lifecycle`.
The Pull Request we made can be viewed `here <https://github.com/apache/arrow/pull/12218>`_.