| .. Licensed to the Apache Software Foundation (ASF) under one |
| .. or more contributor license agreements. See the NOTICE file |
| .. distributed with this work for additional information |
| .. regarding copyright ownership. The ASF licenses this file |
| .. to you under the Apache License, Version 2.0 (the |
| .. "License"); you may not use this file except in compliance |
| .. with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| .. software distributed under the License is distributed on an |
| .. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| .. KIND, either express or implied. See the License for the |
| .. specific language governing permissions and limitations |
| .. under the License. |
| |
| |
| .. SCOPE OF THIS SECTION |
| .. This section should include extra description to the |
| .. language-specific documentation. Possible topics to |
| .. add: How to run one test, test file or all the tests |
| .. together and why is it good to do that. |
| .. What if the unconnected tests start failing? etc. |
| |
| |
| .. _testing: |
| |
| *********** |
| Testing ๐งช |
| *********** |
| |
| In this section we outline steps needed for unit testing in Arrow. |
| |
| .. tab-set:: |
| |
| .. tab-item:: PyArrow |
| |
| We use `pytest <https://docs.pytest.org/en/latest/>`_ for |
| unit tests in Python. For more info about the required |
| packages see |
| :ref:`Python unit testing section <python-unit-testing>`. |
| |
| **Structure** |
| |
| Test layout in PyArrow follows ``pytest`` structure for |
| `Tests as part of application code <https://docs.pytest.org/en/6.2.x/goodpractices.html#tests-as-part-of-application-code>`_: |
| |
| .. code:: |
| |
| pyarrow/ |
| __init__.py |
| csv.py |
| dataset.py |
| ... |
| tests/ |
| __init__.py |
| test_csv.py |
| test_dataset.py |
| ... |
| |
| Tests for Parquet are located in a separate folder ``pyarrow/tests/parquet/``. |
| |
| **Running tests** |
| |
| To run a specific unit test, use this command in |
| the terminal from the ``arrow/python`` folder: |
| |
| .. code:: console |
| |
| $ pytest pyarrow/tests/test_file.py -k test_your_unit_test |
| |
| Run all the tests from one file: |
| |
| .. code:: console |
| |
| $ pytest pyarrow/tests/test_file.py |
| |
| Run all the tests: |
| |
| .. code:: console |
| |
| $ pytest pyarrow |
| |
| You can also run the tests with ``python -m pytest [...]`` |
| which is almost equivalent to using ``pytest [...]`` directly, |
| except that calling via python will also add the current |
| directory to ``sys.path`` and can in some cases help if |
| ``pytest [...]`` results in an ImportError. |
| |
| **Recompiling PyArrow or Arrow C++** |
| |
| If the tests start failing, try to recompile PyArrow or |
| Arrow C++. See note in the :ref:`build_libraries_guide` |
| section under the PyArrow tab. |
| |
| **Fixtures** |
| |
| Inside PyArrow test files there can be helper functions |
| and fixtures defined. Also other pytest decorators such as |
| ``@parametrize`` or ``@skipif`` are used. |
| |
| For example: |
| |
| * ``_alltypes_example`` in ``test_pandas`` supplies a |
| dataframe with 100 rows for all data types. |
| * ``_check_pandas_roundtrip`` in ``test_pandas`` asserts if the |
| roundtrip from ``Pandas`` through ``pa.Table`` or |
| ``pa.RecordBatch`` back to ``Pandas`` yields the same result. |
| * ``large_buffer`` fixture supplying a PyArrow buffer of fixed |
| size to the function ``test_primitive_serialization(large_buffer)`` |
| in ``test_serialization.py``. |
| |
| For this reason it is good to look through the file you |
| are planning to add the tests to and see if any of |
| the defined functions or fixtures will be helpful. |
| |
| For more information about ``pytest`` in general visit |
| `Full pytest documentation <https://docs.pytest.org/en/stable/contents.html>`_ |
| |
| |
| .. tab-item:: R package |
| |
| We use `testthat <https://testthat.r-lib.org/index.html>`_ for |
| unit testing in R. More specifically, we use the `3rd edition |
| of testthat <https://testthat.r-lib.org/articles/third-edition.html>`_. |
| On rare occasions we might want the behaviour of the 2nd edition |
| of testthat, which is indicated by ``testthat::local_edition(2)``. |
| |
| **Structure** |
| |
| Expect the usual testthat folder structure: |
| |
| .. code-block:: R |
| |
| tests |
| โโโ testthat # test files live here |
| โโโ testthat.R # runs tests when R CMD check runs (e.g. with devtools::check()) |
| |
| This is the fundamental structure of testing in R with |
| ``testthat``. Files such as ``testthat.R`` are not |
| expected to change very often. For the ``arrow`` R |
| package ``testthat.R`` also defines how the results of |
| the various tests are displayed / reported in the console. |
| |
| Usually, most files in the ``R/`` sub-folder have a |
| corresponding test file in ``tests/testthat``. |
| |
| **Running tests** |
| |
| To run all tests in a package locally call |
| |
| .. code-block:: R |
| |
| devtools::test() |
| |
| in the R console. Alternatively, you can use |
| |
| .. code:: console |
| |
| $ make test |
| |
| in the shell. |
| |
| You can run the tests in a single test file you have open with |
| |
| .. code-block:: R |
| |
| devtools::test_active_file() |
| |
| All tests are also run as part of our continuous |
| integration (CI) pipelines. |
| |
| The `Arrow R Developer guide also has a section <https://arrow.apache.org/docs/r/articles/developing.html#running-tests>`_ |
| on running tests. |
| |
| **Good practice** |
| |
| In general any change to source code needs to be |
| accompanied by unit tests. All tests are expected |
| to pass before a pull request is merged. |
| |
| * Add functionality -> add unit tests |
| * Modify functionality -> update unit tests |
| * Solve a bug -> add unit test before solving it, |
| which helps prove the bug and its fix |
| * Performance improvements should be reflected in |
| benchmarks (which are also tests) |
| * An exception could be refactoring functionality that |
| is fully covered by unit tests |
| |
| A good rule of thumb is: If the new functionality is |
| a user-facing or API change, you will almost certainly |
| need to change tests โ if no tests need to be changed, |
| it might mean the tests aren't right! If the new |
| functionality is a refactor and no APIs are changing, |
| there might not need to be test changes. |
| |
| **Testing helpers** |
| |
| To complement the ``testthat`` functionality, the ``arrow`` |
| R package has defined a series of specific utility |
| functions (called helpers), such as: |
| |
| * expectations - these start with ``expect_`` and are used |
| to compare objects |
| |
| - for example, the ``expect_โฆ_roundtrip()`` functions |
| take an input, convert it to some other format |
| (e.g. arrow, altrep) and then convert it back, |
| confirming that the values are the same. |
| |
| .. code-block:: R |
| |
| x <- c(1, 2, 3, NA_real_) |
| expect_altrep_roundtrip(x, min, na.rm = TRUE) |
| |
| * ``skip_`` - skips a unit test - think of them as acceptable |
| fails. Situations in which we might want to skip unit tests: |
| |
| - ``skip_if_r_version()`` - this is a specific ``arrow`` skip. |
| For example, we use this to skip a unit test when the R |
| version is 3.5.0 and below (``skip_if_r_version(โ3.5.0โ)``). |
| You will likely see it used when the functionality we are |
| testing depends on features introduced after version 3.5.0 |
| of R (such as the alternative representation of vectors, |
| Altrep, introduced in R 3.5.0, but with significant additions |
| in subsequent releases). As part of our CI workflow we test |
| against different versions of R and this is where this |
| feature comes in. |
| - ``skip_if_not_available()`` - another specific {arrow} skip. |
| Arrow (libarrow) has a number of optional features that can be |
| switched on or off (which happens at build time). If a unit |
| test depends on such a feature and this feature is not |
| available (i.e. was not selected when libarrow was built) |
| the test is skipped, as opposed to having a failed test. |
| - ``skip_if_offline()`` - will not run tests that require an |
| internet connection |
| - ``skip_on_os()`` - for unit tests that are OS specific. |
| |
| *Important*: Once the conditions for a ``skip_()`` statement is met, |
| no other line of code in the same ``test_that()`` test block will |
| get executed. If the ``skip`` is outside of a ``test_that()`` code |
| block, it will skip the rest of the file. |
| |
| For more information about unit testing in R in general: |
| |
| * the ``testthat`` `website <https://testthat.r-lib.org/index.html>`_ |
| * the **R Packages** `book <https://r-pkgs.org>`_ by Hadley Wickham and Jenny Bryan |