---
title: "Connecting to a Flight server"
description: >
  Learn how to efficiently stream Apache Arrow data objects across a
  network using Arrow Flight
output: rmarkdown::html_vignette
---
[Arrow Flight](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) is a general-purpose client-server framework for high-performance transport of large datasets over network interfaces, built as part of the Apache Arrow project. It achieves highly efficient data transfer in several ways:

* Flight removes the need for deserialization during data transfer.
* Flight allows for parallel data streaming.
* Flight employs optimizations designed to take advantage of Arrow's columnar format.

The arrow package provides methods for connecting to Flight servers to send and receive data.
## Prerequisites

At present the arrow package in R does not supply an independent implementation of Arrow Flight. Instead, it calls the [Flight methods supplied by PyArrow](https://arrow.apache.org/docs/python/api/flight.html) in Python, so it requires both the [reticulate](https://rstudio.github.io/reticulate/) package and the Python PyArrow library to be installed. If you are using them for the first time, you can install them like this:
```r
install.packages("reticulate")
arrow::install_pyarrow()
```
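Before going further, it can help to confirm that both dependencies are usable from your R session. The sketch below is one way to do that, assuming reticulate's `py_module_available()` is used to probe for the `pyarrow.flight` Python module; the helper name `flight_prereqs_ok()` is hypothetical and is not part of the arrow package.

```r
# Hypothetical helper: reports whether reticulate and pyarrow.flight are usable.
flight_prereqs_ok <- function() {
  # reticulate is needed to call into Python at all
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    message("The reticulate package is not installed.")
    return(FALSE)
  }
  # The Flight client and the demo server are built on pyarrow.flight
  if (!reticulate::py_module_available("pyarrow.flight")) {
    message("The pyarrow.flight Python module could not be imported.")
    return(FALSE)
  }
  TRUE
}

pyarrow_ready <- flight_prereqs_ok()
pyarrow_ready
```

If this returns `FALSE`, the message printed indicates which of the two installation steps still needs attention.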
See the [Python integrations article](./python.html) for more details on setting up PyArrow.

## Example

The package includes methods for starting a Python-based Flight server, as well as methods for connecting to a Flight server running elsewhere. To illustrate both sides, we'll start a demo server in one R process:
```r
library(arrow)
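# Load the demo server class that ships with the arrow package
demo_server <- load_flight_server("demo_flight_server")
server <- demo_server$DemoFlightServer(port = 8089)
# serve() blocks this R process until the server shuts down
server$serve()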
```
We'll leave that one running.
In a different R process, let's connect to it and put some data in it.
```r
library(arrow)
client <- flight_connect(port = 8089)
# Upload the iris data frame, stored on the server under "test_data/iris"
flight_put(client, iris, path = "test_data/iris")
```
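The arrow package also provides helpers for inspecting what a server holds. The following sketch assumes it runs in the same process used for the upload, with the demo server still listening on port 8089. It checks whether a path exists with `flight_path_exists()`, lists stored paths with `list_flights()`, and shows that `flight_put()` with `overwrite = FALSE` refuses to replace existing data. It then closes the client with `flight_disconnect()`, which may not be available in older arrow releases.

```r
library(arrow)

# Reconnect to the demo server started earlier (assumed to be on port 8089)
client <- flight_connect(port = 8089)

# TRUE if something is already stored under this path
flight_path_exists(client, "test_data/iris")

# Character vector of every path currently stored on the server
list_flights(client)

# With overwrite = FALSE, flight_put() errors instead of replacing an
# existing path; capture the error message rather than stopping
put_result <- tryCatch(
  flight_put(client, iris, path = "test_data/iris", overwrite = FALSE),
  error = function(e) conditionMessage(e)
)
print(put_result)

# Explicitly close the client when finished
flight_disconnect(client)
```

Because `overwrite` defaults to `TRUE`, repeating the earlier `flight_put()` call without it would silently replace the stored data.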
Now, in yet another R process, we can connect to the server and pull the data we
put there:
```r
library(arrow)
library(dplyr)
client <- flight_connect(port = 8089)
client %>%
  flight_get("test_data/iris") %>%
  group_by(Species) %>%
  summarize(max_petal = max(Petal.Length)) %>%
  collect()
## # A tibble: 3 x 2
##   Species    max_petal
##   <fct>          <dbl>
## 1 setosa           1.9
## 2 versicolor       5.1
## 3 virginica        6.9
```
Because `flight_get()` returns an Arrow Table, you can pipe its result directly into a [dplyr](https://dplyr.tidyverse.org/) workflow. Arrow evaluates the query, and `collect()` pulls the result into R as a tibble.
See the article on [data wrangling](./data_wrangling.html) for more information on working with Arrow objects via a dplyr interface.
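Since the pipeline operates on an Arrow Table, you can prototype the same query locally, without a Flight server, by building a Table from a data frame. This is a minimal sketch that assumes an arrow version providing `arrow_table()` (older releases use `Table$create()` instead); `local_table` is a hypothetical stand-in for the object `flight_get()` would return.

```r
library(arrow)
library(dplyr)

# Local stand-in for the Arrow Table returned by flight_get()
local_table <- arrow_table(iris)

# Same grouped summary as above; collect() materializes it as a tibble
result <- local_table %>%
  group_by(Species) %>%
  summarize(max_petal = max(Petal.Length)) %>%
  collect()

result
```

Once the query behaves as expected locally, swapping `local_table` for the output of `flight_get()` should require no other changes to the pipeline.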
## Further reading
- The specification of the [Flight remote procedure call protocol](https://arrow.apache.org/docs/format/Flight.html) is published on the Arrow project website.
- The Arrow C++ documentation contains a list of [best practices](https://arrow.apache.org/docs/cpp/flight.html#best-practices) for Arrow Flight.
- A detailed worked example of an Arrow Flight server in Python is provided in the [Apache Arrow Python Cookbook](https://arrow.apache.org/cookbook/py/flight.html).