---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  name: python3
  display_name: Python 3
---
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
(guide)=
# Introduction
Welcome to the User Guide for the Python bindings of Arrow DataFusion. This guide aims to provide an introduction to
DataFusion through various examples and highlight the most effective ways of using it.
## Installation
DataFusion is a Python library and, as such, can be installed via pip from [PyPI](https://pypi.org/project/datafusion).
```shell
pip install datafusion
```
You can verify the installation by running:
```{code-cell} ipython3
import datafusion
datafusion.__version__
```
In this documentation we will also show some examples of how DataFusion integrates
with Jupyter notebooks. To install JupyterLab and start a session, run:
```shell
pip install jupyterlab
jupyter lab
```
To demonstrate working with DataFusion, we need a data source. Later in the tutorial we will show
other options for data sources. For our first example, we use a Pokemon dataset that you
can download
[here](https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv).
With that file saved in your working directory, you can use the following Python example to load it
into a DataFusion DataFrame and print it.
```{code-cell} ipython3
from datafusion import SessionContext
ctx = SessionContext()
df = ctx.read_csv("pokemon.csv")
df.show()
```
If you are working in a Jupyter notebook, you can also use the following to give you a table
display that may be easier to read.
```python
display(df)
```
```{image} ../images/jupyter_lab_df_view.png
:alt: Rendered table showing Pokemon DataFrame
:width: 800
```