blob: 7b30ef2b2d090284b5137078df5a9cc4dfa2e0f8 [file] [log] [blame]
.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.
.. _guide:
Introduction
============
Welcome to the User Guide for the Python bindings of Arrow DataFusion. This guide aims to provide an introduction to
DataFusion through various examples and highlight the most effective ways of using it.
Installation
------------
DataFusion is a Python library and, as such, can be installed via pip from `PyPI <https://pypi.org/project/datafusion>`__.
.. code-block:: shell
pip install datafusion
You can verify the installation by running:
.. ipython:: python
import datafusion
datafusion.__version__
In this documentation we will also show some examples for how DataFusion integrates
with Jupyter notebooks. To install and start a Jupyter labs session use
.. code-block:: shell
pip install jupyterlab
jupyter lab
To demonstrate working with DataFusion, we need a data source. Later in the tutorial we will show
options for data sources. For our first example, we demonstrate using a Pokemon dataset that you
can download
`here <https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv>`_.
With that file in place you can use the following python example to view the DataFrame in
DataFusion.
.. ipython:: python
from datafusion import SessionContext
ctx = SessionContext()
df = ctx.read_csv("pokemon.csv")
df.show()
If you are working in a Jupyter notebook, you can also use the following to give you a table
display that may be easier to read.
.. code-block:: shell
display(df)
.. image:: ../images/jupyter_lab_df_view.png
:width: 800
:alt: Rendered table showing Pokemon DataFrame