| --- |
| jupytext: |
| text_representation: |
| extension: .md |
| format_name: myst |
| kernelspec: |
| name: python3 |
| display_name: Python 3 |
| --- |
| <!--- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| (guide)= |
| |
| # Introduction |
| |
| Welcome to the User Guide for the Python bindings of Arrow DataFusion. This guide aims to provide an introduction to |
| DataFusion through various examples and highlight the most effective ways of using it. |
| |
| ## Installation |
| |
| DataFusion is a Python library and, as such, can be installed via pip from [PyPI](https://pypi.org/project/datafusion). |
| |
| ```shell |
| pip install datafusion |
| ``` |
| |
| You can verify the installation by running: |
| |
| ```{code-cell} ipython3 |
| import datafusion |
| datafusion.__version__ |
| ``` |
| |
| In this documentation we will also show some examples for how DataFusion integrates |
| with Jupyter notebooks. To install and start a Jupyter labs session use |
| |
| ```shell |
| pip install jupyterlab |
| jupyter lab |
| ``` |
| |
| To demonstrate working with DataFusion, we need a data source. Later in the tutorial we will show |
| options for data sources. For our first example, we demonstrate using a Pokemon dataset that you |
| can download |
| [here](https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv). |
| |
| With that file in place you can use the following python example to view the DataFrame in |
| DataFusion. |
| |
| ```{code-cell} ipython3 |
| from datafusion import SessionContext |
| |
| ctx = SessionContext() |
| |
| df = ctx.read_csv("pokemon.csv") |
| |
| df.show() |
| ``` |
| |
| If you are working in a Jupyter notebook, you can also use the following to give you a table |
| display that may be easier to read. |
| |
| ```shell |
| display(df) |
| ``` |
| |
| ```{image} ../images/jupyter_lab_df_view.png |
| :alt: Rendered table showing Pokemon DataFrame |
| :width: 800 |
| ``` |