blob: cd5ef3c4b1b910dc552ae01566c54aa506b93594 [file] [log] [blame]
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Feather File Format &mdash; Apache Arrow v3.0.0</title>
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../_static/theme_overrides.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
<script src="../_static/jquery.js"></script>
<script src="../_static/underscore.js"></script>
<script src="../_static/doctools.js"></script>
<script src="../_static/language_data.js"></script>
<script type="text/javascript" src="../_static/js/theme.js"></script>
<link rel="canonical" href="https://arrow.apache.org/docs/python/feather.html" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Reading JSON files" href="json.html" />
<link rel="prev" title="Reading CSV files" href="csv.html" />
<!-- Matomo -->
<script>
var _paq = window._paq = window._paq || [];
/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
_paq.push(["setDoNotTrack", true]);
_paq.push(["disableCookies"]);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="https://analytics.apache.org/";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '20']);
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
})();
</script>
<!-- End Matomo Code -->
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../index.html" class="icon icon-home"> Apache Arrow
</a>
<div class="version">
3.0.0
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<p class="caption"><span class="caption-text">Specifications and Protocols</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../format/Versioning.html">Format Versioning and Stability</a></li>
<li class="toctree-l1"><a class="reference internal" href="../format/Columnar.html">Arrow Columnar Format</a></li>
<li class="toctree-l1"><a class="reference internal" href="../format/Flight.html">Arrow Flight RPC</a></li>
<li class="toctree-l1"><a class="reference internal" href="../format/Integration.html">Integration Testing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../format/CDataInterface.html">The Arrow C data interface</a></li>
<li class="toctree-l1"><a class="reference internal" href="../format/CStreamInterface.html">The Arrow C stream interface</a></li>
<li class="toctree-l1"><a class="reference internal" href="../format/Other.html">Other Data Structures</a></li>
</ul>
<p class="caption"><span class="caption-text">Libraries</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../status.html">Implementation Status</a></li>
<li class="toctree-l1"><a class="reference external" href="https://arrow.apache.org/docs/c_glib/">C/GLib</a></li>
<li class="toctree-l1"><a class="reference internal" href="../cpp/index.html">C++</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a></li>
<li class="toctree-l1"><a class="reference external" href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a></li>
<li class="toctree-l1"><a class="reference internal" href="../java/index.html">Java</a></li>
<li class="toctree-l1"><a class="reference external" href="https://arrow.apache.org/docs/js/">JavaScript</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/apache/arrow/blob/master/julia/Arrow/README.md">Julia</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="index.html">Python</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="install.html">Installing PyArrow</a></li>
<li class="toctree-l2"><a class="reference internal" href="memory.html">Memory and IO Interfaces</a></li>
<li class="toctree-l2"><a class="reference internal" href="data.html">Data Types and In-Memory Data Model</a></li>
<li class="toctree-l2"><a class="reference internal" href="compute.html">Compute Functions</a></li>
<li class="toctree-l2"><a class="reference internal" href="ipc.html">Streaming, Serialization, and IPC</a></li>
<li class="toctree-l2"><a class="reference internal" href="filesystems.html">Filesystem Interface</a></li>
<li class="toctree-l2"><a class="reference internal" href="filesystems_deprecated.html">Filesystem Interface (legacy)</a></li>
<li class="toctree-l2"><a class="reference internal" href="plasma.html">The Plasma In-Memory Object Store</a></li>
<li class="toctree-l2"><a class="reference internal" href="numpy.html">NumPy Integration</a></li>
<li class="toctree-l2"><a class="reference internal" href="pandas.html">Pandas Integration</a></li>
<li class="toctree-l2"><a class="reference internal" href="timestamps.html">Timestamps</a></li>
<li class="toctree-l2"><a class="reference internal" href="csv.html">Reading CSV files</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Feather File Format</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#using-compression">Using Compression</a></li>
<li class="toctree-l3"><a class="reference internal" href="#writing-version-1-v1-files">Writing Version 1 (V1) Files</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="json.html">Reading JSON files</a></li>
<li class="toctree-l2"><a class="reference internal" href="parquet.html">Reading and Writing the Apache Parquet Format</a></li>
<li class="toctree-l2"><a class="reference internal" href="dataset.html">Tabular Datasets</a></li>
<li class="toctree-l2"><a class="reference internal" href="cuda.html">CUDA Integration</a></li>
<li class="toctree-l2"><a class="reference internal" href="extending_types.html">Extending pyarrow</a></li>
<li class="toctree-l2"><a class="reference internal" href="extending.html">Using pyarrow from C++ and Cython Code</a></li>
<li class="toctree-l2"><a class="reference internal" href="api.html">API Reference</a></li>
<li class="toctree-l2"><a class="reference internal" href="getting_involved.html">Getting Involved</a></li>
<li class="toctree-l2"><a class="reference internal" href="benchmarks.html">Benchmarks</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference external" href="https://arrow.apache.org/docs/r/">R</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a></li>
<li class="toctree-l1"><a class="reference external" href="https://docs.rs/crate/arrow/">Rust</a></li>
</ul>
<p class="caption"><span class="caption-text">Development</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../developers/contributing.html">Contributing to Apache Arrow</a></li>
<li class="toctree-l1"><a class="reference internal" href="../developers/cpp/index.html">C++ Development</a></li>
<li class="toctree-l1"><a class="reference internal" href="../developers/python.html">Python Development</a></li>
<li class="toctree-l1"><a class="reference internal" href="../developers/archery.html">Daily Development using Archery</a></li>
<li class="toctree-l1"><a class="reference internal" href="../developers/crossbow.html">Packaging and Testing with Crossbow</a></li>
<li class="toctree-l1"><a class="reference internal" href="../developers/docker.html">Running Docker Builds</a></li>
<li class="toctree-l1"><a class="reference internal" href="../developers/benchmarks.html">Benchmarks</a></li>
<li class="toctree-l1"><a class="reference internal" href="../developers/documentation.html">Building the Documentation</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">Apache Arrow</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html" class="icon icon-home"></a> &raquo;</li>
<li><a href="index.html">Python bindings</a> &raquo;</li>
<li>Feather File Format</li>
<li class="wy-breadcrumbs-aside">
<a href="../_sources/python/feather.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="feather-file-format">
<span id="feather"></span><h1>Feather File Format<a class="headerlink" href="#feather-file-format" title="Permalink to this headline"></a></h1>
<p>Feather is a portable file format for storing Arrow tables or data frames (from
languages like Python or R) that utilizes the <a class="reference internal" href="ipc.html#ipc"><span class="std std-ref">Arrow IPC format</span></a>
internally. Feather was created early in the Arrow project as a proof of
concept for fast, language-agnostic data frame storage for Python (pandas) and
R. There are two file format versions for Feather:</p>
<ul class="simple">
<li><p>Version 2 (V2), the default version, which is exactly represented as the
Arrow IPC file format on disk. V2 files support storing all Arrow data types
as well as compression with LZ4 or ZSTD. V2 was first made available in
Apache Arrow 0.17.0.</p></li>
<li><p>Version 1 (V1), a legacy version available starting in 2016, replaced by
V2. V1 files are distinct from Arrow IPC files and lack many features, such
as the ability to store all Arrow data types. V1 files also lack compression
support. We intend to maintain read support for V1 for the foreseeable
future.</p></li>
</ul>
<p>The <code class="docutils literal notranslate"><span class="pre">pyarrow.feather</span></code> module contains the read and write functions for the
format. <a class="reference internal" href="generated/pyarrow.feather.write_feather.html#pyarrow.feather.write_feather" title="pyarrow.feather.write_feather"><code class="xref py py-func docutils literal notranslate"><span class="pre">write_feather()</span></code></a> accepts either a
<a class="reference internal" href="generated/pyarrow.Table.html#pyarrow.Table" title="pyarrow.Table"><code class="xref py py-class docutils literal notranslate"><span class="pre">Table</span></code></a> or <code class="docutils literal notranslate"><span class="pre">pandas.DataFrame</span></code> object:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pyarrow.feather</span> <span class="kn">as</span> <span class="nn">feather</span>
<span class="n">feather</span><span class="o">.</span><span class="n">write_feather</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="s1">&#39;/path/to/file&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p><a class="reference internal" href="generated/pyarrow.feather.read_feather.html#pyarrow.feather.read_feather" title="pyarrow.feather.read_feather"><code class="xref py py-func docutils literal notranslate"><span class="pre">read_feather()</span></code></a> reads a Feather file as a
<code class="docutils literal notranslate"><span class="pre">pandas.DataFrame</span></code>. <a class="reference internal" href="generated/pyarrow.feather.read_table.html#pyarrow.feather.read_table" title="pyarrow.feather.read_table"><code class="xref py py-func docutils literal notranslate"><span class="pre">read_table()</span></code></a> reads a Feather file
as a <a class="reference internal" href="generated/pyarrow.Table.html#pyarrow.Table" title="pyarrow.Table"><code class="xref py py-class docutils literal notranslate"><span class="pre">Table</span></code></a>. Internally, <a class="reference internal" href="generated/pyarrow.feather.read_feather.html#pyarrow.feather.read_feather" title="pyarrow.feather.read_feather"><code class="xref py py-func docutils literal notranslate"><span class="pre">read_feather()</span></code></a>
simply calls <a class="reference internal" href="generated/pyarrow.feather.read_table.html#pyarrow.feather.read_table" title="pyarrow.feather.read_table"><code class="xref py py-func docutils literal notranslate"><span class="pre">read_table()</span></code></a> and the result is converted to
pandas:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Result is pandas.DataFrame</span>
<span class="n">read_df</span> <span class="o">=</span> <span class="n">feather</span><span class="o">.</span><span class="n">read_feather</span><span class="p">(</span><span class="s1">&#39;/path/to/file&#39;</span><span class="p">)</span>
<span class="c1"># Result is pyarrow.Table</span>
<span class="n">read_arrow</span> <span class="o">=</span> <span class="n">feather</span><span class="o">.</span><span class="n">read_table</span><span class="p">(</span><span class="s1">&#39;/path/to/file&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>These functions can read and write with file-paths or file-like objects. For
example:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">&#39;/path/to/file&#39;</span><span class="p">,</span> <span class="s1">&#39;wb&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">feather</span><span class="o">.</span><span class="n">write_feather</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">f</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">&#39;/path/to/file&#39;</span><span class="p">,</span> <span class="s1">&#39;rb&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">read_df</span> <span class="o">=</span> <span class="n">feather</span><span class="o">.</span><span class="n">read_feather</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
</pre></div>
</div>
<p>A file input to <code class="docutils literal notranslate"><span class="pre">read_feather</span></code> must support seeking.</p>
<div class="section" id="using-compression">
<h2>Using Compression<a class="headerlink" href="#using-compression" title="Permalink to this headline"></a></h2>
<p>As of Apache Arrow version 0.17.0, Feather V2 files (the default version)
support two fast compression libraries, LZ4 (using the frame format) and
ZSTD. LZ4 is used by default if it is available (which it should be if you
obtained pyarrow through a normal package manager):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Uses LZ4 by default</span>
<span class="n">feather</span><span class="o">.</span><span class="n">write_feather</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">file_path</span><span class="p">)</span>
<span class="c1"># Use LZ4 explicitly</span>
<span class="n">feather</span><span class="o">.</span><span class="n">write_feather</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">file_path</span><span class="p">,</span> <span class="n">compression</span><span class="o">=</span><span class="s1">&#39;lz4&#39;</span><span class="p">)</span>
<span class="c1"># Use ZSTD</span>
<span class="n">feather</span><span class="o">.</span><span class="n">write_feather</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">file_path</span><span class="p">,</span> <span class="n">compression</span><span class="o">=</span><span class="s1">&#39;zstd&#39;</span><span class="p">)</span>
<span class="c1"># Do not compress</span>
<span class="n">feather</span><span class="o">.</span><span class="n">write_feather</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">file_path</span><span class="p">,</span> <span class="n">compression</span><span class="o">=</span><span class="s1">&#39;uncompressed&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>Note that the default LZ4 compression generally yields much smaller files
without sacrificing much read or write performance. In some instances,
LZ4-compressed files may be faster to read and write than uncompressed due to
reduced disk IO requirements.</p>
</div>
<div class="section" id="writing-version-1-v1-files">
<h2>Writing Version 1 (V1) Files<a class="headerlink" href="#writing-version-1-v1-files" title="Permalink to this headline"></a></h2>
<p>For compatibility with libraries without support for Version 2 files, you can
write the version 1 format by passing <code class="docutils literal notranslate"><span class="pre">version=1</span></code> to <code class="docutils literal notranslate"><span class="pre">write_feather</span></code>. We
intend to maintain read support for V1 for the foreseeable future.</p>
</div>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="json.html" class="btn btn-neutral float-right" title="Reading JSON files" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
<a href="csv.html" class="btn btn-neutral float-left" title="Reading CSV files" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
&#169; Copyright 2016-2019 Apache Software Foundation.
</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
<script type="text/javascript" src="/docs/_static/versionwarning.js"></script></body>
</html>