<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Fuzzing Arrow C++ &mdash; Apache Arrow v2.0.0</title>
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script type="text/javascript" id="documentation_options" data-url_root="../../" src="../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script>
<script src="../../_static/doctools.js"></script>
<script src="../../_static/language_data.js"></script>
<script type="text/javascript" src="../../_static/js/theme.js"></script>
<link rel="canonical" href="https://arrow.apache.org/docs/developers/cpp/fuzzing.html" />
<link rel="index" title="Index" href="../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" />
<link rel="next" title="Python Development" href="../python.html" />
<link rel="prev" title="Conventions" href="conventions.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home" alt="Documentation Home"> Apache Arrow
</a>
<div class="version">
2.0.0
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<p class="caption"><span class="caption-text">Specifications and Protocols</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../format/Versioning.html">Format Versioning and Stability</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../format/Columnar.html">Arrow Columnar Format</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../format/Flight.html">Arrow Flight RPC</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../format/Integration.html">Integration Testing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../format/CDataInterface.html">The Arrow C data interface</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../format/CStreamInterface.html">The Arrow C stream interface</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../format/Other.html">Other Data Structures</a></li>
</ul>
<p class="caption"><span class="caption-text">Libraries</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../status.html">Implementation Status</a></li>
<li class="toctree-l1"><a class="reference external" href="https://arrow.apache.org/docs/c_glib/">C/GLib</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../cpp/index.html">C++</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a></li>
<li class="toctree-l1"><a class="reference external" href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../java/index.html">Java</a></li>
<li class="toctree-l1"><a class="reference external" href="https://arrow.apache.org/docs/js/">JavaScript</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../python/index.html">Python</a></li>
<li class="toctree-l1"><a class="reference external" href="https://arrow.apache.org/docs/r/">R</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a></li>
<li class="toctree-l1"><a class="reference external" href="https://docs.rs/crate/arrow/">Rust</a></li>
</ul>
<p class="caption"><span class="caption-text">Development</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../contributing.html">Contributing to Apache Arrow</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="index.html">C++ Development</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="building.html">Building Arrow C++</a></li>
<li class="toctree-l2"><a class="reference internal" href="development.html">Development Guidelines</a></li>
<li class="toctree-l2"><a class="reference internal" href="windows.html">Developing on Windows</a></li>
<li class="toctree-l2"><a class="reference internal" href="conventions.html">Conventions</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Fuzzing Arrow C++</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#fuzz-targets-and-utilities">Fuzz Targets and Utilities</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#generating-the-seed-corpus">Generating the seed corpus</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="#continuous-fuzzing-infrastructure">Continuous fuzzing infrastructure</a></li>
<li class="toctree-l3"><a class="reference internal" href="#reproducing-locally">Reproducing locally</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../python.html">Python Development</a></li>
<li class="toctree-l1"><a class="reference internal" href="../archery.html">Daily Development using Archery</a></li>
<li class="toctree-l1"><a class="reference internal" href="../crossbow.html">Packaging and Testing with Crossbow</a></li>
<li class="toctree-l1"><a class="reference internal" href="../docker.html">Running Docker Builds</a></li>
<li class="toctree-l1"><a class="reference internal" href="../benchmarks.html">Benchmarks</a></li>
<li class="toctree-l1"><a class="reference internal" href="../documentation.html">Building the Documentation</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">Apache Arrow</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="fuzzing-arrow-c">
<h1>Fuzzing Arrow C++<a class="headerlink" href="#fuzzing-arrow-c" title="Permalink to this headline"></a></h1>
<p>To make the handling of invalid input more robust, we have enabled
fuzz testing on several parts of the Arrow C++ feature set, currently:</p>
<ul class="simple">
<li><p>the IPC stream format</p></li>
<li><p>the IPC file format</p></li>
<li><p>the Parquet file format</p></li>
</ul>
<p>We welcome contributions that expand the scope of fuzz testing to other
areas that ingest potentially invalid or malicious data.</p>
<div class="section" id="fuzz-targets-and-utilities">
<h2>Fuzz Targets and Utilities<a class="headerlink" href="#fuzz-targets-and-utilities" title="Permalink to this headline"></a></h2>
<p>By passing the <code class="docutils literal notranslate"><span class="pre">-DARROW_FUZZING=ON</span></code> CMake option, you will build
the fuzz targets corresponding to the aforementioned Arrow features, as well
as additional related utilities.</p>
<div class="section" id="generating-the-seed-corpus">
<h3>Generating the seed corpus<a class="headerlink" href="#generating-the-seed-corpus" title="Permalink to this headline"></a></h3>
<p>Fuzzing essentially explores the domain space by randomly mutating previously
tested inputs, without having any high-level understanding of the area being
fuzz-tested. However, the domain space is so huge that this strategy alone
may fail to actually produce any “interesting” inputs.</p>
<p>To guide the process, it is therefore important to provide a <em>seed corpus</em>
of valid (or invalid, but remarkable) inputs from which the fuzzing
infrastructure can derive new inputs for testing. A script is provided
to automate that task. Assuming the fuzzing executables can be found in
<code class="docutils literal notranslate"><span class="pre">build/debug</span></code>, the seed corpus can be generated as follows:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$ ./build-support/fuzzing/generate_corpuses.sh build/debug
</pre></div>
</div>
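<p>Hand-picked inputs can complement the generated corpus. The sketch below
assumes a libFuzzer-style corpus, that is, a plain directory holding one input
per file. The <code class="docutils literal notranslate"><span class="pre">add_seed</span></code> helper and the paths in its usage comment are
hypothetical and not part of Arrow. It stores each input under its SHA-1 digest
so that identical inputs are kept only once:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span># Hypothetical helper (not part of Arrow): copy an input file into a corpus
# directory, naming the copy after its SHA-1 digest so that identical inputs
# are stored only once. Assumes the sha1sum utility is available.
add_seed() {
  corpus_dir=$1
  input=$2
  mkdir -p "$corpus_dir"
  digest=$(sha1sum "$input" | cut -d ' ' -f 1)
  cp "$input" "$corpus_dir/$digest"
}

# Example usage (both paths are placeholders):
#   add_seed my-corpus some-remarkable-input.arrow
</pre></div>
</div>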
</div>
</div>
<div class="section" id="continuous-fuzzing-infrastructure">
<h2>Continuous fuzzing infrastructure<a class="headerlink" href="#continuous-fuzzing-infrastructure" title="Permalink to this headline"></a></h2>
<p>The process of fuzz testing is computationally intensive and therefore
benefits from dedicated computing facilities. Arrow C++ is exercised by
the <a class="reference external" href="https://google.github.io/oss-fuzz/">OSS-Fuzz</a> continuous fuzzing infrastructure operated by Google.</p>
<p>Issues found by OSS-Fuzz are reported to, and visible only to, a limited set of
<a class="reference external" href="https://github.com/google/oss-fuzz/blob/master/projects/arrow/project.yaml">core developers</a>.
If you are an Arrow core developer and want to be added to that list, you can
ask on the <a class="reference internal" href="../contributing.html#contributing"><span class="std std-ref">mailing list</span></a>.</p>
</div>
<div class="section" id="reproducing-locally">
<h2>Reproducing locally<a class="headerlink" href="#reproducing-locally" title="Permalink to this headline"></a></h2>
<p>When fuzzing finds a crash, it is often useful to download the input that
triggered it and replay that input locally in order to debug and
investigate.</p>
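<p>Assuming you are in a subdirectory inside <code class="docutils literal notranslate"><span class="pre">cpp</span></code>, the following command
configures a build of the fuzz targets with debug information and the
AddressSanitizer and UndefinedBehaviorSanitizer checks enabled (run the build
itself afterwards, for example with <code class="docutils literal notranslate"><span class="pre">ninja</span></code>):</p>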
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$ cmake .. -GNinja <span class="se">\</span>
-DCMAKE_BUILD_TYPE<span class="o">=</span>Debug <span class="se">\</span>
-DARROW_USE_ASAN<span class="o">=</span>on <span class="se">\</span>
-DARROW_USE_UBSAN<span class="o">=</span>on <span class="se">\</span>
-DARROW_FUZZING<span class="o">=</span>on
</pre></div>
</div>
<p>Then, assuming you have downloaded the crashing data file (let’s call it
<code class="docutils literal notranslate"><span class="pre">testcase-arrow-ipc-file-fuzz-123465</span></code>), you can reproduce the crash
by running the affected fuzz target on that file:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$ build/debug/arrow-ipc-file-fuzz testcase-arrow-ipc-file-fuzz-123465
</pre></div>
</div>
<p>You may want to run that command under a debugger to inspect the
program state more closely.</p>
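<p>As a minimal sketch, assuming the test case file name follows the
<code class="docutils literal notranslate"><span class="pre">testcase-TARGET-NUMBER</span></code> pattern of the example above, the command line
can be derived from the file name. The helpers <code class="docutils literal notranslate"><span class="pre">fuzz_target_for_testcase</span></code>
and <code class="docutils literal notranslate"><span class="pre">repro_command</span></code> are hypothetical names introduced for illustration and
are not part of Arrow:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span># Hypothetical helper: derive the fuzz target name from a test case file name,
# assuming the "testcase-TARGET-NUMBER" naming pattern.
fuzz_target_for_testcase() {
  name=$(basename "$1")   # drop any leading directories
  name=${name#testcase-}  # strip the "testcase-" prefix
  name=${name%-*}         # strip the trailing "-NUMBER" suffix
  printf '%s\n' "$name"
}

# Hypothetical helper: print the command that replays a test case, given the
# directory holding the fuzz executables and the test case file.
repro_command() {
  printf '%s/%s %s\n' "$1" "$(fuzz_target_for_testcase "$2")" "$2"
}

repro_command build/debug testcase-arrow-ipc-file-fuzz-123465
</pre></div>
</div>
<p>The last line prints <code class="docutils literal notranslate"><span class="pre">build/debug/arrow-ipc-file-fuzz</span> <span class="pre">testcase-arrow-ipc-file-fuzz-123465</span></code>.
Prefixing that command with <code class="docutils literal notranslate"><span class="pre">gdb</span> <span class="pre">--args</span></code> starts the fuzz target under GDB
with the test case as its argument.</p>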
</div>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../python.html" class="btn btn-neutral float-right" title="Python Development" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="conventions.html" class="btn btn-neutral float-left" title="Conventions" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
&copy; Copyright 2016-2019 Apache Software Foundation
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
<script type="text/javascript" src="/docs/_static/versionwarning.js"></script></body>
</html>