| <!DOCTYPE html> |
| <html> |
| <head> |
| <meta charset="utf-8"> |
| <title>The Apache Crail (Incubating) Project: Crail Python API: Python -> C/C++ call overhead</title> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <link href="https://crail.incubator.apache.org/css/bootstrap.min.css" rel="stylesheet"> |
| <link href="https://crail.incubator.apache.org/css/group.css" rel="stylesheet"> |
| <link rel="alternate" type="application/atom+xml" title="Atom" |
| href="https://crail.incubator.apache.org/blog/blog.xml"> |
| |
| <meta property="og:image" content="https://crail.incubator.apache.org/img/blog/preview/python-summary.png" /> |
| <meta property="og:image:secure_url" content="https://crail.incubator.apache.org/img/blog/preview/python-summary.png" /> |
| </head> |
| |
| <body> |
| <div class="container"> |
| <div class="header"> |
| <ul class="nav nav-pills pull-right"> |
| |
| |
| |
| <li > |
| <a href="https://crail.incubator.apache.org/"> |
| Home |
| </a> |
| </li> |
| |
| |
| <li > |
| <a href="https://crail.incubator.apache.org/overview/"> |
| Overview |
| </a> |
| </li> |
| |
| |
| <li > |
| <a href="https://crail.incubator.apache.org/download/"> |
| Downloads |
| </a> |
| </li> |
| |
| |
| <li > |
| <a href="https://crail.incubator.apache.org/blog/"> |
| Blog |
| </a> |
| </li> |
| |
| |
| <li > |
| <a href="https://crail.incubator.apache.org/community/"> |
| Community |
| </a> |
| </li> |
| |
| |
| <li > |
| <a href="https://crail.incubator.apache.org/documentation/"> |
| Documentation |
| </a> |
| </li> |
| |
| </ul> |
| <a href="https://crail.incubator.apache.org/"> |
| <img src="https://crail.incubator.apache.org/img/crail_logo.png" |
| srcset="https://crail.incubator.apache.org/img/crail_logo.png" |
| alt="Crail" id="logo"> |
| </a> |
| </div> |
| |
| |
| |
| <h2>Crail Python API: Python -> C/C++ call overhead</h2> |
| |
| |
| <p class="meta">22 Jan 2019, </p> |
| |
| <div class="post"> |
| <div style="text-align: justify"> |
| <p> |
| With python being used in many machine learning applications, serverless frameworks, etc. |
| as the go-to language, we believe a Crail client Python API would be a useful tool to |
| broaden the use-case for Crail. |
| Since the Crail core is written in Java, performance has always been a concern due to |
| just-in-time compilation, garbage collection, etc. |
| However with careful engineering (Off heap buffers, stateful verbs calls, ...) |
| we were able to show that Crail can devliever similar or better performance compared |
| to other statically compiled storage systems. So how can we engineer the Python |
| library to deliver the best possible performance? |
| </p> |
| <p> |
| Python's reference implementation, also the most widely-used, CPython has historically |
| always been an interpreter and not a JIT compiler like PyPy. We will focus on |
| CPython since its alternatives are in general not plug-and-play replacements. |
| </p> |
| <p> |
| Crail is client-driven so most of its logic is implemented in the client library. |
| For this reason we do not want to reimplement the client logic for every new |
| language we want to support as it would result in a maintance nightmare. |
| However interfacing with Java is not feasible since it encurs in to much overhead |
| so we decided to implement a C++ client (more on this in a later blog post). |
| The C++ client allows us to use a foreign function interface in Python to call |
| C++ functions directly from Python. |
| </p> |
| </div> |
| |
| <h3 id="options-options-options">Options, Options, Options</h3> |
| |
| <div style="text-align: justify"> |
| <p> |
| There are two high-level concepts of how to integrate (C)Python and C: extension |
| modules and embedding. |
| </p> |
| <p> |
| Embedding Python uses Python as a component in an application. Our aim is to |
| develop a Python API to be used by other Python applications so embeddings are |
| not what we look for. |
| </p> |
| <p> |
| Extension modules are shared libraries that extend the Python interpreter. |
| For this use-case CPython offers a C API to interact with the Python interpreter |
| and allows to define modules, objects and functions in C which can be called |
| from Python. Note that there is also the option to extend the Python interpreter |
| through a Python library like ctypes or cffi. They are generally easier to |
| use and should preserve portability (extension modules are CPython specific). |
| However they do not give as much flexibility as extension modules and incur |
| in potentially more overhead (see below). There are multiple wrapper frameworks |
| available for CPython's C API to ease development of extension modules. |
| Here is an overview of frameworks and libraries we tested: |
| </p> |
| </div> |
| <ul> |
| <li><a href="https://cython.org/"><strong>Cython:</strong></a> optimising static compiler for Python and the Cython programming |
| language (based on Pyrex). C/C++ function, objects, etc. can be directly |
| accessed from Cython. The compiler generates C code from Cython which interfaces |
| with the CPython C-API.</li> |
| <li><a href="http://www.swig.org/"><strong>SWIG:</strong></a> (Simplified Wrapper and Interface Generator) is a tool to connect |
| C/C++ with various high-level languages. C/C++ interfaces that should be available |
| in Python have to be defined in a SWIG interface file. The interface files |
| are compiled to C/C++ wrapper files which interface with the CPython C-API.</li> |
| <li><a href="https://www.boost.org/"><strong>Boost.Python:</strong></a> is a C++ library that wraps CPython’s C-API. It uses |
| advanced metaprogramming techniques to simplify the usage and allows wrapping |
| C++ interfaces non-intrusively.</li> |
| <li><a href="https://docs.python.org/3.7/library/ctypes.html#module-ctypes"><strong>ctypes:</strong></a> |
| is a foreign function library. It allows calling C functions in shared libraries |
| with predefined compatible data types. It does not require writing any glue code |
| and does not interface with the CPython C-API directly.</li> |
| </ul> |
| |
| <h3 id="benchmarks">Benchmarks</h3> |
| |
| <p>In this blog post we focus on the overhead of calling a C/C++ function from Python. |
| We vary the number of arguments, argument types and the return types. We also |
| test passing strings to C/C++ since it is part of the Crail API e.g. when |
| opening or creating a file. Some frameworks expect <code class="language-plaintext highlighter-rouge">bytes</code> when passing a string |
| to a underlying <code class="language-plaintext highlighter-rouge">const char *</code>, some allow to pass a <code class="language-plaintext highlighter-rouge">str</code> and others allow both. |
| If C++ is supported by the framework we also test passing a <code class="language-plaintext highlighter-rouge">std::string</code> to a |
| C++ function. Note that we perform all benchmarks with CPython version 3.5.2. |
| We measure the time it takes to call the Python function until it returns. |
| The C/C++ functions are empty, except a <code class="language-plaintext highlighter-rouge">return</code> statement where necessary.</p> |
| |
| <div style="text-align:center"><img src="https://crail.incubator.apache.org/img/blog/python_c/python_c_foo.svg" width="725" /></div> |
| <p></p> |
| |
| <p>The plot shows that adding more arguments to a function increases runtime. |
| Introducing the first argument increases the runtime the most. Adding a the integer |
| return type only increased runtime slightly.</p> |
| |
| <p>As expected, cytpes as the only test which is not based on extension modules |
| performed the worst. Function call overhead for a function without return value |
| and any arguments is almost 300ns and goes up to 1/2 a microsecond with 4 |
| arguments. Considering that RDMA writes can be performed below 1us this would |
| introduce a major overhead (more on this below in the discussion section).</p> |
| |
| <p>SWIG and Boost.Python show similar performance where Boost is slightly slower and |
| out of the implementations based on extension modules is the slowest. |
| Cython is also based on extension modules so it was a surprise to us that it showed |
| the best performance of all methods tested. Investigating the performance difference |
| between Cython and our extension module implementation we found that Cython makes |
| better use of the C-API.</p> |
| |
| <p>Our extension module implementation follows the official tutorial and uses |
| <code class="language-plaintext highlighter-rouge">PyArg_ParseTuple</code> to parse the arguments. However as shown below we found that |
| manually unpacking the arguments with <code class="language-plaintext highlighter-rouge">PyArg_UnpackTuple</code> already significantly |
| increased the performance. Although these numbers still do not match Cython’s |
| performance we did not further investigate possible optimizations |
| to our code.</p> |
| |
| <div style="text-align:center"><img src="https://crail.incubator.apache.org/img/blog/python_c/python_c_foo_opt.svg" width="725" /></div> |
| <p></p> |
| |
| <p>Let’s take a look at the string performance. <code class="language-plaintext highlighter-rouge">bytes</code> and <code class="language-plaintext highlighter-rouge">str</code> is used whereever |
| applicable. To pass strings as bytes the ‘b’ prefix is used.</p> |
| |
| <div style="text-align:center"><img src="https://crail.incubator.apache.org/img/blog/python_c/python_c_foo_str.svg" width="725" /></div> |
| <p></p> |
| |
| <p>Again Cython and the extension module implementation with manual unpacking seem to |
| deliver the best performance. Passing a 64bit value in form of a <code class="language-plaintext highlighter-rouge">const char *</code> |
| pointer seems to be slightly faster than passing an integer argument (up to 20%). |
| Passing the string to a C++ function which takes a <code class="language-plaintext highlighter-rouge">std::string</code> |
| is ~50% slower than passing a <code class="language-plaintext highlighter-rouge">const char *</code>, probably because of the |
| instantiation of the underlying data buffer and copying however we have not |
| confirmed this.</p> |
| |
| <h3 id="discussion">Discussion</h3> |
| |
| <p>One might think a difference of 100ns should not really matter and you should |
| anyway not call to often into C/C++. However we believe that this is not true |
| when it comes to latency sensitive or high IOPS applications. For example |
| using RDMA one can perform IO operations below a 1us RTT so 100ns is already |
| a 10% performance hit. Also batching operations (to reduce amount of calls to C) |
| is not feasible for low latency operations since it typically incurs in wait |
| time until the batch size is large enough to be posted. Furthermore, even in high |
| IOPS applications batching is not always feasible and might lead to undesired |
| latency increase.</p> |
| |
| <p>Efficient IO is typically performed through an asynchronous |
| interface to allow not having to wait for IO to complete to perform the next |
| operation. Even with an asynchronous interface, not only the latency of the operation |
| is affected but the call overhead also limits the maximum IOPS. For example, |
| in the best case scenario, our async call only takes one pointer as an argument so |
| 100ns call overhead. And say our C library is capable of posting 5 million requests |
| per seconds (and is limited by the speed of posting not the device) that calculates |
| to 200ns per operation. If we introduce a 100ns overhead we limit the IOPS to 3.3 |
| million operations per second which is a 1/3 decrease in performance. This is |
| already significant consider using ctypes for such an operation now we are |
| talking about limiting the throughput by a factor of 3.</p> |
| |
| <p>Besides performance another aspect is the usability of the different approaches. |
| Considering only ease of use <em>ctypes</em> is a clear winner for us. However it only |
| supports to interface with C and is slow. <em>Cython</em>, <em>SWIG</em> and <em>Boost.Python</em> |
| require a similar amount of effort to declare the interfaces, however here |
| <em>Cython</em> clearly wins the performance crown. Writing your own <em>extension module</em> |
| is feasible however as shown above to get the best performance one needs |
| a good understanding of the CPython C-API/internals. From the tested approaches |
| this one requires the most glue code.</p> |
| |
| <h3 id="setup--source-code">Setup & Source Code</h3> |
| |
| <p>All tests were run on the following system:</p> |
| |
| <ul> |
| <li>Intel(R) Core(TM) i7-3770</li> |
| <li>16GB DDR3-1600MHz</li> |
| <li>Ubuntu 16.04 / Linux kernel version 4.4.0-142</li> |
| <li>CPython 3.5.2</li> |
| </ul> |
| |
| <p>The source code is available on <a href="https://github.com/zrlio/Python-c-benchmark">GitHub</a></p> |
| |
| |
| </div> |
| |
| <!-- |
| |
| <div id="disqus_thread"></div> |
| <script> |
| |
| /** |
| * RECOMMENDED CONFIGURATION VARIABLES: EDIT AND UNCOMMENT THE SECTION BELOW TO INSERT DYNAMIC VALUES FROM YOUR PLATFORM OR CMS. |
| * LEARN WHY DEFINING THESE VARIABLES IS IMPORTANT: https://disqus.com/admin/universalcode/#configuration-variables*/ |
| /* |
| var disqus_config = function () { |
| this.page.url = PAGE_URL; // Replace PAGE_URL with your page's canonical URL variable |
| this.page.identifier = PAGE_IDENTIFIER; // Replace PAGE_IDENTIFIER with your page's unique identifier variable |
| }; |
| */ |
| (function() { // DON'T EDIT BELOW THIS LINE |
| var d = document, s = d.createElement('script'); |
| s.src = '//crail-io.disqus.com/embed.js'; |
| s.setAttribute('data-timestamp', +new Date()); |
| (d.head || d.body).appendChild(s); |
| })(); |
| </script> |
| <noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript> |
| |
| --> |
| |
| |
| <br> |
| <br> |
| <div class="footer"> |
| <p>Apache Crail is an effort undergoing <a href="https://incubator.apache.org/">incubation</a> at <a href="https://www.apache.org/">The Apache Software Foundation (ASF)</a>, sponsored by the Apache Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. |
| </p> |
| </div> |
| |
| </div> <!-- /container --> |
| |
| <!-- Support retina images. --> |
| <script type="text/javascript" |
| src="https://crail.incubator.apache.org/js/srcset-polyfill.js"></script> |
| </body> |
| </html> |