| <!DOCTYPE html> |
| <html lang="en-US"> |
| <head> |
| <meta charset="UTF-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| <!-- The above meta tags *must* come first in the head; any other head content must come *after* these tags --> |
| |
| <title> February 2022 Rust Apache Arrow and Parquet Highlights | Apache Arrow</title> |
| |
| |
| <!-- Begin Jekyll SEO tag v2.8.0 --> |
| <meta name="generator" content="Jekyll v4.4.1" /> |
| <meta property="og:title" content="February 2022 Rust Apache Arrow and Parquet Highlights" /> |
| <meta name="author" content="pmc" /> |
| <meta property="og:locale" content="en_US" /> |
| <meta name="description" content="The Rust implementation of Apache Arrow has just released version 9.0.2. While a major version of this magnitude may shock some in the Rust community to whom it implies a slow moving 20 year old piece of software, nothing could be further from the truth! With regular and predictable bi-weekly releases, the library continues to evolve rapidly, and 9.0.2 is no exception. Some recent highlights: parquet: async, performance, safety and nested types The parquet 9.0.2 release includes an async reader, a long time requested feature. Using the async reader it is now possible to read only the relevant parts of a parquet file from a networked source such as object storage. Previously the entire file had to be buffered locally. We are hoping to add an async writer in a future release and would love some help. It is also significantly faster to read parquet data (up to 60x in some cases) than with previous versions of the parquet crate. Kudos to tustvold and yordan-pavlov for their contributions in these areas. With 8.0.0 and later, the code that reads and writes RecordBatches to and from Parquet now supports all types, including deeply nested structs and lists. Thanks helgikrs for cleaning up the last corner cases! Other notable recent additions to parquet are UTF-8 validation on string data for improved security against malicious inputs. Planned upcoming work includes pushing more filtering directly into the parquet scan as well as an async writer. arrow: performance, dyn kernels, and DecimalArray The compute kernels have been improved significantly in arrow 9.0.2. Some filter benchmarks are twice as fast and the SIMD kernels are also significantly faster. Many thanks to tustvold and jhorstmann. Additional substantial improvements are likely to land in arrow 10.0.0. We are working on new set of "dynamic" dyn_ kernels (for example, eq_dyn) that make it easier to invoke the heavily optimized kernels provided by the arrow crate. 
Work is underway to expand the breadth of types supported by these new kernels to make them even more useful. Thanks to matthewmturner and viirya for their help in this effort. While arrow has had basic support for DecimalArray since version 3.0.0, support has been expanded for Decimal type in calculation kernels such as sort, take and filter thanks to some great contributions from liukun4515. There is ongoing work to improve the API ergonomics and performance of DecimalArray as well. Security The 6.4.0 release resolved the last outstanding RUSTSEC advisory on the arrow crate and the 8.0.0 release resolved the last outstanding known security issues. While these security issues were mostly limited misuse of the low level "power user" APIs which most users do not (and should not) be using, it was good to tighten up that area. Now that arrow-rs is releasing major versions every other week, we are also able to update dependencies at the same pace, helping to ensure that security fixes upstream can flow more quickly to downstream projects. Final shoutout It takes a community to build great software, and we would like to thank everyone who has contributed to the arrow-rs repository since the 7.0.0 release: git shortlog -sn 7.0.0..9.0.0 22 Raphael Taylor-Davies 18 Andrew Lamb 6 Helgi Kristvin Sigurbjarnarson 6 Remzi Yang 5 Jörn Horstmann 4 Liang-Chi Hsieh 3 Jiayu Liu 2 dependabot[bot] 2 Yijie Shen 1 Matthew Turner 1 Kun Liu 1 Yang 1 Edd Robinson 1 Patrick More How to Get Involved If you are interested in contributing to the Rust subproject in Apache Arrow, you can find a list of open issues suitable for beginners here and the full list here. Other ways to get involved include trying out Arrow on some of your data and filing bug reports, and helping to improve the documentation." /> |
| <meta property="og:description" content="The Rust implementation of Apache Arrow has just released version 9.0.2. While a major version of this magnitude may shock some in the Rust community to whom it implies a slow moving 20 year old piece of software, nothing could be further from the truth! With regular and predictable bi-weekly releases, the library continues to evolve rapidly, and 9.0.2 is no exception. Some recent highlights: parquet: async, performance, safety and nested types The parquet 9.0.2 release includes an async reader, a long time requested feature. Using the async reader it is now possible to read only the relevant parts of a parquet file from a networked source such as object storage. Previously the entire file had to be buffered locally. We are hoping to add an async writer in a future release and would love some help. It is also significantly faster to read parquet data (up to 60x in some cases) than with previous versions of the parquet crate. Kudos to tustvold and yordan-pavlov for their contributions in these areas. With 8.0.0 and later, the code that reads and writes RecordBatches to and from Parquet now supports all types, including deeply nested structs and lists. Thanks helgikrs for cleaning up the last corner cases! Other notable recent additions to parquet are UTF-8 validation on string data for improved security against malicious inputs. Planned upcoming work includes pushing more filtering directly into the parquet scan as well as an async writer. arrow: performance, dyn kernels, and DecimalArray The compute kernels have been improved significantly in arrow 9.0.2. Some filter benchmarks are twice as fast and the SIMD kernels are also significantly faster. Many thanks to tustvold and jhorstmann. Additional substantial improvements are likely to land in arrow 10.0.0. We are working on new set of "dynamic" dyn_ kernels (for example, eq_dyn) that make it easier to invoke the heavily optimized kernels provided by the arrow crate. 
Work is underway to expand the breadth of types supported by these new kernels to make them even more useful. Thanks to matthewmturner and viirya for their help in this effort. While arrow has had basic support for DecimalArray since version 3.0.0, support has been expanded for Decimal type in calculation kernels such as sort, take and filter thanks to some great contributions from liukun4515. There is ongoing work to improve the API ergonomics and performance of DecimalArray as well. Security The 6.4.0 release resolved the last outstanding RUSTSEC advisory on the arrow crate and the 8.0.0 release resolved the last outstanding known security issues. While these security issues were mostly limited misuse of the low level "power user" APIs which most users do not (and should not) be using, it was good to tighten up that area. Now that arrow-rs is releasing major versions every other week, we are also able to update dependencies at the same pace, helping to ensure that security fixes upstream can flow more quickly to downstream projects. Final shoutout It takes a community to build great software, and we would like to thank everyone who has contributed to the arrow-rs repository since the 7.0.0 release: git shortlog -sn 7.0.0..9.0.0 22 Raphael Taylor-Davies 18 Andrew Lamb 6 Helgi Kristvin Sigurbjarnarson 6 Remzi Yang 5 Jörn Horstmann 4 Liang-Chi Hsieh 3 Jiayu Liu 2 dependabot[bot] 2 Yijie Shen 1 Matthew Turner 1 Kun Liu 1 Yang 1 Edd Robinson 1 Patrick More How to Get Involved If you are interested in contributing to the Rust subproject in Apache Arrow, you can find a list of open issues suitable for beginners here and the full list here. Other ways to get involved include trying out Arrow on some of your data and filing bug reports, and helping to improve the documentation." /> |
| <link rel="canonical" href="https://arrow.apache.org/blog/2022/02/13/rust-9.0/" /> |
| <meta property="og:url" content="https://arrow.apache.org/blog/2022/02/13/rust-9.0/" /> |
| <meta property="og:site_name" content="Apache Arrow" /> |
| <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" /> |
| <meta property="og:type" content="article" /> |
| <meta property="article:published_time" content="2022-02-13T01:00:00-05:00" /> |
| <meta name="twitter:card" content="summary_large_image" /> |
| <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" /> |
| <meta property="twitter:title" content="February 2022 Rust Apache Arrow and Parquet Highlights" /> |
| <script type="application/ld+json"> |
| {"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"pmc"},"dateModified":"2022-02-13T01:00:00-05:00","datePublished":"2022-02-13T01:00:00-05:00","description":"The Rust implementation of Apache Arrow has just released version 9.0.2. While a major version of this magnitude may shock some in the Rust community to whom it implies a slow moving 20 year old piece of software, nothing could be further from the truth! With regular and predictable bi-weekly releases, the library continues to evolve rapidly, and 9.0.2 is no exception. Some recent highlights: parquet: async, performance, safety and nested types The parquet 9.0.2 release includes an async reader, a long time requested feature. Using the async reader it is now possible to read only the relevant parts of a parquet file from a networked source such as object storage. Previously the entire file had to be buffered locally. We are hoping to add an async writer in a future release and would love some help. It is also significantly faster to read parquet data (up to 60x in some cases) than with previous versions of the parquet crate. Kudos to tustvold and yordan-pavlov for their contributions in these areas. With 8.0.0 and later, the code that reads and writes RecordBatches to and from Parquet now supports all types, including deeply nested structs and lists. Thanks helgikrs for cleaning up the last corner cases! Other notable recent additions to parquet are UTF-8 validation on string data for improved security against malicious inputs. Planned upcoming work includes pushing more filtering directly into the parquet scan as well as an async writer. arrow: performance, dyn kernels, and DecimalArray The compute kernels have been improved significantly in arrow 9.0.2. Some filter benchmarks are twice as fast and the SIMD kernels are also significantly faster. Many thanks to tustvold and jhorstmann. Additional substantial improvements are likely to land in arrow 10.0.0. 
We are working on new set of "dynamic" dyn_ kernels (for example, eq_dyn) that make it easier to invoke the heavily optimized kernels provided by the arrow crate. Work is underway to expand the breadth of types supported by these new kernels to make them even more useful. Thanks to matthewmturner and viirya for their help in this effort. While arrow has had basic support for DecimalArray since version 3.0.0, support has been expanded for Decimal type in calculation kernels such as sort, take and filter thanks to some great contributions from liukun4515. There is ongoing work to improve the API ergonomics and performance of DecimalArray as well. Security The 6.4.0 release resolved the last outstanding RUSTSEC advisory on the arrow crate and the 8.0.0 release resolved the last outstanding known security issues. While these security issues were mostly limited misuse of the low level "power user" APIs which most users do not (and should not) be using, it was good to tighten up that area. Now that arrow-rs is releasing major versions every other week, we are also able to update dependencies at the same pace, helping to ensure that security fixes upstream can flow more quickly to downstream projects. Final shoutout It takes a community to build great software, and we would like to thank everyone who has contributed to the arrow-rs repository since the 7.0.0 release: git shortlog -sn 7.0.0..9.0.0 22 Raphael Taylor-Davies 18 Andrew Lamb 6 Helgi Kristvin Sigurbjarnarson 6 Remzi Yang 5 Jörn Horstmann 4 Liang-Chi Hsieh 3 Jiayu Liu 2 dependabot[bot] 2 Yijie Shen 1 Matthew Turner 1 Kun Liu 1 Yang 1 Edd Robinson 1 Patrick More How to Get Involved If you are interested in contributing to the Rust subproject in Apache Arrow, you can find a list of open issues suitable for beginners here and the full list here. 
Other ways to get involved include trying out Arrow on some of your data and filing bug reports, and helping to improve the documentation.","headline":"February 2022 Rust Apache Arrow and Parquet Highlights","image":"https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png","mainEntityOfPage":{"@type":"WebPage","@id":"https://arrow.apache.org/blog/2022/02/13/rust-9.0/"},"publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"https://arrow.apache.org/img/logo.png"},"name":"pmc"},"url":"https://arrow.apache.org/blog/2022/02/13/rust-9.0/"}</script> |
| <!-- End Jekyll SEO tag --> |
| |
| |
| <!-- favicons --> |
| <link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16.png" id="light1"> |
| <link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32.png" id="light2"> |
| <link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon.png" id="light3"> |
| <link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120.png" id="light4"> |
| <link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76.png" id="light5"> |
| <link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60.png" id="light6"> |
| <!-- dark mode favicons --> |
| <link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16-dark.png" id="dark1"> |
| <link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32-dark.png" id="dark2"> |
| <link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon-dark.png" id="dark3"> |
| <link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120-dark.png" id="dark4"> |
| <link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76-dark.png" id="dark5"> |
| <link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60-dark.png" id="dark6"> |
| |
| <script> |
| // Switch to the dark-mode favicons if prefers-color-scheme: dark |
| function onUpdate() { |
| light1 = document.querySelector('link#light1'); |
| light2 = document.querySelector('link#light2'); |
| light3 = document.querySelector('link#light3'); |
| light4 = document.querySelector('link#light4'); |
| light5 = document.querySelector('link#light5'); |
| light6 = document.querySelector('link#light6'); |
| |
| dark1 = document.querySelector('link#dark1'); |
| dark2 = document.querySelector('link#dark2'); |
| dark3 = document.querySelector('link#dark3'); |
| dark4 = document.querySelector('link#dark4'); |
| dark5 = document.querySelector('link#dark5'); |
| dark6 = document.querySelector('link#dark6'); |
| |
| if (matcher.matches) { |
| light1.remove(); |
| light2.remove(); |
| light3.remove(); |
| light4.remove(); |
| light5.remove(); |
| light6.remove(); |
| document.head.append(dark1); |
| document.head.append(dark2); |
| document.head.append(dark3); |
| document.head.append(dark4); |
| document.head.append(dark5); |
| document.head.append(dark6); |
| } else { |
| dark1.remove(); |
| dark2.remove(); |
| dark3.remove(); |
| dark4.remove(); |
| dark5.remove(); |
| dark6.remove(); |
| document.head.append(light1); |
| document.head.append(light2); |
| document.head.append(light3); |
| document.head.append(light4); |
| document.head.append(light5); |
| document.head.append(light6); |
| } |
| } |
| matcher = window.matchMedia('(prefers-color-scheme: dark)'); |
| matcher.addListener(onUpdate); |
| onUpdate(); |
| </script> |
| |
| <link href="/css/main.css" rel="stylesheet"> |
| <link href="/css/syntax.css" rel="stylesheet"> |
| <script src="/javascript/main.js"></script> |
| |
| <!-- Matomo --> |
| <script> |
| var _paq = window._paq = window._paq || []; |
| /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ |
| /* We explicitly disable cookie tracking to avoid privacy issues */ |
| _paq.push(['disableCookies']); |
| _paq.push(['trackPageView']); |
| _paq.push(['enableLinkTracking']); |
| (function() { |
| var u="https://analytics.apache.org/"; |
| _paq.push(['setTrackerUrl', u+'matomo.php']); |
| _paq.push(['setSiteId', '20']); |
| var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; |
| g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); |
| })(); |
| </script> |
| <!-- End Matomo Code --> |
| |
| |
| <link type="application/atom+xml" rel="alternate" href="https://arrow.apache.org/feed.xml" title="Apache Arrow" /> |
| </head> |
| |
| |
| <body class="wrap"> |
| <header> |
| <nav class="navbar navbar-expand-md navbar-dark bg-dark"> |
| |
| <a class="navbar-brand no-padding" href="/"><img src="/img/arrow-inverse-300px.png" height="40px"></a> |
| |
| <button class="navbar-toggler ml-auto" type="button" data-toggle="collapse" data-target="#arrow-navbar" aria-controls="arrow-navbar" aria-expanded="false" aria-label="Toggle navigation"> |
| <span class="navbar-toggler-icon"></span> |
| </button> |
| |
| <!-- Collect the nav links, forms, and other content for toggling --> |
| <div class="collapse navbar-collapse justify-content-end" id="arrow-navbar"> |
| <ul class="nav navbar-nav"> |
| <li class="nav-item"><a class="nav-link" href="/overview/" role="button" aria-haspopup="true" aria-expanded="false">Overview</a></li> |
| <li class="nav-item"><a class="nav-link" href="/faq/" role="button" aria-haspopup="true" aria-expanded="false">FAQ</a></li> |
| <li class="nav-item"><a class="nav-link" href="/blog" role="button" aria-haspopup="true" aria-expanded="false">Blog</a></li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownGetArrow" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| Get Arrow |
| </a> |
| <div class="dropdown-menu" aria-labelledby="navbarDropdownGetArrow"> |
| <a class="dropdown-item" href="/install/">Install</a> |
| <a class="dropdown-item" href="/release/">Releases</a> |
| </div> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownDocumentation" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| Docs |
| </a> |
| <div class="dropdown-menu" aria-labelledby="navbarDropdownDocumentation"> |
| <a class="dropdown-item" href="/docs">Project Docs</a> |
| <a class="dropdown-item" href="/docs/format/Columnar.html">Format</a> |
| <hr> |
| <a class="dropdown-item" href="/docs/c_glib">C GLib</a> |
| <a class="dropdown-item" href="/docs/cpp">C++</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/csharp/README.md" target="_blank" rel="noopener">C#</a> |
| <a class="dropdown-item" href="https://godoc.org/github.com/apache/arrow/go/arrow" target="_blank" rel="noopener">Go</a> |
| <a class="dropdown-item" href="/docs/java">Java</a> |
| <a class="dropdown-item" href="/docs/js">JavaScript</a> |
| <a class="dropdown-item" href="/julia/">Julia</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/matlab/README.md" target="_blank" rel="noopener">MATLAB</a> |
| <a class="dropdown-item" href="/docs/python">Python</a> |
| <a class="dropdown-item" href="/docs/r">R</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/ruby/README.md" target="_blank" rel="noopener">Ruby</a> |
| <a class="dropdown-item" href="https://docs.rs/arrow/latest" target="_blank" rel="noopener">Rust</a> |
| <a class="dropdown-item" href="/swift">Swift</a> |
| </div> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownSource" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| Source |
| </a> |
| <div class="dropdown-menu" aria-labelledby="navbarDropdownSource"> |
| <a class="dropdown-item" href="https://github.com/apache/arrow" target="_blank" rel="noopener">Main Repo</a> |
| <hr> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/c_glib" target="_blank" rel="noopener">C GLib</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/cpp" target="_blank" rel="noopener">C++</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/csharp" target="_blank" rel="noopener">C#</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow-go" target="_blank" rel="noopener">Go</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow-java" target="_blank" rel="noopener">Java</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow-js" target="_blank" rel="noopener">JavaScript</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow-julia" target="_blank" rel="noopener">Julia</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/matlab" target="_blank" rel="noopener">MATLAB</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/python" target="_blank" rel="noopener">Python</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/r" target="_blank" rel="noopener">R</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/ruby" target="_blank" rel="noopener">Ruby</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow-rs" target="_blank" rel="noopener">Rust</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow-swift" target="_blank" rel="noopener">Swift</a> |
| </div> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownSubprojects" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| Subprojects |
| </a> |
| <div class="dropdown-menu" aria-labelledby="navbarDropdownSubprojects"> |
| <a class="dropdown-item" href="/adbc">ADBC</a> |
| <a class="dropdown-item" href="/docs/format/Flight.html">Arrow Flight</a> |
| <a class="dropdown-item" href="/docs/format/FlightSql.html">Arrow Flight SQL</a> |
| <a class="dropdown-item" href="https://datafusion.apache.org" target="_blank" rel="noopener">DataFusion</a> |
| <a class="dropdown-item" href="/nanoarrow">nanoarrow</a> |
| </div> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownCommunity" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| Community |
| </a> |
| <div class="dropdown-menu" aria-labelledby="navbarDropdownCommunity"> |
| <a class="dropdown-item" href="/community/">Communication</a> |
| <a class="dropdown-item" href="/docs/developers/index.html">Contributing</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/issues" target="_blank" rel="noopener">Issue Tracker</a> |
| <a class="dropdown-item" href="/committers/">Governance</a> |
| <a class="dropdown-item" href="/use_cases/">Use Cases</a> |
| <a class="dropdown-item" href="/powered_by/">Powered By</a> |
| <a class="dropdown-item" href="/visual_identity/">Visual Identity</a> |
| <a class="dropdown-item" href="/security/">Security</a> |
| <a class="dropdown-item" href="https://www.apache.org/foundation/policies/conduct.html" target="_blank" rel="noopener">Code of Conduct</a> |
| </div> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownASF" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| ASF Links |
| </a> |
| <div class="dropdown-menu dropdown-menu-right" aria-labelledby="navbarDropdownASF"> |
| <a class="dropdown-item" href="https://www.apache.org/" target="_blank" rel="noopener">ASF Website</a> |
| <a class="dropdown-item" href="https://www.apache.org/licenses/" target="_blank" rel="noopener">License</a> |
| <a class="dropdown-item" href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener">Donate</a> |
| <a class="dropdown-item" href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener">Thanks</a> |
| <a class="dropdown-item" href="https://www.apache.org/security/" target="_blank" rel="noopener">Security</a> |
| </div> |
| </li> |
| </ul> |
| </div> |
| <!-- /.navbar-collapse --> |
| </nav> |
| |
| </header> |
| |
| <div class="container p-4 pt-5"> |
| <div class="col-md-8 mx-auto"> |
| <main role="main" class="pb-5"> |
| |
| <h1> |
| February 2022 Rust Apache Arrow and Parquet Highlights |
| </h1> |
| <hr class="mt-4 mb-3"> |
| |
| |
| |
| <p class="mb-4 pb-1"> |
| <span class="badge badge-secondary">Published</span> |
| <span class="published mr-3"> |
| 13 Feb 2022 |
| </span> |
| <br> |
| <span class="badge badge-secondary">By</span> |
| |
| <a class="mr-3" href="https://arrow.apache.org">The Apache Arrow PMC (pmc) </a> |
| |
| |
| |
| </p> |
| |
| |
| <!-- |
| |
| --> |
| <p>The Rust implementation of <a href="https://arrow.apache.org/">Apache Arrow</a> has just released version <code>9.0.2</code>.</p> |
<p>While a major version number this high may shock some in the Rust
community, to whom it implies a slow-moving, 20-year-old piece of
software, nothing could be further from the truth!</p>
<p>With regular and predictable releases every two weeks, the library continues
to evolve rapidly, and <code>9.0.2</code> is no exception. Some recent highlights:</p>
| <h1> |
| <code>parquet</code>: async, performance, safety and nested types</h1> |
<p>The <a href="https://crates.io/crates/parquet/9.0.2" target="_blank" rel="noopener">parquet <code>9.0.2</code></a> release includes an <a href="https://github.com/apache/arrow-rs/blob/9.0.2/parquet/src/arrow/async_reader.rs#L21-L75" target="_blank" rel="noopener"><code>async</code> reader</a>, a long-requested feature. With the <code>async</code>
reader, it is now possible to read only the relevant parts of a parquet
file from a networked source such as object storage; previously, the
entire file had to be buffered locally. We hope to add an <code>async</code>
writer in a future release and would love some
<a href="https://github.com/apache/arrow-rs/issues/1269" target="_blank" rel="noopener">help</a>.</p>
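<p>Partial reads are possible because of the file layout: a Parquet file ends with its metadata, followed by a 4-byte little-endian metadata length and the magic bytes <code>PAR1</code>. A reader over object storage can therefore fetch the last 8 bytes, then the metadata, then only the column chunks it needs. The minimal, standard-library-only sketch below computes the metadata byte range from that trailer. <code>metadata_range</code> is a hypothetical helper for illustration, not part of the <code>parquet</code> crate's API.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code data-lang="rust">use std::ops::Range;

/// The fixed Parquet trailer: a 4-byte little-endian metadata length
/// followed by the 4-byte magic `PAR1`.
const TRAILER_LEN: u64 = 8;
const MAGIC: &amp;[u8; 4] = b"PAR1";

/// Hypothetical helper: given the object size and its last 8 bytes (for
/// example from one ranged GET), return the byte range of the metadata.
fn metadata_range(file_len: u64, trailer: &amp;[u8; 8]) -&gt; Result&lt;Range&lt;u64&gt;, String&gt; {
    if trailer[4..] != MAGIC[..] {
        return Err("missing PAR1 magic at end of file".to_string());
    }
    let meta_len = u32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]) as u64;
    // The file must also hold the leading magic before the metadata.
    if file_len &lt; MAGIC.len() as u64 + meta_len + TRAILER_LEN {
        return Err(format!("metadata length {} exceeds file size {}", meta_len, file_len));
    }
    let end = file_len - TRAILER_LEN;
    Ok(end - meta_len..end)
}</code></pre></div></div>
<p>For a 1,000-byte file whose trailer declares 100 bytes of metadata, this returns the range <code>892..992</code>; a caller would then issue a second ranged read for that span, decode the metadata, and fetch only the column chunks it needs.</p>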
<p>Reading parquet data is also significantly faster than with previous
versions of the <code>parquet</code> crate (up to
<a href="https://github.com/apache/arrow-rs/pull/1180#issuecomment-1018518863" target="_blank" rel="noopener">60x</a>
in some cases). Kudos to <a href="https://github.com/tustvold" target="_blank" rel="noopener">tustvold</a> and
<a href="https://github.com/yordan-pavlov" target="_blank" rel="noopener">yordan-pavlov</a> for their
contributions in these areas.</p>
<p>As of <code>8.0.0</code>, the code that reads and writes <code>RecordBatch</code>es
to and from Parquet supports all types, including deeply nested
structs and lists. Thanks to <a href="https://github.com/helgikrs" target="_blank" rel="noopener">helgikrs</a> for
cleaning up the last corner cases!</p>
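<p>Parquet stores nested data as flat leaf columns plus repetition and definition levels, which record where each list starts and how much of the nesting is non-null. The std-only sketch below shreds a nullable list of nullable <code>i32</code> values, assuming the standard three-level <code>LIST</code> layout (maximum definition level 3, maximum repetition level 1). <code>Levels</code> and <code>list_levels</code> are hypothetical names; this is an illustration of the encoding, not the crate's writer code.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code data-lang="rust">/// Repetition/definition levels and leaf values for one Parquet column.
#[derive(Debug, Default, PartialEq)]
struct Levels {
    rep: Vec&lt;i16&gt;,
    def: Vec&lt;i16&gt;,
    values: Vec&lt;i32&gt;,
}

/// Shred rows assuming the schema
/// `optional group (LIST) { repeated group list { optional int32 element } }`:
/// def 0 = null list, 1 = empty list, 2 = null element, 3 = present element.
fn list_levels(rows: &amp;[Option&lt;Vec&lt;Option&lt;i32&gt;&gt;&gt;]) -&gt; Levels {
    let mut out = Levels::default();
    for row in rows {
        match row {
            // Null and empty lists still emit one level pair so the row is counted.
            None =&gt; {
                out.rep.push(0);
                out.def.push(0);
            }
            Some(items) if items.is_empty() =&gt; {
                out.rep.push(0);
                out.def.push(1);
            }
            Some(items) =&gt; {
                for (i, item) in items.iter().enumerate() {
                    // rep 0 starts a new row; rep 1 continues the current list.
                    out.rep.push(if i == 0 { 0 } else { 1 });
                    match item {
                        Some(v) =&gt; {
                            out.def.push(3);
                            out.values.push(*v);
                        }
                        None =&gt; out.def.push(2),
                    }
                }
            }
        }
    }
    out
}</code></pre></div></div>
<p>For the rows <code>[[1, null, 2], null, [], [3]]</code>, this yields repetition levels <code>0 1 1 0 0 0</code>, definition levels <code>3 2 3 0 1 3</code> and leaf values <code>1 2 3</code>, which is enough for a reader to rebuild the original nesting.</p>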
<p>Another notable recent addition to parquet is <code>UTF-8</code> validation of
string data, which improves security against malicious inputs.</p>
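<p>As a minimal sketch of what such validation involves, assuming an Arrow-style string layout (an <code>i32</code> offsets buffer of length n + 1 into a shared byte buffer), the std-only function below checks that the offsets are in bounds and that every slot is valid UTF-8 before exposing it as <code>&amp;str</code>. <code>validate_strings</code> is a hypothetical name; the <code>parquet</code> crate's actual checks may be structured differently.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code data-lang="rust">/// Hypothetical helper: validate a string column stored as `offsets`
/// (length n + 1) into a shared `values` buffer, returning the strings only
/// if every offset is in bounds and every slot is valid UTF-8.
fn validate_strings&lt;'a&gt;(offsets: &amp;[i32], values: &amp;'a [u8]) -&gt; Result&lt;Vec&lt;&amp;'a str&gt;, String&gt; {
    if offsets.is_empty() {
        return Err("offsets buffer must have at least one entry".to_string());
    }
    let mut out = Vec::with_capacity(offsets.len() - 1);
    for (i, w) in offsets.windows(2).enumerate() {
        let (start, end) = (w[0], w[1]);
        // Reject negative, decreasing, or out-of-bounds offsets before slicing.
        if start &lt; 0 || end &lt; start || end as usize &gt; values.len() {
            return Err(format!("invalid offsets {}..{} for slot {}", start, end, i));
        }
        let bytes = &amp;values[start as usize..end as usize];
        // from_utf8 rejects malformed or truncated sequences, so untrusted input
        // cannot produce a &amp;str that violates Rust's UTF-8 invariant.
        let s = std::str::from_utf8(bytes).map_err(|e| format!("slot {}: {}", i, e))?;
        out.push(s);
    }
    Ok(out)
}</code></pre></div></div>
<p>For example, with the 5-byte buffer <code>"fooé"</code>, offsets <code>[0, 3, 5]</code> yield <code>["foo", "é"]</code>, while <code>[0, 4, 5]</code> is rejected because slot 0 would end in the middle of the two-byte <code>é</code>.</p>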
| <p>Planned upcoming work includes <a href="https://github.com/apache/arrow-rs/issues/1191" target="_blank" rel="noopener">pushing more |
| filtering</a> directly |
| into the parquet scan as well as an <code>async</code> writer.</p> |
| <h1> |
| <code>arrow</code>: performance, dyn kernels, and DecimalArray</h1> |
<p>The <a href="https://docs.rs/arrow/latest/arrow/compute/index.html" target="_blank" rel="noopener">compute</a>
kernels have been improved significantly in <a href="https://crates.io/crates/arrow/9.0.2" target="_blank" rel="noopener">arrow <code>9.0.2</code></a>. Some <a href="https://github.com/apache/arrow-rs/pull/1228#issue-1111889246" target="_blank" rel="noopener">filter
benchmarks</a>
are twice as fast, and the SIMD kernels are also <a href="https://github.com/apache/arrow-rs/pull/1221" target="_blank" rel="noopener">significantly
faster</a>. Many thanks to
<a href="https://github.com/tustvold" target="_blank" rel="noopener">tustvold</a> and
<a href="https://github.com/jhorstmann" target="_blank" rel="noopener">jhorstmann</a>.
<a href="https://github.com/apache/arrow-rs/pull/1248" target="_blank" rel="noopener">Additional substantial</a>
improvements are likely to land in arrow <code>10.0.0</code>.</p>
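<p>Arrow packs boolean filter masks into bitmaps, least-significant bit first. One general technique for a fast filter kernel, sketched below with only the standard library, is to copy each run of consecutive selected rows as a single slice and to handle all-zero and all-one words without inspecting individual bits. The mask is represented here as <code>u64</code> words for simplicity. This illustrates the idea only; it is not the <code>arrow</code> crate's kernel or the code from the linked pull requests, and <code>filter_runs</code> is a hypothetical name.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code data-lang="rust">/// Hypothetical kernel: keep `values[i]` where bit `i` of `mask` is set,
/// with bits packed least-significant first into u64 words.
fn filter_runs&lt;T: Copy&gt;(values: &amp;[T], mask: &amp;[u64]) -&gt; Vec&lt;T&gt; {
    assert!(mask.len() * 64 &gt;= values.len(), "mask has too few bits");
    let mut out = Vec::new();
    // Start index of the run of selected rows currently being extended.
    let mut run_start: Option&lt;usize&gt; = None;
    for (w, &amp;word) in mask.iter().enumerate() {
        let base = w * 64;
        if base &gt;= values.len() {
            break;
        }
        // Fast paths: an all-zero word closes any open run, and an all-one
        // word extends the current run (or starts one) without a bit loop.
        if word == 0 {
            if let Some(s) = run_start.take() {
                out.extend_from_slice(&amp;values[s..base]);
            }
            continue;
        }
        if word == u64::MAX {
            run_start.get_or_insert(base);
            continue;
        }
        for bit in 0..64 {
            let i = base + bit;
            if i &gt;= values.len() {
                break;
            }
            let selected = (word &gt;&gt; bit) &amp; 1 == 1;
            match (selected, run_start) {
                (true, None) =&gt; run_start = Some(i),
                (false, Some(s)) =&gt; {
                    out.extend_from_slice(&amp;values[s..i]);
                    run_start = None;
                }
                _ =&gt; {}
            }
        }
    }
    // Flush a run that reaches the end of the array.
    if let Some(s) = run_start {
        out.extend_from_slice(&amp;values[s..]);
    }
    out
}</code></pre></div></div>
<p>For example, values <code>[10, 20, 30, 40, 50]</code> with the mask word <code>0b10110</code> yield <code>[20, 30, 50]</code>: the run <code>20, 30</code> is copied as one slice and <code>50</code> is flushed at the end. Bulk slice copies tend to pay off when selected rows are clustered.</p>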
<p>We are working on a new set of "dynamic" <code>dyn_</code> kernels (for example,
| <a href="https://docs.rs/arrow/8.0.0/arrow/compute/kernels/comparison/fn.eq_dyn.html" target="_blank" rel="noopener"><code>eq_dyn</code></a>) |
| that make it easier to invoke the heavily optimized kernels provided |
| by the <code>arrow</code> crate. Work is underway to expand the breadth of types |
| supported by these new kernels to make them even more useful. Thanks |
| to <a href="https://github.com/matthewmturner" target="_blank" rel="noopener">matthewmturner</a> and |
| <a href="https://github.com/viirya" target="_blank" rel="noopener">viirya</a> for their help in this |
| effort.</p> |
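<p>The point of the <code>dyn_</code> kernels is that callers holding type-erased arrays no longer have to downcast to a concrete array type themselves. The std-only toy model below shows that dispatch pattern: check the logical types and lengths, downcast once, then call a statically typed kernel. <code>Array</code>, <code>DataType</code>, <code>Int32Array</code>, <code>StringArray</code> and <code>eq_dyn</code> here are simplified hypothetical stand-ins that borrow names from, but are not, the <code>arrow</code> crate's API.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code data-lang="rust">use std::any::Any;

/// Toy stand-ins (hypothetical, not the arrow crate's types) for a
/// type-erased array and its logical data type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

trait Array {
    fn data_type(&amp;self) -&gt; DataType;
    fn len(&amp;self) -&gt; usize;
    fn as_any(&amp;self) -&gt; &amp;dyn Any;
}

struct Int32Array(Vec&lt;i32&gt;);
struct StringArray(Vec&lt;String&gt;);

impl Array for Int32Array {
    fn data_type(&amp;self) -&gt; DataType { DataType::Int32 }
    fn len(&amp;self) -&gt; usize { self.0.len() }
    fn as_any(&amp;self) -&gt; &amp;dyn Any { self }
}

impl Array for StringArray {
    fn data_type(&amp;self) -&gt; DataType { DataType::Utf8 }
    fn len(&amp;self) -&gt; usize { self.0.len() }
    fn as_any(&amp;self) -&gt; &amp;dyn Any { self }
}

/// Statically typed kernel: monomorphized (and optimizable) per element type.
fn eq&lt;T: PartialEq&gt;(left: &amp;[T], right: &amp;[T]) -&gt; Vec&lt;bool&gt; {
    left.iter().zip(right).map(|(l, r)| l == r).collect()
}

/// Dynamic entry point: validate, downcast once, then call the typed kernel.
fn eq_dyn(left: &amp;dyn Array, right: &amp;dyn Array) -&gt; Result&lt;Vec&lt;bool&gt;, String&gt; {
    if left.data_type() != right.data_type() || left.len() != right.len() {
        return Err("arrays must have the same type and length".to_string());
    }
    match left.data_type() {
        DataType::Int32 =&gt; {
            let l = left.as_any().downcast_ref::&lt;Int32Array&gt;().ok_or("not an Int32Array")?;
            let r = right.as_any().downcast_ref::&lt;Int32Array&gt;().ok_or("not an Int32Array")?;
            Ok(eq(&amp;l.0, &amp;r.0))
        }
        DataType::Utf8 =&gt; {
            let l = left.as_any().downcast_ref::&lt;StringArray&gt;().ok_or("not a StringArray")?;
            let r = right.as_any().downcast_ref::&lt;StringArray&gt;().ok_or("not a StringArray")?;
            Ok(eq(&amp;l.0, &amp;r.0))
        }
    }
}</code></pre></div></div>
<p>The type check and downcast happen once per call rather than once per element, so the inner comparison loop stays fully typed; supporting another type means adding one match arm.</p>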
<p>While <code>arrow</code> has had basic support for <code>DecimalArray</code> since version
<code>3.0.0</code>, support for the <code>Decimal</code> type has been expanded in computation
kernels such as <code>sort</code>, <code>take</code> and <code>filter</code>, thanks to some great
contributions from <a href="https://github.com/liukun4515" target="_blank" rel="noopener">liukun4515</a>. There
is <a href="https://github.com/apache/arrow-rs/pull/1223" target="_blank" rel="noopener">ongoing work</a> to
improve the API ergonomics and performance of <code>DecimalArray</code> as well.</p>
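<p>Arrow's decimal type stores each value as a 128-bit integer with a precision and scale shared by the whole column, so with scale 2 the value <code>1.23</code> is stored as <code>123</code>. Because the scale is shared, kernels such as <code>sort</code>, <code>take</code> and <code>filter</code> can operate directly on the raw integers. The minimal std-only sketch below illustrates this; <code>DecimalColumn</code> and its methods are hypothetical, omit null handling, and are not the <code>arrow</code> crate's <code>DecimalArray</code> API.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code data-lang="rust">/// Toy decimal column (hypothetical; nulls omitted): each value is an
/// unscaled i128 meaning `value * 10^-scale`, with a column-wide precision
/// and scale.
#[derive(Debug, Clone, PartialEq)]
struct DecimalColumn {
    values: Vec&lt;i128&gt;,
    precision: u8,
    scale: u8,
}

impl DecimalColumn {
    /// `take`: gather the values at `indices` (panics if one is out of bounds).
    fn take(&amp;self, indices: &amp;[usize]) -&gt; Self {
        let values = indices.iter().map(|&amp;i| self.values[i]).collect();
        Self { values, precision: self.precision, scale: self.scale }
    }

    /// `filter`: keep the values whose mask entry is `true`.
    fn filter(&amp;self, mask: &amp;[bool]) -&gt; Self {
        assert_eq!(mask.len(), self.values.len(), "mask length mismatch");
        let values = self
            .values
            .iter()
            .zip(mask)
            .filter(|&amp;(_, &amp;keep)| keep)
            .map(|(&amp;v, _)| v)
            .collect();
        Self { values, precision: self.precision, scale: self.scale }
    }

    /// `sort`: all values share one scale, so ordering the unscaled integers
    /// orders the decimals; returns the indices of a stable ascending sort.
    fn sort_indices(&amp;self) -&gt; Vec&lt;usize&gt; {
        let mut idx: Vec&lt;usize&gt; = (0..self.values.len()).collect();
        idx.sort_by_key(|&amp;i| self.values[i]);
        idx
    }

    /// Render value `i`, e.g. 12345 with scale 2 becomes "123.45".
    fn format_value(&amp;self, i: usize) -&gt; String {
        let v = self.values[i];
        if self.scale == 0 {
            return v.to_string();
        }
        let divisor = 10u128.pow(self.scale as u32);
        let sign = if v &lt; 0 { "-" } else { "" };
        let abs = v.unsigned_abs();
        format!("{}{}.{:0width$}", sign, abs / divisor, abs % divisor, width = self.scale as usize)
    }
}</code></pre></div></div>
<p>With scale 2, the stored values <code>[12345, -5, 100]</code> read as <code>123.45</code>, <code>-0.05</code> and <code>1.00</code>; sorting them only compares the integers, and <code>take</code> with the resulting indices <code>[1, 2, 0]</code> yields <code>[-5, 100, 12345]</code>.</p>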
| <h1>Security</h1> |
| <p>The <code>6.4.0</code> release resolved the last outstanding |
| <a href="https://rustsec.org/" target="_blank" rel="noopener">RUSTSEC</a> |
| <a href="https://github.com/rustsec/advisory-db/pull/1131" target="_blank" rel="noopener">advisory</a> on the |
| arrow crate and the <code>8.0.0</code> release resolved the last outstanding |
known security issues. While these issues were mostly limited to
misuse of the low-level "power user" APIs, which most users do not (and
should not) use, it was good to tighten up that area.</p>
<p>Now that <code>arrow-rs</code> is releasing major versions every other week, we
are also able to update dependencies at the same pace, helping
upstream security fixes flow more quickly to downstream projects.</p>
| <h1>Final shoutout</h1> |
| <p>It takes a community to build great software, and we would like to |
| thank everyone who has contributed to the arrow-rs repository since |
| the <code>7.0.0</code> release:</p> |
| <div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code data-lang="console"><span class="go">git shortlog -sn 7.0.0..9.0.0 |
| 22 Raphael Taylor-Davies |
| 18 Andrew Lamb |
| 6 Helgi Kristvin Sigurbjarnarson |
| 6 Remzi Yang |
| 5 Jörn Horstmann |
| 4 Liang-Chi Hsieh |
| 3 Jiayu Liu |
| 2 dependabot[bot] |
| 2 Yijie Shen |
| 1 Matthew Turner |
| 1 Kun Liu |
| 1 Yang |
| 1 Edd Robinson |
| 1 Patrick More |
| </span></code></pre></div></div> |
| <h1>How to Get Involved</h1> |
| <p>If you are interested in contributing to the Rust subproject in Apache Arrow, you can find a list of open issues |
| suitable for beginners <a href="https://github.com/apache/arrow-rs/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22" target="_blank" rel="noopener">here</a> |
| and the full list <a href="https://github.com/apache/arrow-rs/issues" target="_blank" rel="noopener">here</a>.</p> |
<p>Other ways to get involved include trying out Arrow on your own data, filing bug reports, and helping to
improve the documentation.</p>
| |
| </main> |
| </div> |
| |
| <hr> |
| <footer class="footer"> |
| <div class="row"> |
| <div class="col-md-9"> |
| <p>Apache Arrow, Arrow, Apache, the Apache logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p> |
| <p>© 2016-2025 The Apache Software Foundation</p> |
| </div> |
| <div class="col-md-3"> |
| <a class="d-sm-none d-md-inline pr-2" href="https://www.apache.org/events/current-event.html" target="_blank" rel="noopener"> |
| <img src="https://www.apache.org/events/current-event-234x60.png"> |
| </a> |
| </div> |
| </div> |
| </footer> |
| |
| </div> |
| </body> |
| </html> |