commit | ec5934ad8e6aec99f495b094826f69e6df3f6d2b | |
---|---|---|
author | Heres, Daniel <danielheres@gmail.com> | Thu Mar 04 13:55:57 2021 -0500 |
committer | Andrew Lamb <andrew@nerdnetworks.org> | Thu Mar 04 13:55:57 2021 -0500 |
tree | 76dcafeee1195a7ce5b58b3c73fe0c0769124e38 | |
parent | 7f62219c7826535edf3b5352954cd006c7516a4e |
ARROW-11806: [Rust][DataFusion] Optimize join / inner join creation of indices

This PR implements two optimizations:

* Change the way we create the array of indices for an inner join so that no null bitmap is generated. There currently seems to be no ergonomic way to do this with Arrow without resorting to an iterator, which would be hard to use here. This accounts for a difference of around 3%.
* Allow `create_hashes` to reuse allocations when possible. This is around 2% faster.

In total this gives a small (about 5%) speedup to query 5.

This PR:

```
Query 5 iteration 0 took 169.3 ms
Query 5 iteration 1 took 156.0 ms
Query 5 iteration 2 took 157.5 ms
Query 5 iteration 3 took 158.0 ms
Query 5 iteration 4 took 157.3 ms
Query 5 iteration 5 took 163.4 ms
Query 5 iteration 6 took 167.6 ms
Query 5 iteration 7 took 171.5 ms
Query 5 iteration 8 took 167.4 ms
Query 5 iteration 9 took 164.5 ms
Query 5 avg time: 163.26 ms
```

Master:

```
Query 5 iteration 0 took 177.6 ms
Query 5 iteration 1 took 169.6 ms
Query 5 iteration 2 took 171.8 ms
Query 5 iteration 3 took 175.1 ms
Query 5 iteration 4 took 167.2 ms
Query 5 iteration 5 took 171.1 ms
Query 5 iteration 6 took 174.2 ms
Query 5 iteration 7 took 178.1 ms
Query 5 iteration 8 took 167.9 ms
Query 5 iteration 9 took 172.0 ms
Query 5 avg time: 172.46 ms
```

Closes #9595 from Dandandan/opt_hash_join

Authored-by: Heres, Daniel <danielheres@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
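The first optimization concerns producing inner-join indices that carry no validity (null) bitmap. Below is a minimal, std-only Rust sketch of the idea, not DataFusion's actual code. The hypothetical `inner_join_indices` builds a hash table over the left keys, probes it with the right keys, and pushes matched row indices into plain `Vec<u64>`s. An inner join only emits matched pairs, so every index is valid. An array built from such a vector therefore needs no null bitmap, whereas building one from `Option<u64>` items typically allocates one. Integer `i64` keys and a `HashMap<i64, Vec<u64>>` table are simplifying assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical helper: for an inner join, return the matching
/// (left, right) row-index pairs as two plain `Vec<u64>`s.
/// Every entry is a real row index, so no null bitmap is required.
fn inner_join_indices(left_keys: &[i64], right_keys: &[i64]) -> (Vec<u64>, Vec<u64>) {
    // Build phase: map each left (build-side) key to all row indices holding it.
    let mut table: HashMap<i64, Vec<u64>> = HashMap::new();
    for (i, key) in left_keys.iter().enumerate() {
        table.entry(*key).or_default().push(i as u64);
    }

    // Probe phase: emit one pair per match; unmatched right rows emit nothing,
    // which is why no "missing" (null) index is ever needed.
    let mut left_indices = Vec::new();
    let mut right_indices = Vec::new();
    for (j, key) in right_keys.iter().enumerate() {
        if let Some(matches) = table.get(key) {
            for &i in matches {
                left_indices.push(i);
                right_indices.push(j as u64);
            }
        }
    }
    (left_indices, right_indices)
}

fn main() {
    let left: [i64; 4] = [1, 2, 2, 3];
    let right: [i64; 3] = [2, 4, 1];
    let (l, r) = inner_join_indices(&left, &right);
    // Right row 0 (key 2) matches left rows 1 and 2; right row 2 (key 1) matches left row 0.
    assert_eq!(l, vec![1, 2, 0]);
    assert_eq!(r, vec![0, 0, 2]);
    println!("left: {:?}, right: {:?}", l, r);
}
```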
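The second optimization lets `create_hashes` write into a buffer that survives across batches. The sketch below uses a simplified, hypothetical `create_hashes_into` over `i64` columns. Its name, signature, and hash-combining scheme are assumptions for illustration, not DataFusion's API. The function clears and resizes a caller-owned `Vec<u64>`, so once the buffer has grown to the largest batch size, later calls reuse the same allocation. It uses `std::collections::hash_map::RandomState` for seeded hashing and assumes every column has as many rows as the first.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hypothetical, simplified stand-in for a `create_hashes`-style function.
/// Computes one hash per row across all key columns, writing into a
/// caller-provided buffer instead of allocating a new `Vec` on every call.
/// Assumes every column has as many rows as the first one.
fn create_hashes_into<'a>(
    columns: &[&[i64]],
    random_state: &RandomState,
    hashes_buffer: &'a mut Vec<u64>,
) -> &'a mut Vec<u64> {
    let num_rows = columns.first().map_or(0, |c| c.len());
    // Keep the existing allocation: `clear` retains capacity and `resize`
    // only reallocates if `num_rows` exceeds it.
    hashes_buffer.clear();
    hashes_buffer.resize(num_rows, 0);
    for column in columns {
        for (row, value) in column.iter().enumerate() {
            // Combine the running hash for this row with the next column's value.
            let mut hasher = random_state.build_hasher();
            hashes_buffer[row].hash(&mut hasher);
            value.hash(&mut hasher);
            hashes_buffer[row] = hasher.finish();
        }
    }
    hashes_buffer
}

fn main() {
    let random_state = RandomState::new();
    // One buffer shared by every batch.
    let mut hashes_buffer = Vec::new();
    let batches: [[i64; 3]; 2] = [[1, 2, 3], [4, 5, 6]];
    for batch in &batches {
        let hashes = create_hashes_into(&[&batch[..]], &random_state, &mut hashes_buffer);
        println!("{:?}", hashes);
    }
}
```

`Vec::clear` keeps the capacity, and `resize` only reallocates when the requested length exceeds it. Processing many similarly sized batches therefore allocates the hash buffer roughly once instead of once per batch.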
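The snippet below checks the "about 5%" figure for query 5. It recomputes both averages from the per-iteration timings listed in the benchmark output and then the relative reduction in runtime. Averaging the printed, rounded values gives 163.25 ms for this PR rather than the reported 163.26 ms, presumably because the benchmark averages unrounded timings. Either way, the improvement comes out at roughly 5.3%.

```rust
/// Mean of a slice of timings in milliseconds.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Percentage reduction of `candidate`'s mean runtime relative to `baseline`'s.
fn percent_faster(baseline: &[f64], candidate: &[f64]) -> f64 {
    let (b, c) = (mean(baseline), mean(candidate));
    (b - c) / b * 100.0
}

fn main() {
    // Per-iteration timings for query 5, copied from the benchmark output.
    let this_pr = [169.3, 156.0, 157.5, 158.0, 157.3, 163.4, 167.6, 171.5, 167.4, 164.5];
    let master = [177.6, 169.6, 171.8, 175.1, 167.2, 171.1, 174.2, 178.1, 167.9, 172.0];
    println!(
        "PR avg {:.2} ms, master avg {:.2} ms, {:.1}% faster",
        mean(&this_pr),
        mean(&master),
        percent_faster(&master, &this_pr)
    );
}
```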
Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.
Major components of the project include:
Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.
The reference Arrow libraries contain many distinct software components:
The official Arrow libraries in this repository are in different stages of implementing the Arrow format and related features. See our current feature matrix on git master.
Please read our latest project contribution guide.
Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved: