commit | c2f665b5c9c3670145fc131e3f827b4392cd4406 | |
---|---|---|
author | Heres, Daniel <danielheres@gmail.com> | Thu Apr 08 18:02:09 2021 -0400 |
committer | Andrew Lamb <andrew@nerdnetworks.org> | Thu Apr 08 18:02:09 2021 -0400 |
tree | da7a0f0e5e809c13a1ab578ba60b66c595e06df2 | |
parent | ac38410ead57c0d3e5fac27bf3d8161774839c97 | |
ARROW-12279: [Rust][DataFusion] Add test for null handling in hash join (ARROW-12266)

This PR adds an (ignored) test for https://issues.apache.org/jira/browse/ARROW-12266

```
SELECT id1, id2 FROM (SELECT null AS id1) t1 LEFT JOIN (SELECT 0 AS id2) t2 ON id1 = id2
```

current result: `NULL, NULL` (should be empty result set)

We should filter out nulls beforehand to make this result correct. I think the best way to go here is probably to add a non-null filter in the logical plan for inner, left, and right joins. This can also make things more efficient, because the non-null filter can be pushed down: the data set gets smaller, batches no longer have to carry nullable data, and entire files could even be skipped when they contain only nulls.

Closes #9937 from Dandandan/join_null

Authored-by: Heres, Daniel <danielheres@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
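The idea of filtering null keys before the join can be illustrated with a minimal, self-contained sketch. This is not DataFusion's hash join; the function name `inner_join_skip_nulls` and the use of `Option<i32>` slices in place of Arrow arrays are assumptions made for illustration, and only the inner-join case is covered.

```rust
use std::collections::HashMap;

/// Hypothetical helper: inner equi-join on nullable integer keys.
/// Under SQL semantics `NULL = x` is never true, so rows with a NULL key
/// are dropped before the hash table is built or probed.
/// Returns (left row index, right row index) pairs.
pub fn inner_join_skip_nulls(
    left: &[Option<i32>],
    right: &[Option<i32>],
) -> Vec<(usize, usize)> {
    // Build side: index only the non-null keys.
    let mut table: HashMap<i32, Vec<usize>> = HashMap::new();
    for (i, key) in left.iter().enumerate() {
        if let Some(k) = key {
            table.entry(*k).or_default().push(i);
        }
    }

    // Probe side: null keys are skipped, so they can never produce a match.
    let mut out = Vec::new();
    for (j, key) in right.iter().enumerate() {
        if let Some(k) = key {
            if let Some(rows) = table.get(k) {
                out.extend(rows.iter().map(|&i| (i, j)));
            }
        }
    }
    out
}
```

With this sketch, joining `[None]` against `[None]` yields no pairs, whereas a hash table keyed directly on the nullable value would treat the two nulls as equal and emit a spurious match.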
Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.
Major components of the project include:
Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.
The reference Arrow libraries contain many distinct software components:
The official Arrow libraries in this repository are in different stages of implementing the Arrow format and related features. See our current feature matrix on git master.
Please read our latest project contribution guide.
Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved: