IMPALA-7171: [DOCS] Hints for Kudu insert and upsert
Change-Id: I04378e6f2b17d4d6e844192807d946b9045e2927
Reviewed-on: http://gerrit.cloudera.org:8080/10737
Reviewed-by: Thomas Marshall <thomasmarshall@cmu.edu>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index 158f68a..6faa9c1 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -3651,11 +3651,14 @@
</p>
<note type="warning" id="impala_kerberos_ssl_caveat">
- Prior to <keyword keyref="impala232"/>, you could enable Kerberos authentication between Impala internal components,
- or SSL encryption between Impala internal components, but not both at the same time.
- This restriction has now been lifted.
- See <xref keyref="IMPALA-2598">IMPALA-2598</xref>
- to see the maintenance releases for different levels of Impala where the fix has been published.
+ In <keyword
+ keyref="impala231"/> and lower versions, you could enable
+ Kerberos authentication between Impala internal components, or SSL
+ encryption between Impala internal components, but not both at the same
+ time. This restriction has now been lifted. See <xref
+ keyref="IMPALA-2598">IMPALA-2598</xref> to see the maintenance
+ releases for different levels of Impala where the fix has been
+ published.
</note>
<p id="hive_jdbc_ssl_kerberos_caveat">
@@ -4077,6 +4080,25 @@
</li>
</ul>
</p>
+ <p id="kudu_hints">
+ Starting from <keyword keyref="impala29_full"/>, the
+ <codeph>INSERT</codeph> or <codeph>UPSERT</codeph> operations into
+ Kudu tables automatically add an exchange and a sort node to the plan
+ that partitions and sorts the rows according to the partitioning/primary
+ key scheme of the target table (unless the number of rows to be inserted
+ is small enough to trigger single node execution). Since Kudu partitions
+ and sorts rows on write, pre-partitioning and sorting takes some of the
+ load off of Kudu and helps large <codeph>INSERT</codeph> operations to
+ complete without timing out. However, this default behavior may slow
+ down the end-to-end performance of the <codeph>INSERT</codeph> or
+ <codeph>UPSERT</codeph> operations. Starting from <keyword
+ keyref="impala210_full"/>, you can use the <codeph>/* +NOCLUSTERED
+ */</codeph> and <codeph>/* +NOSHUFFLE */</codeph> hints together to
+ disable partitioning and sorting before the rows are sent to Kudu.
+ Additionally, since sorting may consume a large amount of memory,
+ consider setting the <codeph>MEM_LIMIT</codeph> query option for those
+ queries.
+ </p>
</section>
diff --git a/docs/topics/impala_hints.xml b/docs/topics/impala_hints.xml
index 2bdcac1..d16b7f6 100644
--- a/docs/topics/impala_hints.xml
+++ b/docs/topics/impala_hints.xml
@@ -355,16 +355,8 @@
</ul>
</li>
</ul>
-
- <p>
- Starting from <keyword keyref="impala29_full"/>, <codeph>INSERT</codeph> or
- <codeph>UPSERT</codeph> operations into Kudu tables automatically have an exchange and
- sort node added to the plan that partitions and sorts the rows according to the
- partitioning/primary key scheme of the target table (unless the number of rows to be
- inserted is small enough to trigger single node execution). Use the<codeph> /*
- +NOCLUSTERED */</codeph> and <codeph>/* +NOSHUFFLE */</codeph> hints together to disable
- partitioning and sorting before the rows are sent to Kudu.
- </p>
+ <p><b>Kudu consideration:</b></p>
+ <p conref="../shared/impala_common.xml#common/kudu_hints"/>
<p rev="IMPALA-2924">
<b>Hints for scheduling of HDFS blocks:</b>
diff --git a/docs/topics/impala_kudu.xml b/docs/topics/impala_kudu.xml
index 145654e..c308c37 100644
--- a/docs/topics/impala_kudu.xml
+++ b/docs/topics/impala_kudu.xml
@@ -1260,6 +1260,8 @@
</p>
</note>
+ <p conref="../shared/impala_common.xml#common/kudu_hints"/>
+
</conbody>
</concept>