oak-doc/src/site/markdown/query/query-troubleshooting.md - jackrabbit-oak - Git at Google

 <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
   -->

 ## Query Troubleshooting

 ### Slow Queries

 The first step in query troubleshooting is often to detect a query is slow,
 or traverses many nodes. Queries that traverse many nodes are logged
 as follows:

     *WARN* org.apache.jackrabbit.oak.plugins.index.Cursors$TraversingCursor
         Traversed 22000 nodes with filter Filter(query=
         select * from [nt:base] where isdescendantnode('/etc') and lower([jcr:title]) like '%coat%');
         consider creating an index or changing the query

 To get good performance, queries should not traverse more than about 1000 nodes
 (specially for queries that are run often).

 #### Potentially Slow Queries

 In addition to avoiding queries that traverse many nodes,
 it makes sense to avoid queries that don't use an index.
 Such queries might be fast (and only traverse few nodes) with a small repository,
 but with a large repository they are typically slow as well.
 Therefore, it makes sense to detect such queries as soon as possible
 (in a developer environment),
 even before the code that runs those queries is tested with a larger repository.
 Oak will detect such queries and log them as follows
 (with log level INFO for Oak 1.6.x, and WARN for Oak 1.8.x):

     *INFO* org.apache.jackrabbit.oak.query.QueryImpl Traversal query (query without index):
         select * from [nt:base] where isdescendantnode('/etc') and lower([jcr:title]) like '%coat%';
         consider creating an index

 #### Query Plan

 To understand why the query is slow, the first step is commonly to get the
 query execution plan. To do this, the query can be executed using `explain select ...`.
 For the above case, the plan is:

     [nt:base] as [nt:base] /* traverse "/etc//*"
     where (isdescendantnode([nt:base], [/etc])) and (lower([nt:base].[jcr:title]) like '%coat%') */

 That means, all nodes below `/etc` are traversed.

 #### Making the Query More Specific

 In order to make the query faster, try to add more constraints, or make constraints tighter.
 This will usually require some knowledge about the expected results.
 For example, if the path restriction is more specific, then less nodes need to be read.
 This is also true if an index is used. Also, if possible use a more specific node type.
 To understand if a nodetype or mixin is indexed, consult the nodetype index
 at `/oak:index/nodetype`, property `	declaringNodeTypes`.
 But even if this is not the case, the nodetype should be as specific as possible.
 Assuming the query is changed to this:

     select * from [acme:Product]
     where isdescendantnode('/etc/commerce')
     and lower([jcr:title]) like '%coat%')
     and [commerceType] = 'product'

 The only _relevant_ change was to improve the path restriction.
 But in this case, it already was enough to make the traversal warning go away.

 #### Queries Without Index

 After changing the query,
 there is still a message in the log file that complains the query doesn't use an index,
 as described above:

     *INFO* org.apache.jackrabbit.oak.query.QueryImpl
         Traversal query (query without index):
         select * from [acme:Product] where isdescendantnode('/etc/commerce')
         and lower([jcr:title]) like '%coat%'
         and [commerceType] = 'product'; consider creating an index

 The query plan of the index didn't change, so still nodes are traversed.
 In this case, there are relatively few nodes because it's
 an almost empty development repository, so no traversal warning is logged.
 But for production, there might be a lot more nodes under `/etc/commerce`,
 so it makes sense to continue optimization.

 #### Where Traversal is OK

 If it is known from the data model that a query will never traverse many nodes,
 then no index is needed. This is a corner case, and only applies to queries that
 traverse a fixed number of (for example) configuration nodes, or
 if the number of descendant nodes is guaranteed to be very low by using
 a certain nodetype that only allows for a fixed number of child nodes.
 If this is the case, then the query can be changed to say traversal is fine.
 To mark such queries, append `option(traversal ok)` to the query.
 This feature should only be used for those rare corner cases.

     select * from [nt:base]
     where isdescendantnode('/etc/commerce')
     and lower([jcr:title]) like '%coat%'
     and [commerceType] = 'product'
     option(traversal ok)

 #### Estimating Node Counts

 To find out how many nodes are in a certain path, you can use the JMX bean `NodeCounter`,
 which can estimate the node count. Example: run
 `getEstimatedChildNodeCounts` with `p1=/` and `p2=2` might give you:

     /: 2522208,
     ...
     /etc: 1521504,
     /etc/commerce: 29216,
     /etc/images: 1231232,
     ...

 So in this case, there are still many nodes below `/etc/commerce` in the production repository.
 Also note that the number of nodes can grow over time.

 #### Prevent Running Traversal Queries

 To avoid running queries that don't use an index altogether,
 you can change the configuration in the JMX bean `QueryEngineSettings`:
 if you set `FailTraversal` to `true`, then queries without index will throw an exception
 when trying to execute them, no matter how many nodes are in the repository.
 This doesn't mean queries will never traverse over nodes, it just means
 that queries that _must_ traverse over nodes will fail.

 #### Using a Different or New Index

 There are multiple options:

 * Consider creating an index for `jcr:title`. But for `like '%..%'` conditions,
   this is not of much help, because all nodes with that property will need to be read.
   Also, using `lower` will make the index less effective.
   So, this only makes sense if there are very few nodes with this property
   expected to be in the system.
 * If there are very few nodes with that nodetype,
   consider adding `acme:Product` to the nodetype index. This requires reindexing.
   The query could then use the nodetype index, and within this nodetype,
   just traverse below `/etc/commerce`.
   The `NodeCounter` can also help understand how many `acme:Product`
   nodes are in the repository, if this nodetype is indexed.
   To find out, run `getEstimatedChildNodeCounts` with
   `p1=/oak:index/nodetype` and `p2=2`.
 * If the query needs to return added nodes immediately (synchronously; that is without delay),
   consider creating a [property index](./property-index.html).
   Note that Lucene indexes are asynchronous, and new nodes may not
   appear in the result for a few seconds.
 * To ensure there is only one node matching the result in the repository,
   consider creating a unique [property index](./property-index.html).
 * Consider using a fulltext index, that is: change the query from using
   `lower([jcr:title]) like '%...%'` to using `contains([jcr:title], '...')`.
   Possibly combine this with adding the property
   `commerceType` to the fulltext index.

 The last plan is possibly the best solution for this case.

 #### Index Definition Generator

 In case you need to modify or create a Lucene property index,
 you can use the [Oak Tools](https://thomasmueller.github.io/oakTools/indexDefGenerator.html) tool.
 As the tool doesn't know your index configuration, it will always suggest
 to create a new index; it might be better to extend an existing index.
 However, note that:

 * Changing an existing index requires reindexing that index.
 * If an out-of-the-box index is modified, you will need to merge those modifications
   when migrating to newer software.
   It is best to add documentation to the index definition to simplify merging,
   for example in the form of "info" properties.

 #### Verification

 After changing the query, and possibly the index, run the `explain select` again,
 and verify the right plan is used, in this case that might be, for the query:

     select * from [acme:Product]
     where isdescendantnode('/etc/commerce')
     and contains([jcr:title], 'Coat')
     and [commerceType] = 'product'

     [nt:unstructured] as [acme:Product] /* lucene:lucene(/oak:index/lucene)
     full:jcr:title:coat ft:(jcr:title:"Coat")

 So in this case, only the fulltext restriction of the query was used by the index,
 but this might already be sufficient. If it is not, then the fulltext index might
 be changed to also index `commerceType`, or possibly
 to use `evaluatePathRestrictions`.

 #### Queries With Many OR or UNION Conditions

 Queries that contain many "or" conditions, or with many "union" subqueries,
 can be slow as they have to read a lot of data.
 Example query:

     /jcr:root/content/(a|b|c|d|e)//element(*, cq:Page)[
     jcr:contains(@jcr:title, 'some text')
     or jcr:contains(jcr:content/@keywords, 'some text')
     or jcr:contains(jcr:content/@cq:tags, 'some text')
     or jcr:contains(jcr:content/@team, 'some text')
     or jcr:contains(jcr:content/@topics, 'some text')
     or jcr:contains(jcr:content/@jcr:description, 'some text')]

 This query will be internally converted into 5 subqueries, due to the "union" clause (a|b|c|d|e).
 Then, each of the 5 subqueries will run 6 subqueries: one for each jcr:contains condition.
 So, the index will be contacted 30 times.

 To avoid this overhead, the index could be changed (or a new index created) to do aggregation
 on the required properties (here: jcr:title, jcr:content/keywords,...).
 This will simplify the query to:

     /jcr:root/content/(a|b|c|d|e)//element(*, cq:Page)[jcr:contains(., 'some text')]

 This should resolve most problems.
 To further speed up the query by avoiding to running 5 subqueries,
 it might be better to use a less specific path constraint,
 but instead use a different way to filter results, such as:

     /jcr:root/content//element(*, cq:Page)[jcr:contains(., 'some text') and @category='x']

 #### Ordering by Score Combined With OR / UNION Conditions

 Queries that expect results to be sorted by score ("order by @jcr:score descending"),
 and use "union" or "or" conditions, may not return the result in the expected order,
 depending on the index(es) used. Example:

     /jcr:root/conent/products/(indoor|outdoor)//*[jcr:contains(., 'test')]
     order by @jcr:score descending

 Here, the query is converted to a "union", and the result of both subqueries is combined.
 If the score for each subquery is not comparable (which is often the case for Lucene indexes),
 then the order of the results may not match the expected order.
 Instead of using path restrictions as above, it is most likely better to use a an additional
 condition in the query, and index that:

     /jcr:root/content/products//*[jcr:contains(., 'test') and
     (@productTag='indoor' or @productTag='outdoor')]
     order by @jcr:score descending

 If this is not possible, then try to avoid using "union", and use an "or" condition as follows.
 This will only work for SQL-2 queries however:

     select * from [nt:base] as a where contains(*, 'test') and issamenode(a, '/content') and
     ([jcr:path] like '/content/x800/%' or [jcr:path] like '/content/y900/%')
     order by [jcr:score] desc
	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->

	## Query Troubleshooting

	### Slow Queries

	The first step in query troubleshooting is often to detect a query is slow,
	or traverses many nodes. Queries that traverse many nodes are logged
	as follows:

	WARN org.apache.jackrabbit.oak.plugins.index.Cursors$TraversingCursor
	Traversed 22000 nodes with filter Filter(query=
	select * from [nt:base] where isdescendantnode('/etc') and lower([jcr:title]) like '%coat%');
	consider creating an index or changing the query

	To get good performance, queries should not traverse more than about 1000 nodes
	(specially for queries that are run often).

	#### Potentially Slow Queries

	In addition to avoiding queries that traverse many nodes,
	it makes sense to avoid queries that don't use an index.
	Such queries might be fast (and only traverse few nodes) with a small repository,
	but with a large repository they are typically slow as well.
	Therefore, it makes sense to detect such queries as soon as possible
	(in a developer environment),
	even before the code that runs those queries is tested with a larger repository.
	Oak will detect such queries and log them as follows
	(with log level INFO for Oak 1.6.x, and WARN for Oak 1.8.x):

	INFO org.apache.jackrabbit.oak.query.QueryImpl Traversal query (query without index):
	select * from [nt:base] where isdescendantnode('/etc') and lower([jcr:title]) like '%coat%';
	consider creating an index

	#### Query Plan

	To understand why the query is slow, the first step is commonly to get the
	query execution plan. To do this, the query can be executed using `explain select ...`.
	For the above case, the plan is:

	[nt:base] as [nt:base] /* traverse "/etc//*"
	where (isdescendantnode([nt:base], [/etc])) and (lower([nt:base].[jcr:title]) like '%coat%') */

	That means, all nodes below `/etc` are traversed.

	#### Making the Query More Specific

	In order to make the query faster, try to add more constraints, or make constraints tighter.
	This will usually require some knowledge about the expected results.
	For example, if the path restriction is more specific, then less nodes need to be read.
	This is also true if an index is used. Also, if possible use a more specific node type.
	To understand if a nodetype or mixin is indexed, consult the nodetype index
	at `/oak:index/nodetype`, property ` declaringNodeTypes`.
	But even if this is not the case, the nodetype should be as specific as possible.
	Assuming the query is changed to this:

	select * from [acme:Product]
	where isdescendantnode('/etc/commerce')
	and lower([jcr:title]) like '%coat%')
	and [commerceType] = 'product'

	The only _relevant_ change was to improve the path restriction.
	But in this case, it already was enough to make the traversal warning go away.

	#### Queries Without Index

	After changing the query,
	there is still a message in the log file that complains the query doesn't use an index,
	as described above:

	INFO org.apache.jackrabbit.oak.query.QueryImpl
	Traversal query (query without index):
	select * from [acme:Product] where isdescendantnode('/etc/commerce')
	and lower([jcr:title]) like '%coat%'
	and [commerceType] = 'product'; consider creating an index

	The query plan of the index didn't change, so still nodes are traversed.
	In this case, there are relatively few nodes because it's
	an almost empty development repository, so no traversal warning is logged.
	But for production, there might be a lot more nodes under `/etc/commerce`,
	so it makes sense to continue optimization.

	#### Where Traversal is OK

	If it is known from the data model that a query will never traverse many nodes,
	then no index is needed. This is a corner case, and only applies to queries that
	traverse a fixed number of (for example) configuration nodes, or
	if the number of descendant nodes is guaranteed to be very low by using
	a certain nodetype that only allows for a fixed number of child nodes.
	If this is the case, then the query can be changed to say traversal is fine.
	To mark such queries, append `option(traversal ok)` to the query.
	This feature should only be used for those rare corner cases.

	select * from [nt:base]
	where isdescendantnode('/etc/commerce')
	and lower([jcr:title]) like '%coat%'
	and [commerceType] = 'product'
	option(traversal ok)

	#### Estimating Node Counts

	To find out how many nodes are in a certain path, you can use the JMX bean `NodeCounter`,
	which can estimate the node count. Example: run
	`getEstimatedChildNodeCounts` with `p1=/` and `p2=2` might give you:

	/: 2522208,
	...
	/etc: 1521504,
	/etc/commerce: 29216,
	/etc/images: 1231232,
	...

	So in this case, there are still many nodes below `/etc/commerce` in the production repository.
	Also note that the number of nodes can grow over time.

	#### Prevent Running Traversal Queries

	To avoid running queries that don't use an index altogether,
	you can change the configuration in the JMX bean `QueryEngineSettings`:
	if you set `FailTraversal` to `true`, then queries without index will throw an exception
	when trying to execute them, no matter how many nodes are in the repository.
	This doesn't mean queries will never traverse over nodes, it just means
	that queries that _must_ traverse over nodes will fail.

	#### Using a Different or New Index

	There are multiple options:

	* Consider creating an index for `jcr:title`. But for `like '%..%'` conditions,
	this is not of much help, because all nodes with that property will need to be read.
	Also, using `lower` will make the index less effective.
	So, this only makes sense if there are very few nodes with this property
	expected to be in the system.
	* If there are very few nodes with that nodetype,
	consider adding `acme:Product` to the nodetype index. This requires reindexing.
	The query could then use the nodetype index, and within this nodetype,
	just traverse below `/etc/commerce`.
	The `NodeCounter` can also help understand how many `acme:Product`
	nodes are in the repository, if this nodetype is indexed.
	To find out, run `getEstimatedChildNodeCounts` with
	`p1=/oak:index/nodetype` and `p2=2`.
	* If the query needs to return added nodes immediately (synchronously; that is without delay),
	consider creating a [property index](./property-index.html).
	Note that Lucene indexes are asynchronous, and new nodes may not
	appear in the result for a few seconds.
	* To ensure there is only one node matching the result in the repository,
	consider creating a unique [property index](./property-index.html).
	* Consider using a fulltext index, that is: change the query from using
	`lower([jcr:title]) like '%...%'` to using `contains([jcr:title], '...')`.
	Possibly combine this with adding the property
	`commerceType` to the fulltext index.

	The last plan is possibly the best solution for this case.

	#### Index Definition Generator

	In case you need to modify or create a Lucene property index,
	you can use the [Oak Tools](https://thomasmueller.github.io/oakTools/indexDefGenerator.html) tool.
	As the tool doesn't know your index configuration, it will always suggest
	to create a new index; it might be better to extend an existing index.
	However, note that:

	* Changing an existing index requires reindexing that index.
	* If an out-of-the-box index is modified, you will need to merge those modifications
	when migrating to newer software.
	It is best to add documentation to the index definition to simplify merging,
	for example in the form of "info" properties.

	#### Verification

	After changing the query, and possibly the index, run the `explain select` again,
	and verify the right plan is used, in this case that might be, for the query:

	select * from [acme:Product]
	where isdescendantnode('/etc/commerce')
	and contains([jcr:title], 'Coat')
	and [commerceType] = 'product'

	[nt:unstructured] as [acme:Product] /* lucene:lucene(/oak:index/lucene)
	full:jcr:title:coat ft:(jcr:title:"Coat")

	So in this case, only the fulltext restriction of the query was used by the index,
	but this might already be sufficient. If it is not, then the fulltext index might
	be changed to also index `commerceType`, or possibly
	to use `evaluatePathRestrictions`.

	#### Queries With Many OR or UNION Conditions

	Queries that contain many "or" conditions, or with many "union" subqueries,
	can be slow as they have to read a lot of data.
	Example query:

	/jcr:root/content/(a\|b\|c\|d\|e)//element(*, cq:Page)[
	jcr:contains(@jcr:title, 'some text')
	or jcr:contains(jcr:content/@keywords, 'some text')
	or jcr:contains(jcr:content/@cq:tags, 'some text')
	or jcr:contains(jcr:content/@team, 'some text')
	or jcr:contains(jcr:content/@topics, 'some text')
	or jcr:contains(jcr:content/@jcr:description, 'some text')]

	This query will be internally converted into 5 subqueries, due to the "union" clause (a\|b\|c\|d\|e).
	Then, each of the 5 subqueries will run 6 subqueries: one for each jcr:contains condition.
	So, the index will be contacted 30 times.

	To avoid this overhead, the index could be changed (or a new index created) to do aggregation
	on the required properties (here: jcr:title, jcr:content/keywords,...).
	This will simplify the query to:

	/jcr:root/content/(a\|b\|c\|d\|e)//element(*, cq:Page)[jcr:contains(., 'some text')]

	This should resolve most problems.
	To further speed up the query by avoiding to running 5 subqueries,
	it might be better to use a less specific path constraint,
	but instead use a different way to filter results, such as:

	/jcr:root/content//element(*, cq:Page)[jcr:contains(., 'some text') and @category='x']

	#### Ordering by Score Combined With OR / UNION Conditions

	Queries that expect results to be sorted by score ("order by @jcr:score descending"),
	and use "union" or "or" conditions, may not return the result in the expected order,
	depending on the index(es) used. Example:

	/jcr:root/conent/products/(indoor\|outdoor)//*[jcr:contains(., 'test')]
	order by @jcr:score descending

	Here, the query is converted to a "union", and the result of both subqueries is combined.
	If the score for each subquery is not comparable (which is often the case for Lucene indexes),
	then the order of the results may not match the expected order.
	Instead of using path restrictions as above, it is most likely better to use a an additional
	condition in the query, and index that:

	/jcr:root/content/products//*[jcr:contains(., 'test') and
	(@productTag='indoor' or @productTag='outdoor')]
	order by @jcr:score descending

	If this is not possible, then try to avoid using "union", and use an "or" condition as follows.
	This will only work for SQL-2 queries however:

	select * from [nt:base] as a where contains(*, 'test') and issamenode(a, '/content') and
	([jcr:path] like '/content/x800/%' or [jcr:path] like '/content/y900/%')
	order by [jcr:score] desc