oak-doc/src/site/markdown/query/grammar-xpath.md.vm - jackrabbit-oak - Git at Google

 <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
   -->
 #set($h1 = '#')
 #set($h2 = '##')
 #set($h3 = '###')
 #set($h4 = '####')

 #set($xpath-grammar = ${project.basedir} + '/src/site/resources/grammar/xpath.csv')

 #macro( railroad_link $topic )
 <!-- MACRO{railroad|file=$xpath-grammar|topic=$topic|renderLink=true} -->
 #end

 #macro( render $topic )
 <!-- MACRO{railroad|file=$xpath-grammar|topic=$topic} -->
 #end

 $h2 Oak XPath Query Grammar - ${project.name}

 * #railroad_link( 'Query' )
 * #railroad_link( 'Filter' )
 * #railroad_link( 'Column' )
 * #railroad_link( 'Constraint' )
 * #railroad_link( 'And Condition' )
 * #railroad_link( 'Condition' )
 * #railroad_link( 'Comparison' )
 * #railroad_link( 'Static Operand' )
 * #railroad_link( 'Ordering' )
 * #railroad_link( 'Dynamic Operand' )
 * #railroad_link( 'Options' )
 * #railroad_link( 'Explain' )
 * #railroad_link( 'Measure' )

 ---

 #render( 'Query' )

 The "/jcr:root" means the root node.
 It is recommended that all XPath queries start with this term.

 All queries should have a path restriction
 (even if it's just, for example, "/content"), as this allows to shrink indexes.

 "order by" may use an index.
 If there is no index for the given sort order,
 then the result is fully read in memory and sorted before returning the first row.

 The column list is usually not needed, as all properties of the nodes are returned in any case.
 It is only needed if non-standard (computed) columns such as
 "rep:excerpt" or "rep:spellcheck" are needed.

 Examples:

 Get all nodes of node type 'sling:Folder' with the property 'sling:resourceType' set to 'x':

     /jcr:root/content//element(*, sling:Folder)[@sling:resourceType='x']

 Get all index definition nodes of type 'lucene', sorted by reindex count, descending:

     /jcr:root/oak:index/element(*, oak:QueryIndexDefinition)[@type='lucene'] order by @reindexCount descending

 Get all nodes below /etc that have the property type set to 'report'; fail the query if there is no index that can be used:

     /jcr:root/etc//*[@type='report'] option(traversal fail)


 ---

 #render( 'Filter' )

 A single slash means filtering on a specific child node,
 while two slashes means filtering on a descendant node.

 "*" means any node name, and any node type.

 "text()" is a shortcut for "jcr:xmltext".
 It is supported only for compatibility.

 Queries using the construct `(filter1 | filter2)' are converted to union,
 that is, one query is generated for each filter, and the result of both
 queries is combined.
 Note that for fulltext queries, it is problematic to use union,
 because scoring is done for each subquery individually.
 The score is not useful to compare results of different subqueries,
 so that the union of multiple fulltext queries won't be ordered by score
 as one might expect.

 Examples:

 Only direct child nodes of /content/dam:

     /jcr:root/content/dam/*

 All descendant nodes of /content/oak:index:

     /jcr:root/content/oak:index//*

 All nodes named "rep:policy" that have a parent node which is the direct child node of /home/users:

     /jcr:root/home/users/*/rep:policy

 All nodes named "profile" that have a parent node which is a descendant of /home/users:

     /jcr:root/home/users//*/profile

 All descendant nodes of /content that are of type oak:QueryIndexDefinition:

     /jcr:root/content//element(*, oak:QueryIndexDefinition)

 All descendant nodes of /content that are of type oak:QueryIndexDefinition and are named "lucene":

     /jcr:root/content//element(lucene, oak:QueryIndexDefinition)

 All nodes named "indexRules" that have a parent node with the property "type" set to "lucene":

     /jcr:root/oak:index/*[@type = 'lucene']/indexRules

 The paths '/libs' and '/etc' should be searched:

     /jcr:root/(libs | etc)//*[@jcr:uuid and @jcr:mimeType = 'text/css']


 ---

 #render( 'Column' )

 "rep:excerpt": include the excerpt in the result.
 Since Oak version 1.8.1, optionally a property name can be specified.
 See <a href="query-engine.html#Excerpts_and_Highlighting">Excerpts and Highlighting</a>.

 "rep:spellcheck": Include the spellcheck in the result.
 See <a href="query-engine.html#Spellchecking">Spellchecking</a>.

 "rep:suggest": include suggestions in the result.
 See <a href="query-engine.html#Suggestions">Suggestions</a>.

 "rep:facet": include facets in the result.
 See <a href="query-engine.html#Facets">Facets</a>.

 Examples:

     /jcr:root/content//*[jcr:contains(., 'test')]/(rep:excerpt())

     /jcr:root/content//*[jcr:contains(., 'test')]/(rep:excerpt(@jcr:title) | rep:excerpt())

     /jcr:root/content//*[rep:suggest('in ')]/(rep:suggest())

     /jcr:root/content//*[jcr:contains(@jcr:title, 'oak')]/(rep:facet(@tags))


 ---

 #render( 'Constraint' )

 "or" conditions of the form "@x = 1 or @x = 2" can use the same index.

 "or" conditions of the form "@x = 1 or @y = 2" are more complicated.
 Oak will convert them to a "union" query
 (one query with @x = 1, and a second query with @y = 2).

 Examples:

 Include the nodes that have the property 'hidden' set to 'hidden-folder',
 or don't have the property set:

     /jcr:root/content/dam/*[@hidden='hidden-folder' or not(@hidden)]


 ---

 #render( 'And Condition' )

 A special case (not found in relational databases) is
 "and" conditions of the form "@x = 1 and @x = 2".
 They will match nodes with multi-valued properties,
 where the property value contains both 1 and 2.

 Examples:

     /jcr:root/home//element(*, rep:Authorizable)[@rep:principalName != 'Joe' and @rep:principalName != 'Steve']


 ---

 #render( 'Condition' )

 "fn:not": this is used for both "is not null", and for nested conditions.
 For example "fn:not(@status)" means nodes where this property is not set.
 In this case, and index can be used, but it is relatively expensive to index
 all nodes that don't have a certain property. It is recommended to only index
 this if there are relatively view nodes with this nodetype. See also
 <a href="lucene.html">nullCheckEnabled</a>.
 Nested conditions, for example "fn:not(@status = 'x' and @color = 'red')",
 can not typically use an index.

 "jcr:contains": see <a href="query-engine.html#Full-Text_Queries">Full-Text Queries</a>.

 "jcr:like": the wildcards characters are _ (any one character)
 and % (any characters). An index is used,
 except if the operand starts with a wildcard.
 To search for the characters % and _, the characters need to be escaped using \ (backslash).

 "rep:similar": see <a href="query-engine.html#Similarity_Queries">Similarity Queries</a>.

 "rep:native": see <a href="query-engine.html#Native_Queries">Native Queries</a>.

 "rep:spellcheck": see <a href="query-engine.html#Spellchecking">Spellchecking</a>.

 "rep:suggest": see <a href="query-engine.html#Suggestions">Suggestions</a>.

 Examples:

     /jcr:root/var/eventing//element(*, slingevent:Job)[@event.job.topic = 'abc' and not(@slingevent:finishedState)]

     /jcr:root/content//element(*, cq:Page)[@offTime > xs:dateTime('2020-12-01T20:00:00.000') or @onTime > xs:dateTime('2020-12-01T20:00:00.000')]


 ---

 #render( 'Comparison' )

 Comparison using &lt;, &gt;, &gt;=, and &lt;= can use an index if the property in the index is ordered.

 Examples:

     @jcr:primaryType != 'nt:base'
     @offTime > xs:dateTime('2020-12-01T20:00:00.000')


 ---

 #render( 'Static Operand' )

 A string (text) literal starts and ends with a single quote.
 Two single quotes can be used to create a single quote inside a string.

 Examples:

     'John''s car'
     true()
     false
     1000
     -30.3
     xs:dateTime('2020-12-01T20:00:00.000')

 ---

 #render( 'Ordering' )

 Ordering by an indexed property will use that index if possible.
 If there is no index that can be used for the given sort order,
 then the result is fully read in memory and sorted there.

 As a special case, sorting by "jcr:score" in descending order is ignored
 (removed from the list), as this is what the fulltext index does anyway
 (and if no fulltext index is used, then the score doesn't apply).
 If for some reason you want to enforce sorting by "jcr:score", then
 you can use the workaround to order by "fn:lowercase(@jcr:score) descending".
 Note that for fulltext queries, it is problematic to use union,
 because scoring is done for each subquery individually.
 The score is not useful to compare results of different subqueries,
 so that the union of multiple fulltext queries won't be ordered by score
 as one might expect.

 Examples:

     order by @jcr:created descending


 ---

 #render( 'Dynamic Operand' )

 The "*" stands for any property.
 Using it in a condition requires a relative path.
 For example: `[./* = 'test']` means where any property matches the word 'test'.
 Relative path fragments can also contain `*` to represent 'any' node at
 that point. `//` is *not* supported as part of relative path. So, `a/*/@test`,
 `*/a/@test`, `a/*/*/@test` etc are valid while `a//@test`, `a/*/b//@test`, etc are
 *not*.


 "jcr:score()" is the score returned by the index.

 "fn:coalesce": this returns the first operand if it is not null,
 and the second operand otherwise.
 `@since Oak 1.8`

 Examples:

     @type
     ./@jcr:primaryType
     items/type/@metaType
     indexRules/nt:base/*
     fn:string-length(@title)
     fn:name()
     jcr:score ()
     fn:lower-case(@lastName)
     fn:coalesce(@lastName, @name)


 ---

 #render( 'Options' )

 "traversal": by default, queries without index will log a warning,
 except if the configuration option `QueryEngineSettings.failTraversal` is changed
 The traversal option can be used to change the behavior of the given query:
 "ok" to not log a warning,
 "warn" to log a warning,
 "fail" to fail the query, and
 "default" to use the default setting.

 "index tag": by default, queries will use the index with the lowest expected cost (as in relational databases).
 To only consider some of the indexes, add tags (a multi-valued String property) to the index(es) of choice,
 and specify this tag in the query.
 See <a href="query-engine.html#Query_Option_Index_Tag">Query Option Index Tag</a>.

 Examples:

     option(traversal fail)


 ---

 #render( 'Explain' )

 Does not run the query, but only computes and returns the query plan.
 With "explain measure", the expected cost is calculated as well.
 In both cases, the query result will only have one column called 'plan', and one row that contains the plan.

 Examples:

     exlplain measure /jcr:root//*[@jcr:uuid = 'x']

 Result:

     plan = [nt:base] as [nt:base]
     /* property uuid = 1 where [nt:base].[jcr:uuid] = 1 */
     cost: { "nt:base": 2.0 }

 This means the property index named "uuid" is used for this query.
 The expected cost (roughly the number of uncached I/O operations) is 2.

 ---

 #render( 'Measure' )

 Runs the query, but instead of returning the result, returns the number of rows traversed.
 The query result has two columns, one called 'selector' and one called 'scanCount'.
 The result has at least two rows, one that represents the total (selector set to 'query'),
 and one per selector used in the query.

 Examples:

     measure /jcr:root//*[@jcr:uuid = 'x']

 Result:

     selector = query
     scanCount = 0
     selector = nt:base
     scanCount = 0

 In this case, the scanCount is zero because the query did not find any nodes.
	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->
	#set($h1 = '#')
	#set($h2 = '##')
	#set($h3 = '###')
	#set($h4 = '####')

	#set($xpath-grammar = ${project.basedir} + '/src/site/resources/grammar/xpath.csv')

	#macro( railroad_link $topic )
	<!-- MACRO{railroad\|file=$xpath-grammar\|topic=$topic\|renderLink=true} -->
	#end

	#macro( render $topic )
	<!-- MACRO{railroad\|file=$xpath-grammar\|topic=$topic} -->
	#end

	$h2 Oak XPath Query Grammar - ${project.name}

	* #railroad_link( 'Query' )
	* #railroad_link( 'Filter' )
	* #railroad_link( 'Column' )
	* #railroad_link( 'Constraint' )
	* #railroad_link( 'And Condition' )
	* #railroad_link( 'Condition' )
	* #railroad_link( 'Comparison' )
	* #railroad_link( 'Static Operand' )
	* #railroad_link( 'Ordering' )
	* #railroad_link( 'Dynamic Operand' )
	* #railroad_link( 'Options' )
	* #railroad_link( 'Explain' )
	* #railroad_link( 'Measure' )

	---

	#render( 'Query' )

	The "/jcr:root" means the root node.
	It is recommended that all XPath queries start with this term.

	All queries should have a path restriction
	(even if it's just, for example, "/content"), as this allows to shrink indexes.

	"order by" may use an index.
	If there is no index for the given sort order,
	then the result is fully read in memory and sorted before returning the first row.

	The column list is usually not needed, as all properties of the nodes are returned in any case.
	It is only needed if non-standard (computed) columns such as
	"rep:excerpt" or "rep:spellcheck" are needed.

	Examples:

	Get all nodes of node type 'sling:Folder' with the property 'sling:resourceType' set to 'x':

	/jcr:root/content//element(*, sling:Folder)[@sling:resourceType='x']

	Get all index definition nodes of type 'lucene', sorted by reindex count, descending:

	/jcr:root/oak:index/element(*, oak:QueryIndexDefinition)[@type='lucene'] order by @reindexCount descending

	Get all nodes below /etc that have the property type set to 'report'; fail the query if there is no index that can be used:

	/jcr:root/etc//*[@type='report'] option(traversal fail)


	---

	#render( 'Filter' )

	A single slash means filtering on a specific child node,
	while two slashes means filtering on a descendant node.

	"*" means any node name, and any node type.

	"text()" is a shortcut for "jcr:xmltext".
	It is supported only for compatibility.

	Queries using the construct `(filter1 \| filter2)' are converted to union,
	that is, one query is generated for each filter, and the result of both
	queries is combined.
	Note that for fulltext queries, it is problematic to use union,
	because scoring is done for each subquery individually.
	The score is not useful to compare results of different subqueries,
	so that the union of multiple fulltext queries won't be ordered by score
	as one might expect.

	Examples:

	Only direct child nodes of /content/dam:

	/jcr:root/content/dam/*

	All descendant nodes of /content/oak:index:

	/jcr:root/content/oak:index//*

	All nodes named "rep:policy" that have a parent node which is the direct child node of /home/users:

	/jcr:root/home/users/*/rep:policy

	All nodes named "profile" that have a parent node which is a descendant of /home/users:

	/jcr:root/home/users//*/profile

	All descendant nodes of /content that are of type oak:QueryIndexDefinition:

	/jcr:root/content//element(*, oak:QueryIndexDefinition)

	All descendant nodes of /content that are of type oak:QueryIndexDefinition and are named "lucene":

	/jcr:root/content//element(lucene, oak:QueryIndexDefinition)

	All nodes named "indexRules" that have a parent node with the property "type" set to "lucene":

	/jcr:root/oak:index/*[@type = 'lucene']/indexRules

	The paths '/libs' and '/etc' should be searched:

	/jcr:root/(libs \| etc)//*[@jcr:uuid and @jcr:mimeType = 'text/css']


	---

	#render( 'Column' )

	"rep:excerpt": include the excerpt in the result.
	Since Oak version 1.8.1, optionally a property name can be specified.
	See <a href="query-engine.html#Excerpts_and_Highlighting">Excerpts and Highlighting</a>.

	"rep:spellcheck": Include the spellcheck in the result.
	See <a href="query-engine.html#Spellchecking">Spellchecking</a>.

	"rep:suggest": include suggestions in the result.
	See <a href="query-engine.html#Suggestions">Suggestions</a>.

	"rep:facet": include facets in the result.
	See <a href="query-engine.html#Facets">Facets</a>.

	Examples:

	/jcr:root/content//*[jcr:contains(., 'test')]/(rep:excerpt())

	/jcr:root/content//*[jcr:contains(., 'test')]/(rep:excerpt(@jcr:title) \| rep:excerpt())

	/jcr:root/content//*[rep:suggest('in ')]/(rep:suggest())

	/jcr:root/content//*[jcr:contains(@jcr:title, 'oak')]/(rep:facet(@tags))


	---

	#render( 'Constraint' )

	"or" conditions of the form "@x = 1 or @x = 2" can use the same index.

	"or" conditions of the form "@x = 1 or @y = 2" are more complicated.
	Oak will convert them to a "union" query
	(one query with @x = 1, and a second query with @y = 2).

	Examples:

	Include the nodes that have the property 'hidden' set to 'hidden-folder',
	or don't have the property set:

	/jcr:root/content/dam/*[@hidden='hidden-folder' or not(@hidden)]


	---

	#render( 'And Condition' )

	A special case (not found in relational databases) is
	"and" conditions of the form "@x = 1 and @x = 2".
	They will match nodes with multi-valued properties,
	where the property value contains both 1 and 2.

	Examples:

	/jcr:root/home//element(*, rep:Authorizable)[@rep:principalName != 'Joe' and @rep:principalName != 'Steve']


	---

	#render( 'Condition' )

	"fn:not": this is used for both "is not null", and for nested conditions.
	For example "fn:not(@status)" means nodes where this property is not set.
	In this case, and index can be used, but it is relatively expensive to index
	all nodes that don't have a certain property. It is recommended to only index
	this if there are relatively view nodes with this nodetype. See also
	<a href="lucene.html">nullCheckEnabled</a>.
	Nested conditions, for example "fn:not(@status = 'x' and @color = 'red')",
	can not typically use an index.

	"jcr:contains": see <a href="query-engine.html#Full-Text_Queries">Full-Text Queries</a>.

	"jcr:like": the wildcards characters are _ (any one character)
	and % (any characters). An index is used,
	except if the operand starts with a wildcard.
	To search for the characters % and _, the characters need to be escaped using \ (backslash).

	"rep:similar": see <a href="query-engine.html#Similarity_Queries">Similarity Queries</a>.

	"rep:native": see <a href="query-engine.html#Native_Queries">Native Queries</a>.

	"rep:spellcheck": see <a href="query-engine.html#Spellchecking">Spellchecking</a>.

	"rep:suggest": see <a href="query-engine.html#Suggestions">Suggestions</a>.

	Examples:

	/jcr:root/var/eventing//element(*, slingevent:Job)[@event.job.topic = 'abc' and not(@slingevent:finishedState)]

	/jcr:root/content//element(*, cq:Page)[@offTime > xs:dateTime('2020-12-01T20:00:00.000') or @onTime > xs:dateTime('2020-12-01T20:00:00.000')]


	---

	#render( 'Comparison' )

	Comparison using <, >, >=, and <= can use an index if the property in the index is ordered.

	Examples:

	@jcr:primaryType != 'nt:base'
	@offTime > xs:dateTime('2020-12-01T20:00:00.000')


	---

	#render( 'Static Operand' )

	A string (text) literal starts and ends with a single quote.
	Two single quotes can be used to create a single quote inside a string.

	Examples:

	'John''s car'
	true()
	false
	1000
	-30.3
	xs:dateTime('2020-12-01T20:00:00.000')

	---

	#render( 'Ordering' )

	Ordering by an indexed property will use that index if possible.
	If there is no index that can be used for the given sort order,
	then the result is fully read in memory and sorted there.

	As a special case, sorting by "jcr:score" in descending order is ignored
	(removed from the list), as this is what the fulltext index does anyway
	(and if no fulltext index is used, then the score doesn't apply).
	If for some reason you want to enforce sorting by "jcr:score", then
	you can use the workaround to order by "fn:lowercase(@jcr:score) descending".
	Note that for fulltext queries, it is problematic to use union,
	because scoring is done for each subquery individually.
	The score is not useful to compare results of different subqueries,
	so that the union of multiple fulltext queries won't be ordered by score
	as one might expect.

	Examples:

	order by @jcr:created descending


	---

	#render( 'Dynamic Operand' )

	The "*" stands for any property.
	Using it in a condition requires a relative path.
	For example: `[./* = 'test']` means where any property matches the word 'test'.
	Relative path fragments can also contain `*` to represent 'any' node at
	that point. `//` is not supported as part of relative path. So, `a/*/@test`,
	`/a/@test`, `a///@test` etc are valid while `a//@test`, `a//b//@test`, etc are
	not.


	"jcr:score()" is the score returned by the index.

	"fn:coalesce": this returns the first operand if it is not null,
	and the second operand otherwise.
	`@since Oak 1.8`

	Examples:

	@type
	./@jcr:primaryType
	items/type/@metaType
	indexRules/nt:base/*
	fn:string-length(@title)
	fn:name()
	jcr:score ()
	fn:lower-case(@lastName)
	fn:coalesce(@lastName, @name)


	---

	#render( 'Options' )

	"traversal": by default, queries without index will log a warning,
	except if the configuration option `QueryEngineSettings.failTraversal` is changed
	The traversal option can be used to change the behavior of the given query:
	"ok" to not log a warning,
	"warn" to log a warning,
	"fail" to fail the query, and
	"default" to use the default setting.

	"index tag": by default, queries will use the index with the lowest expected cost (as in relational databases).
	To only consider some of the indexes, add tags (a multi-valued String property) to the index(es) of choice,
	and specify this tag in the query.
	See <a href="query-engine.html#Query_Option_Index_Tag">Query Option Index Tag</a>.

	Examples:

	option(traversal fail)


	---

	#render( 'Explain' )

	Does not run the query, but only computes and returns the query plan.
	With "explain measure", the expected cost is calculated as well.
	In both cases, the query result will only have one column called 'plan', and one row that contains the plan.

	Examples:

	exlplain measure /jcr:root//*[@jcr:uuid = 'x']

	Result:

	plan = [nt:base] as [nt:base]
	/* property uuid = 1 where [nt:base].[jcr:uuid] = 1 */
	cost: { "nt:base": 2.0 }

	This means the property index named "uuid" is used for this query.
	The expected cost (roughly the number of uncached I/O operations) is 2.

	---

	#render( 'Measure' )

	Runs the query, but instead of returning the result, returns the number of rows traversed.
	The query result has two columns, one called 'selector' and one called 'scanCount'.
	The result has at least two rows, one that represents the total (selector set to 'query'),
	and one per selector used in the query.

	Examples:

	measure /jcr:root//*[@jcr:uuid = 'x']

	Result:

	selector = query
	scanCount = 0
	selector = nt:base
	scanCount = 0

	In this case, the scanCount is zero because the query did not find any nodes.