~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements. See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License. You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.
How to use Indexing Features in VXQuery.
In VXQuery, all indexes are created in a user-specified directory. To use indexing,
you must set this directory in your cluster configuration file.
* Configuring VXQuery to use indexing functions.
Add the following line to your cluster configuration file (e.g. cluster.xml):
--------
<index_directory><path_to_index_directory></index_directory>
--------
(You should create this index directory yourself before using the indexing functions.)
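The configuration step can be scripted. The Python sketch below is a hypothetical helper (ensure_index_directory is not part of VXQuery). It reads the <index_directory> value from a cluster configuration file and creates that directory if it is missing. It assumes the element appears once, anywhere in the file, and contains a plain filesystem path.

```python
import os
import xml.etree.ElementTree as ET


def ensure_index_directory(cluster_xml_path: str) -> str:
    """Find <index_directory> in a cluster configuration file, create the
    directory it names if needed, and return that path.

    Hypothetical helper: assumes a single <index_directory> element whose
    text is a filesystem path."""
    root = ET.parse(cluster_xml_path).getroot()
    for element in root.iter():
        # Compare local names so a namespaced configuration file still matches.
        if element.tag.rsplit("}", 1)[-1] == "index_directory":
            path = (element.text or "").strip()
            if not path:
                raise ValueError("<index_directory> is empty")
            os.makedirs(path, exist_ok=True)  # no error if it already exists
            return path
    raise ValueError(f"no <index_directory> element in {cluster_xml_path}")
```

Calling ensure_index_directory("cluster.xml") once before starting the cluster returns the configured path and guarantees the directory exists.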
* Using indexing queries.
VXQuery offers the following indexing functionality:
[[a]] Create an index for a collection.
[[b]] Use the index in executing a query.
[[c]] Update the index.
[[d]] Delete the index.
[[e]] View existing indexes.
* Scenario I - When the collection is a single directory.
In this scenario, all the XML files are stored in a single directory. (The directory may contain subdirectories.)
** Creating an index for collection
To create an index for an XML collection stored in <path_to_collection_1>, use the following query.
Query structure:
--------
build-index-on-collection("<path_to_collection_1>")
--------
The index is created in a new subdirectory of the index_directory specified in your cluster configuration file (e.g. local.xml).
Example:
--------
build-index-on-collection("<path_to_collection_1>")
--------
This function takes the collection path as an argument.
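When queries are generated by a script rather than typed by hand, the collection path has to be embedded in an XQuery string literal. The following is a minimal Python sketch with hypothetical helper names that builds the query text shown above. It applies XQuery's rules for double-quoted string literals (a literal `"` is written as `""` and `&` as `&amp;`). How the resulting text is submitted to VXQuery is outside the scope of this sketch.

```python
def xquery_string_literal(value: str) -> str:
    """Quote a value as an XQuery string literal delimited by double quotes.

    Inside such a literal, '&' must be written as an entity reference and a
    double quote is escaped by doubling it."""
    return '"' + value.replace("&", "&amp;").replace('"', '""') + '"'


def build_index_query(collection_path: str) -> str:
    """Return the text of a build-index-on-collection query for one directory."""
    return f"build-index-on-collection({xquery_string_literal(collection_path)})"
```

For example, build_index_query("/data/collection1") yields the string build-index-on-collection("/data/collection1").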
** Using the index in a query.
To execute a query using the index, use the following structure:
------
for $r in collection-from-index("<path_to_collection_1>", "/dataCollection/data")/data
where $r/dataType eq "AWND" and xs:decimal($r/value) gt 491.744
return $r
------
Here, the index access function is collection-from-index, which takes two arguments: the collection folder
and the path element ("/dataCollection/data" in this example).
Result:
------
<data>
<date>2001-01-01T00:00:00.000</date>
<dataType>AWND</dataType>
<station>GHCND:US000000001</station>
<value>1000</value>
<attributes>
<attribute/>
<attribute/>
<attribute>a</attribute>
</attributes>
</data>
------
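To sanity-check the expected output locally without a cluster, the where clause can be restated in plain Python. This sketch uses xml.etree.ElementTree on a hypothetical one-file sample: the first record is the result shown above and the second is invented so that the filter has something to reject. It is not how VXQuery evaluates the query through the index; it only mirrors the predicate. Decimal is used so the comparison behaves like xs:decimal rather than binary floating point.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical sample input: the first <data> record matches the result above,
# the second is an invented record with a small value that should be rejected.
SAMPLE = """
<dataCollection>
  <data>
    <date>2001-01-01T00:00:00.000</date>
    <dataType>AWND</dataType>
    <station>GHCND:US000000001</station>
    <value>1000</value>
  </data>
  <data>
    <date>2001-01-02T00:00:00.000</date>
    <dataType>AWND</dataType>
    <station>GHCND:US000000001</station>
    <value>12</value>
  </data>
</dataCollection>
"""


def matching_records(xml_text: str) -> list:
    """Mirror: where $r/dataType eq "AWND" and xs:decimal($r/value) gt 491.744"""
    root = ET.fromstring(xml_text.strip())
    results = []
    for record in root.findall("data"):  # children of /dataCollection
        value = record.findtext("value")
        # A missing <value> makes the XQuery comparison false, so skip it here too.
        if (record.findtext("dataType") == "AWND"
                and value is not None
                and Decimal(value) > Decimal("491.744")):
            results.append(record)
    return results
```

Here [r.findtext("value") for r in matching_records(SAMPLE)] evaluates to ['1000'].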
** Updating the index.
A collection can be modified in the following ways:
[[1]] Inserting new XML files.
[[2]] Deleting files.
[[3]] Adding, removing, or modifying content within XML files.
In any of these situations, the index for the modified collection must also be updated.
Use the update-index function to do this.
Query structure:
--------
update-index("<path_to_collection_1>")
--------
Example:
-------
update-index("<path_to_collection_1>")
-------
This function takes the path of the modified collection as its argument.
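Deciding when update-index needs to be run can be automated by comparing snapshots of the collection directory. The Python sketch below is hypothetical tooling, not part of VXQuery. It records each XML file's size and modification time (assuming collection files end in .xml) and classifies the differences into the three kinds of change listed above. Size and modification time are a heuristic; an edit that preserves both would go unnoticed, so hashing file contents is safer when that matters.

```python
import os


def snapshot(collection_dir: str) -> dict:
    """Map each XML file under collection_dir (subdirectories included)
    to its (size, modification time in nanoseconds) pair."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(collection_dir):
        for name in filenames:
            if name.lower().endswith(".xml"):
                path = os.path.join(dirpath, name)
                info = os.stat(path)
                rel = os.path.relpath(path, collection_dir)
                state[rel] = (info.st_size, info.st_mtime_ns)
    return state


def diff_snapshots(before: dict, after: dict) -> dict:
    """Classify changes as inserted, deleted, or modified files."""
    return {
        "inserted": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "modified": sorted(k for k in before.keys() & after.keys()
                           if before[k] != after[k]),
    }


def needs_update(before: dict, after: dict) -> bool:
    """True if any file was inserted, deleted, or modified."""
    return any(diff_snapshots(before, after).values())
```

A script could store snapshot(path) after each build-index-on-collection or update-index call and issue update-index only when needs_update reports a change.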
** Deleting the index.
To delete the entire index created for a collection, use the delete-index function.
This function also takes as its argument the path of the collection whose index should be deleted.
Query structure:
--------
delete-index("<path_to_collection_1>")
--------
Example:
-------
delete-index("<path_to_collection_1>")
-------
* Scenario II - When the collection is distributed.
In this scenario, the collection is distributed among several directories (partitions),
and queries can be distributed among these partitions.
** Creating indexes for collections.
Query structure:
--------
build-index-on-collection("<path_to_collection_1>|<path_to_collection_2>|...|<path_to_collection_n>")
--------
Here, the parameter contains the list of collection partitions separated by the '|' character.
Example:
Suppose the collection is now distributed among four directories: <path_to_collection_1>, <path_to_collection_2>,
<path_to_collection_3> and <path_to_collection_4>.
To create indexes for all of these partitions:
-------
build-index-on-collection("<path_to_collection_1>|<path_to_collection_2>|<path_to_collection_3>|<path_to_collection_4>")
-------
In this case, a separate index is created in its own subdirectory for each partition.
Note that this query requires each node to have four partitions available.
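Because build-index-on-collection, update-index and delete-index each take a single string with '|' between the partition directories, a script that generates these queries mainly needs to join the paths and reject any path that itself contains '|'. The following Python sketch is illustrative only; the helper names and the set of accepted function names are assumptions, and collection-from-index is excluded because it takes a second argument.

```python
# Single-argument index functions described in this guide.
VALID_FUNCTIONS = {"build-index-on-collection", "update-index", "delete-index"}


def partition_argument(paths: list) -> str:
    """Join partition directories into one '|'-separated argument."""
    if not paths:
        raise ValueError("at least one partition directory is required")
    for path in paths:
        # '|' is the separator, so it cannot appear inside a single path.
        if not path or "|" in path:
            raise ValueError(f"invalid partition directory: {path!r}")
    return "|".join(paths)


def index_query(function: str, paths: list) -> str:
    """Build e.g. update-index("p1|p4") for the given partitions."""
    if function not in VALID_FUNCTIONS:
        raise ValueError(f"unsupported function: {function}")
    # Escape for an XQuery double-quoted string literal.
    literal = partition_argument(paths).replace("&", "&amp;").replace('"', '""')
    return f'{function}("{literal}")'
```

For example, index_query("delete-index", ["/d/c2", "/d/c3"]) produces delete-index("/d/c2|/d/c3").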
** Using the indexes in a query.
Suppose you need to run a query on the indexes of two collection partitions.
Example:
-----
for $r in collection-from-index("<path_to_collection_1>|<path_to_collection_2>", "/dataCollection/data")/data
where $r/dataType eq "AWND" and xs:decimal($r/value) gt 491.744
return $r
-----
The result will be taken from the indexes of both path_to_collection_1 and path_to_collection_2.
Result:
------
<data>
<date>2001-01-01T00:00:00.000</date>
<dataType>AWND</dataType>
<station>GHCND:US000000001</station>
<value>1000</value>
<attributes>
<attribute/>
<attribute/>
<attribute>a</attribute>
</attributes>
</data>
------
** Updating the indexes.
When collection files stored in several partitions are updated, use update-index to update the
indexes of those directories.
In this case, pass a '|'-separated list of collection directories.
Query structure:
--------
update-index("<path_to_collection_1>|<path_to_collection_2>|...|<path_to_collection_n>")
--------
Example:
Suppose we need to update the indexes of <path_to_collection_1> and <path_to_collection_4>:
--------
update-index("<path_to_collection_1>|<path_to_collection_4>")
--------
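Choosing which partitions to pass to update-index can be automated by keeping one fingerprint per partition. The Python sketch below is hypothetical tooling: it hashes the relative path, size and modification time of every file in a partition, compares the result with fingerprints recorded earlier (for example, right after the last build or update), and builds an update-index query for the partitions that changed. As with any size and timestamp heuristic, an edit that preserves both is not detected.

```python
import hashlib
import os
from typing import Optional


def fingerprint(partition_dir: str) -> str:
    """Hash the relative path, size and mtime of every file in a partition."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(partition_dir):
        dirnames.sort()  # sorting in place makes the walk order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            rel = os.path.relpath(path, partition_dir)
            digest.update(f"{rel}\0{info.st_size}\0{info.st_mtime_ns}\n".encode())
    return digest.hexdigest()


def update_query_for_changed(partitions: list, previous: dict) -> Optional[str]:
    """Return an update-index query covering only the partitions whose
    fingerprint differs from the recorded one, or None if nothing changed."""
    changed = [p for p in partitions if previous.get(p) != fingerprint(p)]
    if not changed:
        return None
    return 'update-index("' + "|".join(changed) + '")'
```

After a successful update, the caller would record the new fingerprints so the next comparison starts from the current state.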
** Deleting the indexes.
To delete the indexes of collections in several partitions, use delete-index with a '|'-separated list of collection directories.
Query structure:
--------
delete-index("<path_to_collection_1>|<path_to_collection_2>|...|<path_to_collection_n>")
--------
Example:
Suppose we need to delete the indexes of <path_to_collection_2> and <path_to_collection_3>:
--------
delete-index("<path_to_collection_2>|<path_to_collection_3>")
--------
** Viewing Index information.
To check which collections have indexes, use the show-index function.
This function takes no arguments and returns a sequence of the collection paths for which an index has already been created.
If no index has been created for any collection, the result is empty.
Suppose two collections, <path_to_collection_1> and <path_to_collection_2>, have indexes created.
Example:
------
show-index()
------
Result:
------
<path_to_collection_1>
<path_to_collection_2>
------
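The output of show-index can be used to find collections that still need build-index-on-collection. The Python sketch below assumes the serialized result lists one collection path per line, as in the example above, and is empty when no index exists; the helper name is hypothetical. Paths are normalized with os.path.normpath so that a trailing slash does not cause a false mismatch.

```python
import os


def collections_without_index(show_index_output: str, wanted: list) -> list:
    """Return the wanted collection paths that do not appear in the
    show-index() output (assumed: one path per line, empty if none)."""
    indexed = {
        os.path.normpath(line.strip())
        for line in show_index_output.splitlines()
        if line.strip()
    }
    return [path for path in wanted if os.path.normpath(path) not in indexed]
```

For example, with output "/data/c1\n/data/c2" and wanted collections ["/data/c1", "/data/c3"], the helper returns ["/data/c3"].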