// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to you under the Apache License, Version
// 2.0 (the "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// https://www.apache.org/licenses/LICENSE-2.0 Unless required by
// applicable law or agreed to in writing, software distributed under the
// License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
// CONDITIONS OF ANY KIND, either express or implied. See the License for
// the specific language governing permissions and limitations under the
// License.
=== Performance Tuning
==== Prefetching
Prefetching is a technique for bringing back, in a single query, not only the queried objects but also objects related to them.
In other words, it is a controlled mechanism for eagerly resolving relationships. Prefetching is discussed in the "Performance Tuning" chapter,
as it is a powerful performance optimization method. However, another common application of prefetching is to refresh stale
object relationships, so more generally it can be viewed as a technique for managing subsets of the object graph.
Prefetching example:
[source, Java]
----
ObjectSelect<Artist> query = ObjectSelect.query(Artist.class);
// instructs Cayenne to prefetch one of Artist's relationships
query.prefetch(Artist.PAINTINGS.disjoint());
// the above line is equivalent to the following:
// query.prefetch("paintings", PrefetchTreeNode.DISJOINT_PREFETCH_SEMANTICS);
// query is executed as usual, but the resulting Artists will have
// their paintings "inflated"
List<Artist> artists = query.select(context);
----
All types of relationships can be prefetched - to-one, to-many, flattened. A prefetch can span multiple relationships:
[source, Java]
----
query.prefetch(Artist.PAINTINGS.dot(Painting.GALLERY).disjoint());
----
A query can have multiple prefetches:
[source, Java]
----
query.prefetch(Artist.PAINTINGS.disjoint());
query.prefetch(Artist.PAINTINGS.dot(Painting.GALLERY).disjoint());
----
If a query is fetching DataRows, all "disjoint" prefetches are ignored and only "joint" prefetches are executed
(see the prefetching semantics discussion below for what disjoint and joint prefetches mean).
===== Prefetching Semantics
Prefetching semantics defines the strategy used to prefetch relationships. Depending on the semantics, Cayenne generates different types of queries.
The end result is the same - query root objects with related objects fully resolved. However, semantics can affect performance,
in some cases significantly. There are 3 types of prefetch semantics, all defined as constants in `org.apache.cayenne.query.PrefetchTreeNode`:
[source]
----
PrefetchTreeNode.JOINT_PREFETCH_SEMANTICS
PrefetchTreeNode.DISJOINT_PREFETCH_SEMANTICS
PrefetchTreeNode.DISJOINT_BY_ID_PREFETCH_SEMANTICS
----
There's no limitation on mixing different types of semantics in the same query. Each prefetch can have its own semantics.
`SelectQuery` uses `DISJOINT_PREFETCH_SEMANTICS` by default. `ObjectSelect` requires explicit semantics, as we've seen above.
`SQLTemplate` and `ProcedureQuery` both use `JOINT_PREFETCH_SEMANTICS`, and this cannot be changed due to the nature of those two queries.
===== Disjoint Prefetching Semantics
With this semantics, Cayenne generates one SQL statement for the main objects and a separate statement for
each prefetch path (hence "disjoint" - related objects are not fetched with the main query).
Each additional SQL statement uses the qualifier of the main query plus a set of joins traversing the
prefetch path between the main and related entity.
This strategy has the advantage of efficient JVM memory use and faster overall result processing by Cayenne,
but it requires (1+N) SQL statements to be executed, where N is the number of prefetched relationships.
===== Disjoint-by-ID Prefetching Semantics
This is a variation of disjoint prefetch where related objects are matched against a set of IDs derived from the fetched
main objects (or intermediate objects in a multi-step prefetch). Cayenne limits the size of the generated WHERE clause,
as most DBs can't parse arbitrarily large SQL, so prefetch queries are broken into smaller queries.
The size of the WHERE clause is controlled by the DI property `Constants.SERVER_MAX_ID_QUALIFIER_SIZE_PROPERTY`
(the default number of conditions in the generated WHERE clause is 10000).
Cayenne will generate (1 + N * M) SQL statements for each query using disjoint-by-ID prefetches,
where N is the number of relationships to prefetch, and M is the number of queries for a given prefetch
that is dependent on the number of objects in the result (ideally M = 1).
The advantage of this type of prefetch is that matching database rows by ID may be much faster than matching
the qualifier of the original query. Moreover, this is *the only type of prefetch* that can handle SelectQueries with a *fetch limit*.
Both joint and regular disjoint prefetches may produce invalid results or generate inefficient fetch-the-entire-table SQL when a fetch limit is in effect.
The disadvantage is that query SQL can get unwieldy for large result sets, as each object has to have its own condition in the WHERE clause of the generated SQL.
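As a rough worked example of the (1 + N * M) formula above, the following plain-Java sketch estimates the statement count. It assumes that every prefetch path has to match the same number of IDs, that each ID takes one condition, and that M is the ID count divided by the maximum qualifier size, rounded up. `DisjointByIdEstimate` and its methods are hypothetical names used only for illustration and are not part of Cayenne; the way Cayenne actually splits its queries may differ in detail.

```java
// Hypothetical helper, not part of Cayenne: estimates how many SQL statements
// a query with disjoint-by-ID prefetches would need under the stated assumptions.
final class DisjointByIdEstimate {

    // M: number of prefetch queries needed to cover idCount IDs when each
    // WHERE clause holds at most maxIdQualifierSize conditions (rounded up).
    static long queriesPerPrefetch(long idCount, long maxIdQualifierSize) {
        if (idCount < 0 || maxIdQualifierSize <= 0) {
            throw new IllegalArgumentException(
                    "idCount must be >= 0 and maxIdQualifierSize must be > 0");
        }
        return (idCount + maxIdQualifierSize - 1) / maxIdQualifierSize;
    }

    // 1 statement for the main query + N prefetched relationships * M queries each.
    static long totalStatements(int prefetchCount, long idCount, long maxIdQualifierSize) {
        return 1 + prefetchCount * queriesPerPrefetch(idCount, maxIdQualifierSize);
    }
}
```

With two prefetched relationships and 25000 matched IDs under the default limit of 10000, this estimate gives M = 3 and 1 + 2 * 3 = 7 statements. With only 500 IDs, M = 1 and the total is 3.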
===== Joint Prefetching Semantics
Joint semantics results in a single SQL statement for root objects and any number of jointly prefetched paths.
Cayenne processes in memory a cartesian product of the entities involved, converting it to an object tree.
It uses OUTER joins to connect prefetched entities.
Joint is the most efficient prefetch type of the three as far as generated SQL goes. There's always just 1 SQL query generated.
Its downsides are the potentially increased amount of data that needs to get across the network between the application server and the database,
and more data processing that needs to be done on the Cayenne side.
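To make the in-memory processing step more concrete, here is a minimal plain-Java sketch of folding the flat rows of an outer join into one object per root entity. It assumes a hypothetical row layout of `{artistId, artistName, paintingTitle}`, with a null title when the outer join found no painting. `JointRowFolding` and `ArtistNode` are illustrative names, not Cayenne classes, and Cayenne's own processing is considerably more general.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Cayenne code: folds the flat rows produced by
// an "ARTIST LEFT OUTER JOIN PAINTING" style query into one node per artist.
final class JointRowFolding {

    static final class ArtistNode {
        final int id;
        final String name;
        final List<String> paintings = new ArrayList<>();

        ArtistNode(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Each row is {Integer artistId, String artistName, String paintingTitle};
    // paintingTitle is null for an artist without paintings.
    static List<ArtistNode> fold(List<Object[]> rows) {
        // LinkedHashMap keeps artists in the order they first appear in the result
        Map<Integer, ArtistNode> byId = new LinkedHashMap<>();
        for (Object[] row : rows) {
            Integer artistId = (Integer) row[0];
            ArtistNode artist = byId.get(artistId);
            if (artist == null) {
                // first row for this artist: create the root object once
                artist = new ArtistNode(artistId, (String) row[1]);
                byId.put(artistId, artist);
            }
            if (row[2] != null) {
                // repeated artist columns are discarded, only the related value is kept
                artist.paintings.add((String) row[2]);
            }
        }
        return new ArrayList<>(byId.values());
    }
}
```

The artist columns are repeated in every row of the join, which illustrates where the extra network traffic and processing mentioned above come from.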
===== Similar Behaviours Using EJBQL
It is possible to achieve similar behaviours with <<EJBQLQuery>> queries by employing the "FETCH" keyword.
[source, SQL]
----
SELECT a FROM Artist a LEFT JOIN FETCH a.paintings
----
In this case, the Paintings that exist for the Artist will be obtained at the same time as the Artists are fetched.
Refer to third-party query language documentation for further detail on this mechanism.
==== Data Rows
Converting result set data to Persistent objects and registering these objects in the ObjectContext can be an expensive
operation, comparable to the time spent running the query (and frequently exceeding it). Internally, Cayenne builds the result as a list of DataRows,
which are later converted to objects. Skipping the last step and using data in the form of DataRows can significantly increase performance.
DataRow is simply a map of values keyed by their DB column name. It is a ubiquitous representation of DB data used internally by Cayenne,
and in many cases it is quite usable as is in the application. So performance-sensitive selects should consider
DataRows, which save memory and CPU cycles. All selecting queries support the DataRows option, e.g.:
[source, Java]
----
ObjectSelect<DataRow> query = ObjectSelect.dataRowQuery(Artist.class);
List<DataRow> rows = query.select(context);
----
[source, Java]
----
SQLSelect<DataRow> query = SQLSelect.dataRowQuery("SELECT * FROM ARTIST");
List<DataRow> rows = query.select(context);
----
Individual DataRows may be converted to Persistent objects as needed. For example, you may implement some in-memory filtering and convert only a subset of the fetched rows:
[source, Java]
----
// you need to cast ObjectContext to DataContext to get access to 'objectFromDataRow'
DataContext dataContext = (DataContext) context;
for (DataRow row : rows) {
    if (row.get("DATE_OF_BIRTH") != null) {
        Artist artist = dataContext.objectFromDataRow(Artist.class, row);
        // do something with Artist...
        ...
    }
}
----
==== Specific Attributes and Relationships with EJBQL
It is possible to fetch specific attributes and relationships from a model using <<EJBQLQuery>>.
The following example would return a java.util.List of String objects:
[source, SQL]
----
SELECT a.name FROM Artist a
----
The following will yield a java.util.List containing Object[] instances, each of which would contain the name followed by the dateOfBirth value.
[source, SQL]
----
SELECT a.name, a.dateOfBirth FROM Artist a
----
Refer to third-party query language documentation for further detail on this mechanism.
==== Iterated Queries
While contemporary hardware may easily allow applications to fetch hundreds of thousands or even millions of objects into memory,
this doesn't mean it is always a good idea to do so. You can optimize processing of very large result sets with two techniques discussed in this and the following section - iterated and paginated queries.
An iterated query is not actually a special query. Any selecting query can be executed in iterated mode by an ObjectContext.
ObjectContext creates an object called `ResultIterator` that is backed by an open ResultSet.
The iterator provides constant memory performance for arbitrarily large ResultSets. This is true at least on the Cayenne end,
as the JDBC driver may still decide to bring the entire ResultSet into the JVM memory.
Data is read from ResultIterator one row/object at a time until it is exhausted. There are two styles of accessing
ResultIterator - direct access, which requires explicit closing to avoid leaking JDBC resources, or a callback that lets
Cayenne handle resource management. In both cases iteration can be performed using a "for" loop, as ResultIterator is "Iterable".
Direct access. Here common sense tells us that ResultIterator instances should be processed and closed as soon as possible to release the DB connection.
E.g. storing open iterators between HTTP requests for an unpredictable length of time would quickly exhaust the connection pool.
[source, Java]
----
try (ResultIterator<Artist> it = ObjectSelect.query(Artist.class).iterator(context)) {
    for (Artist a : it) {
        // do something with the object...
        ...
    }
}
----
Same thing with a callback:
[source, Java]
----
ObjectSelect.query(Artist.class).iterate(context, (Artist a) -> {
    // do something with the object...
    ...
});
----
Another example is a batch iterator that allows processing more than one object in each iteration.
This is a common scenario in various data processing jobs - read a batch of objects, process them, commit the results,
and then repeat. This allows further optimization of processing (e.g. by avoiding frequent commits).
[source, Java]
----
try (ResultBatchIterator<Artist> it = ObjectSelect.query(Artist.class).batchIterator(context, 100)) {
    for (List<Artist> list : it) {
        // do something with each list
        ...
        // possibly commit your changes
        context.commitChanges();
    }
}
----
==== Paginated Queries
Enabling query pagination makes it possible to load very large result sets in a Java app with very little memory overhead
(much smaller than even the DataRows option discussed above). Moreover, it is completely transparent to the application -
a user gets what appears to be a list of Persistent objects - there's no iterator to close or DataRows to convert to objects:
[source, Java]
----
// the fact that result is paginated is transparent
List<Artist> artists =
    ObjectSelect.query(Artist.class).pageSize(50).select(context);
----
Having said that, the DataRows option can be combined with pagination, providing the best of both worlds:
[source, Java]
----
List<DataRow> rows =
    ObjectSelect.dataRowQuery(Artist.class).pageSize(50).select(context);
----
Internally, pagination first fetches a list of IDs for the root entity of the query. This is very
fast and initially takes very little memory. Then, when an object is requested at an arbitrary index in the list,
this object and adjacent objects (a "page" of objects whose size is determined by the query pageSize parameter) are
fetched together by ID. Subsequent requests for the objects of this "page" are served from memory.
An obvious limitation of pagination is that if you eventually access all objects in the list, the memory use will end up
being the same as with no pagination. However, it is still a very useful approach. With some lists (e.g. multi-page search results)
only a few top objects are normally accessed. At the same time, pagination lets you find out the full list size without
fetching all the objects. And again - it is completely transparent and looks like a normal query.
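The mechanism described above can be sketched in plain Java as a read-only list that holds only IDs up front and resolves objects one page at a time. This is an illustration under simplifying assumptions, not Cayenne's implementation: `LazyPagedList` is a hypothetical name, and the `fetchByIds` function stands in for whatever query would fetch a page of objects by ID and return them in the same order as the IDs.

```java
import java.util.AbstractList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration, not Cayenne's implementation: a read-only list
// that knows all IDs up front and resolves objects one page at a time.
final class LazyPagedList<ID, T> extends AbstractList<T> {

    private final List<ID> ids;                                  // fetched eagerly, cheap to hold
    private final int pageSize;
    private final Function<List<ID>, List<T>> fetchByIds;        // stands in for a "fetch by ID" query
    private final Map<Integer, List<T>> pages = new HashMap<>(); // pages resolved so far
    private int fetchCount;                                      // number of page fetches executed

    LazyPagedList(List<ID> ids, int pageSize, Function<List<ID>, List<T>> fetchByIds) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.ids = ids;
        this.pageSize = pageSize;
        this.fetchByIds = fetchByIds;
    }

    @Override
    public T get(int index) {
        if (index < 0 || index >= ids.size()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int page = index / pageSize;
        List<T> resolved = pages.get(page);
        if (resolved == null) {
            // first access to this page: fetch the requested object and its neighbours together
            int from = page * pageSize;
            int to = Math.min(from + pageSize, ids.size());
            resolved = fetchByIds.apply(ids.subList(from, to));
            pages.put(page, resolved);
            fetchCount++;
        }
        return resolved.get(index % pageSize);
    }

    @Override
    public int size() {
        // the full size is known from the ID list without resolving any objects
        return ids.size();
    }

    int fetchCount() {
        return fetchCount;
    }
}
```

In this sketch, calling `size()` triggers no fetches, and repeated `get()` calls within an already resolved page reuse the objects held in memory.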
[[caching]]
==== Caching and Fresh Data
===== Object Caching
===== Query Result Caching
Cayenne supports mostly transparent caching of query results. There are two levels of cache: local
(i.e. results cached by the ObjectContext) and shared (i.e. results cached at the stack level and shared between all contexts).
The local cache is much faster than the shared one, but is limited to a single context. It is often used with a shared read-only ObjectContext.
To take advantage of query result caching, the first step is to mark your queries appropriately.
Here is an example for an ObjectSelect query. Other types of queries have a similar API:
[source, Java]
----
ObjectSelect.query(Artist.class).localCache("artists");
----
This tells Cayenne that the query created here would like to use the local cache of the context it is executed against.
The vararg parameter of the `localCache()` (or `sharedCache()`) method contains so-called "cache groups".
Those are arbitrary names that allow categorizing queries for the purpose of setting cache policies or explicitly invalidating the cache. More on that below.
The above API is enough for the caching to work, but by default your cache is an unmanaged LRU map. You can't control its size,
expiration policies, etc. For a managed cache, you will need to explicitly use one of the more advanced cache providers.
You can use the <<ext-jcache,JCache integration module>> to enable any JCache API compatible caching provider.
Often "passive" cache expiration policies used by caching providers are not sufficient, and the users want real-time cache invalidation when the data changes.
So in addition to those policies, the app can invalidate individual cache groups explicitly with `RefreshQuery`:
[source, Java]
----
// invalidates the "artists" cache group used in the localCache() example above
RefreshQuery refresh = new RefreshQuery("artists");
context.performGenericQuery(refresh);
----
The above can be used e.g. to build UI for manual cache invalidation.
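To illustrate how an LRU map and cache groups can fit together conceptually, here is a small plain-Java sketch. It is not Cayenne's cache implementation: `GroupedLruCache`, its string keys and its `invalidateGroup` method are hypothetical names chosen for this example, and the sketch only shows that entries tagged with a group name can be evicted by size (LRU) or dropped together by group.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not Cayenne's cache: an LRU map whose entries
// carry group names so that a whole group can be invalidated at once.
final class GroupedLruCache<V> {

    private static final class CachedResult<R> {
        final R value;
        final Set<String> groups;

        CachedResult(R value, Set<String> groups) {
            this.value = value;
            this.groups = groups;
        }
    }

    private final Map<String, CachedResult<V>> entries;

    GroupedLruCache(final int maxSize) {
        // accessOrder = true keeps the least recently used entry first,
        // and removeEldestEntry drops it once the map grows past maxSize
        this.entries = new LinkedHashMap<String, CachedResult<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedResult<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    void put(String key, V value, String... groups) {
        entries.put(key, new CachedResult<V>(value, new HashSet<String>(Arrays.asList(groups))));
    }

    V get(String key) {
        CachedResult<V> cached = entries.get(key);
        return cached == null ? null : cached.value;
    }

    // Explicit invalidation: drop every entry tagged with the given group
    void invalidateGroup(String group) {
        entries.values().removeIf(cached -> cached.groups.contains(group));
    }

    int size() {
        return entries.size();
    }
}
```

In this sketch, size-based eviction and group invalidation are independent: the first bounds memory use, while the second removes stale results on demand regardless of how recently they were used.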
It is also possible to automate cache refresh when certain entities are committed.
This can be done with the help of <<ext-cache-invalidation,Cache invalidation extension>>.
Finally you may cluster cache group events. They are very small and can be efficiently sent over the wire to other JVMs running Cayenne.
An example of Cayenne setup with event clustering is https://github.com/andrus/wowodc13/tree/master/services/src/main/java/demo/services/cayenne[available on GitHub].
==== Turning off Synchronization of ObjectContexts
By default when a single ObjectContext commits its changes, all other contexts in the same runtime receive an event that contains all the committed changes.
This allows them to update their cached object state to match the latest committed data. There are, however, many problems with this ostensibly helpful feature.
In short - it works well in environments with few contexts and in unclustered scenarios, such as single-user desktop applications
or simple webapps with only a few users. More specifically:
- The performance of synchronization is (probably worse than) O(N) where N is the number of peer ObjectContexts in the system.
In a typical webapp N can be quite large. Besides, for any given context, due to locking on synchronization,
the context's own performance will depend not only on the queries that it runs, but also on external events that it does not control.
This is unacceptable in most situations.
- Commit events are untargeted - even contexts that do not hold a given updated object will receive the full event that they will have to process.
- Clustering between JVMs doesn't scale - apps with large volumes of commits will quickly saturate the network with events, while most of those will be thrown away on the receiving end as mentioned above.
- Some contexts may not want to be refreshed. A refresh in the middle of an operation may lead to unpredictable results.
- Synchronization will interfere with optimistic locking.
So we've made a good case for disabling synchronization in most webapps. To do that, set the DI property
`Constants.SERVER_CONTEXTS_SYNC_PROPERTY` to "false", using one of the standard Cayenne DI approaches. E.g. from the command line:
[source]
----
$ java -Dcayenne.server.contexts_sync_strategy=false
----
Or by changing the standard properties Map in a custom extensions module:
[source, Java]
----
public class MyModule implements Module {
    @Override
    public void configure(Binder binder) {
        ServerModule.contributeProperties(binder)
            .put(Constants.SERVER_CONTEXTS_SYNC_PROPERTY, "false");
    }
}
----