| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <chapter id="ref_guide_optimization"> |
| <title> |
| Optimization Guidelines |
| </title> |
| <indexterm zone="ref_guide_optimization"> |
| <primary> |
| optimization guidelines |
| </primary> |
| </indexterm> |
| <para> |
| There are numerous techniques you can use to ensure that OpenJPA operates in |
| the fastest and most efficient manner. Following are some guidelines. Each |
| describes what impact it will have on performance and scalability. Note that |
| general guidelines regarding performance or scalability issues are just that: |
| guidelines. Depending on the particular characteristics of your application, |
| the optimal settings may be considerably different from what is outlined below. |
| </para> |
| <para> |
| In the following table, each row is labeled with a list of italicized keywords. |
| These keywords identify what characteristics the row in question may improve |
| upon. Many of the rows are marked with one or both of the <emphasis>performance |
| </emphasis> and <emphasis>scalability</emphasis> labels. It is important to bear |
| in mind the differences between performance and scalability (for the most part, |
| we are referring to system-wide scalability, and not necessarily only |
| scalability within a single JVM). The performance-related hints will probably |
| improve the performance of your application for a given user load, whereas the |
| scalability-related hints will probably increase the total number of users that |
| your application can service. Sometimes, increasing performance will decrease |
| scalability, and vice versa. Typically, options that reduce the amount of work |
| done on the database server will improve scalability, whereas those that push |
| more work onto the server will have a negative impact on scalability. |
| </para> |
| <table> |
| <title> |
| Optimization Guidelines |
| </title> |
| <tgroup cols="2" align="left" colsep="1" rowsep="1"> |
| <colspec colname="name"/> |
| <colspec colname="desc" colwidth="4*"/> |
| <tbody valign="top"> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Use a connection pool |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| OpenJPA's built-in datasource does not perform connection pooling or |
| prepared statement caching, but it can use Apache Commons DBCP for connection |
| pooling if it is provided on the classpath. Check out the |
| <link linkend="ref_guide_dbsetup_builtin">DriverDataSource</link> |
| section, which describes how to use and configure Commons DBCP. |
| Also, you can manually plug in a third-party pooling datasource like |
| <link linkend="ref_guide_integration_dbcp">Apache Commons DBCP</link>, |
| included in the binary distribution and openjpa-all artifact, which may |
| drastically improve application performance. |
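| <para> |
| As a minimal sketch, assuming Commons DBCP 1.x is on the classpath, a pooling |
| datasource might be plugged in through <filename>persistence.xml</filename> |
| roughly as follows. The Derby driver class, URL, credentials, and pool size are |
| placeholder values chosen for illustration, and the pool tuning keys depend on |
| the DBCP version in use. |
| </para> |
| <programlisting> |
| &lt;!-- Illustrative values only; adjust driver, URL, and pool size --&gt; |
| &lt;property name="openjpa.ConnectionDriverName" value="org.apache.commons.dbcp.BasicDataSource"/&gt; |
| &lt;property name="openjpa.ConnectionProperties" value="DriverClassName=org.apache.derby.jdbc.ClientDriver, Url=jdbc:derby://localhost:1527/database, MaxActive=10, Username=user, Password=secret"/&gt; |
| </programlisting> |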
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Optimize database indexes |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| The default set of indexes created by OpenJPA's mapping tool may not always be |
| the most appropriate for your application. Manually setting indexes in your |
| mapping metadata or manually manipulating database indexes to include |
| frequently-queried fields (as well as dropping indexes on rarely-queried |
| fields) can yield significant performance benefits. |
| <para> |
| A database must do extra work on insert, update, and delete to maintain an |
| index. This extra work will benefit selects with WHERE clauses, which will |
| execute much faster when the terms in the WHERE clause are appropriately |
| indexed. So, for a read-mostly application, appropriate indexing will slow down |
| updates (which are rare) but greatly accelerate reads. This means that the |
| system as a whole will be faster, and also that the database will experience |
| less load, meaning that the system will be more scalable. |
| </para> |
| <para> |
| Bear in mind that over-indexing is a bad thing, both for scalability and |
| performance, especially for applications that perform lots of inserts, updates, |
| or deletes. |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| JVM optimizations |
| </emphasis> |
| <para> |
| <emphasis>performance, reliability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| Manipulating various parameters of the Java Virtual Machine (such as HotSpot |
| compilation modes and the maximum memory) can result in performance |
| improvements. For more details about optimizing the JVM execution environment, |
| please see <ulink url="http://www.oracle.com/technetwork/java/hotspotfaq-138619.html"/>. |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Use the data cache |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| Using OpenJPA's <link linkend="ref_guide_cache">data and query caching</link> |
| features can often result in a dramatic improvement in performance. |
| Additionally, these caches can significantly reduce the amount of load on |
| the database, increasing the scalability characteristics of your application. |
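| <para> |
| For illustration, and assuming a deployment in which all persistence activity |
| happens within a single JVM, the data cache might be enabled with properties |
| along these lines. Clustered deployments need a different remote commit |
| provider. |
| </para> |
| <programlisting> |
| &lt;!-- Assumes a single-JVM deployment --&gt; |
| &lt;property name="openjpa.DataCache" value="true"/&gt; |
| &lt;property name="openjpa.RemoteCommitProvider" value="sjvm"/&gt; |
| </programlisting> |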
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Set <literal>LargeTransaction</literal> to true, |
| or set <literal>PopulateDataCache</literal> to |
| false |
| </emphasis> |
| <para> |
| <emphasis>performance vs. scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| When using OpenJPA's <link linkend="ref_guide_cache">data caching</link> |
| features in a transaction that will delete, modify, or create a very large |
| number of objects, you can set <literal>LargeTransaction</literal> to true and |
| perform periodic flushes during your transaction to reduce its memory |
| requirements. See the Javadoc: |
| <ulink url="../../apidocs/org/apache/openjpa/persistence/OpenJPAEntityManager.html"> |
| OpenJPAEntityManager.setTrackChangesByType</ulink>. Note that transactions in |
| large mode must flush items from the data cache more aggressively. |
| <para> |
| If your transaction will visit objects that you know are very unlikely to be |
| accessed by other transactions, for example an exhaustive report run only once a |
| month, you can turn off population of the data cache so that the transaction |
| doesn't fill the entire data cache with objects that won't be accessed again. |
| Again, see the Javadoc: |
| <ulink url="../../apidocs/org/apache/openjpa/persistence/OpenJPAEntityManager.html"> |
| OpenJPAEntityManager.setPopulateDataCache</ulink>. |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Run the OpenJPA enhancer on your persistent classes, |
| either at build-time or deploy-time. |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability, memory footprint</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| OpenJPA performs best when your persistent classes have been run through the |
| OpenJPA post-compilation bytecode enhancer. When dealing with enhanced classes, |
| OpenJPA can make a number of assumptions that reduce memory footprint and |
| accelerate persistent data access. When evaluating OpenJPA's performance, |
| build-time or deploy-time enhancement should be enabled. See |
| <xref linkend="ref_guide_pc_enhance"/> for details. |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Disable logging, performance tracking |
| </emphasis> |
| <para> |
| <emphasis>performance</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| Developer options such as verbose logging and the JDBC performance tracker can |
| result in serious performance hits for your application. Before evaluating |
| OpenJPA's performance, these options should all be disabled. |
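| <para> |
| One possible way to quiet logging before a performance evaluation, shown as a |
| sketch, is to raise the default log level. The choice of |
| <literal>WARN</literal> here is an assumption; pick the level that suits your |
| environment. |
| </para> |
| <programlisting> |
| &lt;!-- Log only warnings and errors on all channels --&gt; |
| &lt;property name="openjpa.Log" value="DefaultLevel=WARN"/&gt; |
| </programlisting> |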
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Set <literal>IgnoreChanges</literal> to true, or |
| set <literal>FlushBeforeQueries</literal> to true |
| </emphasis> |
| <para> |
| <emphasis>performance vs. scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| When both the <link linkend="openjpa.IgnoreChanges"><literal> |
| openjpa.IgnoreChanges</literal></link> and |
| <link linkend="openjpa.FlushBeforeQueries"><literal>openjpa.FlushBeforeQueries |
| </literal></link> properties are set to false, OpenJPA needs to consider |
| in-memory dirty instances during queries. This can sometimes result in OpenJPA |
| needing to evaluate the entire extent of objects in order to return the correct |
| query results, which can have drastic performance consequences. If it is |
| appropriate for your application, configuring <literal>FlushBeforeQueries |
| </literal> to automatically flush before queries involving dirty objects will |
| ensure that this never happens. Setting <literal>IgnoreChanges</literal> to |
| false will result in a small performance hit even if <literal>FlushBeforeQueries |
| </literal> is true, as incremental flushing is not as efficient overall as |
| delaying all flushing to a single operation during commit. |
| <para> |
| Setting <literal>IgnoreChanges</literal> to <literal>true</literal> will help |
| performance, since dirty objects can be ignored for queries, meaning that |
| incremental flushing or client-side processing is not necessary. It will also |
| improve scalability, since overall database server usage is diminished. On the |
| other hand, setting <literal>IgnoreChanges</literal> to <literal>false</literal> |
| will have a negative impact on scalability, even when using automatic flushing |
| before queries, since more operations will be performed on the database server. |
| </para> |
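| <para> |
| The two alternatives discussed in this row could be expressed in |
| <filename>persistence.xml</filename> roughly as follows. This is a sketch; |
| which alternative is appropriate depends on whether your queries must see |
| uncommitted in-memory changes. |
| </para> |
| <programlisting> |
| &lt;!-- Alternative 1: queries ignore in-memory changes --&gt; |
| &lt;property name="openjpa.IgnoreChanges" value="true"/&gt; |
|  |
| &lt;!-- Alternative 2: queries see changes, flushed automatically beforehand --&gt; |
| &lt;property name="openjpa.IgnoreChanges" value="false"/&gt; |
| &lt;property name="openjpa.FlushBeforeQueries" value="true"/&gt; |
| </programlisting> |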
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Configure <literal>openjpa.ConnectionRetainMode |
| </literal> appropriately |
| </emphasis> |
| <para> |
| <emphasis>performance vs. scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| The <link linkend="openjpa.ConnectionRetainMode"><literal>ConnectionRetainMode |
| </literal></link> configuration option controls when OpenJPA will obtain a |
| connection, and how long it will hold that connection. The optimal settings for |
| this option will vary considerably depending on the particular behavior of |
| your application. You may even benefit from using different retain modes for |
| different parts of your application. |
| <para> |
| The default setting of <literal>on-demand</literal> minimizes the amount of time |
| that OpenJPA holds onto a datastore connection. This is generally the best |
| option from a scalability standpoint, as database resources are held for a |
| minimal amount of time. However, if you are not using connection pooling, or |
| if your <classname>DataSource</classname> is not efficient at managing its |
| pool, then this default value could cause undesirable pool contention. |
| </para> |
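| <para> |
| As an example of moving away from the default, an application that suffers from |
| pool contention might hold each connection for the duration of a transaction. |
| The value <literal>transaction</literal> below is only one of the available |
| modes, chosen here for illustration. |
| </para> |
| <programlisting> |
| &lt;!-- Hold the connection for the length of each transaction --&gt; |
| &lt;property name="openjpa.ConnectionRetainMode" value="transaction"/&gt; |
| </programlisting> |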
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Use flat inheritance |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability vs. disk space</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| Mapping inheritance hierarchies to a single database table is faster for most |
| operations than other strategies employing multiple tables. If it is |
| appropriate for your application, you should use this strategy whenever |
| possible. |
| <para> |
| However, this strategy will require more disk space on the database side. Disk |
| space is relatively inexpensive, but if your object model is particularly large, |
| it can become a factor. |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| High sequence increment |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| For applications that perform large bulk inserts, the retrieval of sequence |
| numbers can be a bottleneck. Increasing sequence allocation sizes can reduce or eliminate |
| this bottleneck. In some cases, implementing your own sequence factory can |
| further optimize sequence number retrieval. |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Use optimistic transactions |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| Using datastore transactions translates into pessimistic database row locking, |
| which can be a performance hit (depending on the database). If appropriate for |
| your application, optimistic transactions are typically faster than datastore |
| transactions. |
| <para> |
| Optimistic transactions provide the same transactional guarantees as datastore |
| transactions, except that you must handle a potential optimistic verification |
| exception at the end of a transaction instead of assuming that a transaction |
| will successfully complete. In many applications, it is unlikely that different |
| concurrent transactions will operate on the same set of data at the same time, |
| so optimistic verification increases the concurrency, and therefore both the |
| performance and scalability characteristics, of the application. A common |
| approach to handling optimistic verification exceptions is to simply present the |
| end user with the fact that concurrent modifications happened, and require that |
| the user redo any work. |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Use query aggregates and projections |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| Using aggregates to compute reporting data on the database server can |
| drastically speed up queries. Similarly, using projections when you are |
| interested in specific object fields or relations rather than the entire object |
| state can reduce the amount of data OpenJPA must transfer from the database to |
| your application. |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Always close resources |
| </emphasis> |
| <para> |
| <emphasis>scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| <para> |
| Under certain settings, <classname>EntityManager</classname>s, OpenJPA |
| <classname>Extent</classname> iterators, and <classname>Query</classname> |
| results may be backed by resources in the database. |
| </para> |
| <para> |
| For example, if you have configured OpenJPA to use scrollable cursors and lazy |
| object instantiation by default, each query result will hold open a <classname> |
| ResultSet</classname> object, which, in turn, will hold open a <classname> |
| Statement</classname> object (preventing it from being re-used). Garbage |
| collection will eventually clean up these resources, so explicitly closing |
| them is not strictly required, but it is always faster to close them at the |
| application level. |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Use detached state managers |
| </emphasis> |
| <para> |
| <emphasis>performance</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| <para> |
| Attaching and even persisting instances can be more efficient when your detached |
| objects use detached state managers. By default, OpenJPA does not use detached |
| state managers when serializing an instance across tiers. See |
| <xref linkend="ref_guide_detach_graph"/> for how to force OpenJPA to use |
| detached state managers across tiers, and for other options for more efficient |
| attachment. |
| </para> |
| <para> |
| The downside of using a detached state manager across tiers is that your |
| enhanced persistent classes and the OpenJPA libraries must be available on the |
| client tier. |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Utilize the <classname>EntityManager</classname> |
| cache |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| When possible and appropriate, re-using <classname>EntityManager</classname>s |
| and setting the <link linkend="openjpa.RetainState"><literal>RetainState |
| </literal></link> configuration option to <literal>true</literal> may result in |
| significant performance gains, since the <classname>EntityManager</classname>'s |
| built-in object cache will be used. |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Enable multithreaded operation only when necessary |
| </emphasis> |
| <para> |
| <emphasis>performance</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| OpenJPA imposes less synchronization overhead when the |
| <link linkend="openjpa.Multithreaded"><literal>openjpa.Multithreaded</literal></link> |
| option is not set to <literal>true</literal>. If your application is guaranteed |
| to only use single-threaded access to OpenJPA resources and persistent objects, |
| leaving this option as <literal>false</literal> may result in a modest |
| performance increase. |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Enable large data set handling |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| If you execute queries that return large numbers of objects or have relations |
| (collections or maps) that are large, and if you often only access parts of |
| these data sets, enabling <link linkend="ref_guide_dbsetup_lrs">large result |
| set handling</link> where appropriate can dramatically speed up your |
| application, since OpenJPA will bring the data sets into memory from the |
| database only as necessary. |
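| <para> |
| A sketch of global defaults that favor large result set handling is shown |
| below. The batch size of 20 is an arbitrary illustrative value, and scrollable |
| result sets must be supported by your JDBC driver for these settings to help. |
| </para> |
| <programlisting> |
| &lt;!-- Illustrative defaults; tune the batch size for your workload --&gt; |
| &lt;property name="openjpa.FetchBatchSize" value="20"/&gt; |
| &lt;property name="openjpa.jdbc.ResultSetType" value="scroll-insensitive"/&gt; |
| &lt;property name="openjpa.jdbc.FetchDirection" value="forward"/&gt; |
| &lt;property name="openjpa.jdbc.LRSSize" value="last"/&gt; |
| </programlisting> |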
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Disable large data set handling |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| If you have enabled scrollable result sets and on-demand loading but you do not |
| require them, consider disabling them again. Some JDBC drivers and databases |
| (SQL Server for example) are much slower when used with scrolling result sets. |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Use the <classname>DynamicSchemaFactory</classname> |
| </emphasis> |
| <para> |
| <emphasis>performance, validation</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| If you are using an <link linkend="openjpa.jdbc.SchemaFactory"><literal> |
| openjpa.jdbc.SchemaFactory</literal></link> setting of something other than |
| the default of <literal>dynamic</literal>, consider switching back. While other |
| factories can ensure that object-relational mapping information is valid when |
| a persistent class is first used, this can be a slow process. Though the |
| validation is only performed once for each class, switching back to the |
| <classname>DynamicSchemaFactory</classname> can reduce the warm-up time for |
| your application. |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Do not use XA transactions |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| <link linkend="ref_guide_enterprise_xa">XA transactions</link> can be orders of |
| magnitude slower than standard transactions. Unless distributed transaction |
| functionality is required by your application, use standard transactions. |
| <para> |
| Recall that XA transactions are distinct from managed transactions - managed |
| transaction services such as those provided by EJB declarative transactions can |
| be used both with XA and non-XA transactions. XA transactions should only be |
| used when a given business transaction involves multiple different transactional |
| resources (an Oracle database and an IBM transactional message queue, for |
| example). |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Use <classname>Set</classname>s instead of |
| <classname>List/Collection</classname>s |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| There is a small amount of extra overhead for OpenJPA to maintain collections |
| where each element is not guaranteed to be unique. If your application does |
| not require duplicates for a collection, you should always declare your |
| fields to be of type <classname>Set</classname>, <classname>SortedSet</classname>, |
| <classname>HashSet</classname>, or <classname>TreeSet</classname>. |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Use query parameters instead of encoding search |
| data in filter strings |
| </emphasis> |
| <para> |
| <emphasis>performance</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| If your queries depend on parameter data only known at runtime, you should use |
| query parameters rather than dynamically building different query strings. |
| OpenJPA performs aggressive caching of query compilation data, and the |
| effectiveness of this cache is diminished if multiple query filters are used |
| where a single one could have sufficed. |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Tune your fetch groups appropriately |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| The <link linkend="ref_guide_fetch">fetch groups</link> used when loading an |
| object control how much data is eagerly loaded, and by extension, which fields |
| must be lazily loaded at a future time. The ideal fetch group configuration |
| loads all the data that is needed in one fetch, and no extra fields - this |
| minimizes both the amount of data transferred from the database, and the |
| number of trips to the database. |
| <para> |
| If extra fields are specified in the fetch groups (in particular, large fields |
| such as binary data, or relations to other persistence-capable objects), then |
| network overhead (for the extra data) and database processing (for any necessary |
| additional joins) will hurt your application's performance. If too few fields |
| are specified in the fetch groups, then OpenJPA will have to make additional |
| trips to the database to load additional fields as necessary. |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Use eager fetching |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| Using <link linkend="ref_guide_perfpack_eager">eager fetching</link> when |
| loading subclass data or traversing relations for each instance in a large |
| collection of results can speed up data loading by orders of magnitude. |
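| <para> |
| For illustration, default eager fetch modes might be configured as in the |
| following sketch. The particular combination of <literal>parallel</literal> and |
| <literal>join</literal> is an assumption made for the example; the best modes |
| depend on your object model and database. |
| </para> |
| <programlisting> |
| &lt;!-- Example modes only; measure before adopting --&gt; |
| &lt;property name="openjpa.jdbc.EagerFetchMode" value="parallel"/&gt; |
| &lt;property name="openjpa.jdbc.SubclassFetchMode" value="join"/&gt; |
| </programlisting> |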
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Disable BrokerImpl finalization |
| </emphasis> |
| <para> |
| <emphasis>performance, scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| Outside of a Java EE application server or other JPA persistence container, |
| OpenJPA's EntityManagers use finalizers to ensure that resources |
| get cleaned up. If you are properly managing your resources, this finalization |
| is not necessary, and will introduce unneeded synchronization, leading to |
| scalability problems. You can disable this protective behavior by setting the |
| <literal>openjpa.BrokerImpl</literal> property to |
| <literal>non-finalizing</literal>. See <xref |
| linkend="ref_guide_runtime_broker_finalization"/> for details. |
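| <para> |
| In <filename>persistence.xml</filename>, the setting described above would look |
| roughly like this. It is only safe if the application reliably closes its own |
| <classname>EntityManager</classname>s. |
| </para> |
| <programlisting> |
| &lt;!-- Only appropriate when the application closes its EntityManagers --&gt; |
| &lt;property name="openjpa.BrokerImpl" value="non-finalizing"/&gt; |
| </programlisting> |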
| </entry> |
| </row> |
| <row> |
| <entry colname="name"> |
| <emphasis role="bold"> |
| Preload MetaDataRepository |
| </emphasis> |
| <para> |
| <emphasis>scalability</emphasis> |
| </para> |
| </entry> |
| <entry colname="desc"> |
| By default, the MetaDataRepository is lazily loaded, which means that a fair amount of locking |
| is used to ensure that metadata is processed properly. Enabling preloading allows OpenJPA to |
| load metadata upfront and remove locking. See <xref linkend="ref_guide_meta_repository"/> for details. |
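| <para> |
| As a brief sketch, preloading might be switched on with a property such as the |
| following. Preloading shifts the metadata processing cost to startup. |
| </para> |
| <programlisting> |
| &lt;!-- Load all metadata eagerly at startup --&gt; |
| &lt;property name="openjpa.MetaDataRepository" value="Preload=true"/&gt; |
| </programlisting> |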
| </entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| </chapter> |