openjpa-project/src/doc/manual/ref_guide_optimization.xml - openjpa - Git at Google

 <?xml version="1.0" encoding="UTF-8"?>
 <!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
 -->
 <chapter id="ref_guide_optimization">
     <title>
         Optimization Guidelines
     </title>
     <indexterm zone="ref_guide_optimization">
         <primary>
             optimization guidelines
         </primary>
     </indexterm>
     <para>
 There are numerous techniques you can use in order to ensure that OpenJPA
 operates in the fastest and most efficient manner. Following are some
 guidelines. Each describes what impact it will have on performance and
 scalability. Note that general guidelines regarding performance or scalability
 issues are just that - guidelines. Depending on the particular characteristics
 of your application, the optimal settings may be considerably different than
 what is outlined below.
     </para>
     <para>
 In the following table, each row is labeled with a list of italicized keywords.
 These keywords identify what characteristics the row in question may improve
 upon. Many of the rows are marked with one or both of the <emphasis>performance
 </emphasis> and <emphasis>scalability</emphasis> labels. It is important to bear
 in mind the differences between performance and scalability (for the most part,
 we are referring to system-wide scalability, and not necessarily only
 scalability within a single JVM). The performance-related hints will probably
 improve the performance of your application for a given user load, whereas the
 scalability-related hints will probably increase the total number of users that
 your application can service. Sometimes, increasing performance will decrease
 scalability, and vice versa. Typically, options that reduce the amount of work
 done on the database server will improve scalability, whereas those that push
 more work onto the server will have a negative impact on scalability.
     </para>
     <table>
         <title>
             Optimization Guidelines
         </title>
         <tgroup cols="2" align="left" colsep="1" rowsep="1">
             <colspec colname="name"/>
             <colspec colname="desc" colwidth="4*"/>
             <tbody valign="top">
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Plugin in a Connection Pool
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 OpenJPA's built-in datasource does not perform connection pooling or
 prepared statement caching.  Plugging in a third-party pooling datasource may
 drastically improve performance.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Optimize database indexes
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 The default set of indexes created by OpenJPA's mapping tool may not always be
 the most appropriate for your application. Manually setting indexes in your
 mapping metadata or manually manipulating database indexes to include
 frequently-queried fields (as well as dropping indexes on rarely-queried
 fields) can yield significant performance benefits.
                         <para>
 A database must do extra work on insert, update, and delete to maintain an
 index. This extra work will benefit selects with WHERE clauses, which will
 execute much faster when the terms in the WHERE clause are appropriately
 indexed. So, for a read-mostly application, appropriate indexing will slow down
 updates (which are rare) but greatly accelerate reads. This means that the
 system as a whole will be faster, and also that the database will experience
 less load, meaning that the system will be more scalable.
                         </para>
                         <para>
 Bear in mind that over-indexing is a bad thing, both for scalability and
 performance, especially for applications that perform lots of inserts, updates,
 or deletes.
                         </para>
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             JVM optimizations
                         </emphasis>
                         <para>
 <emphasis>performance, reliability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 Manipulating various parameters of the Java Virtual Machine (such as hotspot
 compilation modes and the maximum memory) can result in performance
 improvements. For more details about optimizing the JVM execution environment,
 please see <ulink url="http://java.sun.com/docs/hotspot/PerformanceFAQ.html"/>.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Use the data cache
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 Using OpenJPA's <link linkend="ref_guide_cache">data and query caching</link>
 features can often result in a dramatic improvement in performance.
 Additionally, these caches can significantly reduce the amount of load on
 the database, increasing the scalability characteristics of your application.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Set <literal>LargeTransaction</literal> to true,
                             or set <literal> PopulateDataCache</literal> to
                             false
                         </emphasis>
                         <para>
 <emphasis>performance vs. scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 When using OpenJPA's <link linkend="ref_guide_cache">data caching</link>
 features in a transaction that will delete, modify, or create a very large
 number of objects you can set <literal>LargeTransaction</literal> to true and
 perform periodic flushes during your transaction to reduce its memory
 requirements.  See the Javadoc:
 <ulink url="../javadoc/org/apache/openjpa/persistence/OpenJPAEntityManager.html">
 OpenJPAEntityManager.setTrackChangesByType</ulink>.  Note that transactions in
 large mode have to more aggressively flush items from the data cache.
                         <para>
 If your transaction will visit objects that you know are very unlikely to be
 accessed by other transactions, for example an exhaustive report run only once a
 month, you can turn off population of the data cache so that the transaction
 doesn't fill the entire data cache with objects that won't be accessed again.
 Again, see the Javadoc:
 <ulink url="../javadoc/org/apache/openjpa/persistence/OpenJPAEntityManager.html">
 OpenJPAEntityManager.setPopulateDataCache</ulink>
                         </para>
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Run the OpenJPA enhancer on your persistent classes,
                             either at build-time or deploy-time.
                         </emphasis>
                         <para>
 <emphasis>performance, scalability, memory footprint</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 OpenJPA performs best when your persistent classes have been run through the
 OpenJPA post-compilation bytecode enhancer. When dealing with enhanced classes,
 OpenJPA can make a number of assumptions that reduce memory footprint and
 accelerate persistent data access. When evaluating OpenJPA's performance,
 build-time or deploy-time enhancement should be enabled. See
 <xref linkend="ref_guide_pc_enhance"/> for details.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Disable logging, performance tracking
                         </emphasis>
                         <para>
 <emphasis>performance</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 Developer options such as verbose logging and the JDBC performance tracker can
 result in serious performance hits for your application. Before evaluating
 OpenJPA's performance, these options should all be disabled.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Set <literal>IgnoreChanges</literal> to true, or
                             set <literal>FlushBeforeQueries</literal> to true
                         </emphasis>
                         <para>
 <emphasis>performance vs. scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 When both the <link linkend="openjpa.IgnoreChanges"><literal>
 openjpa.IgnoreChanges</literal></link> and
 <link linkend="openjpa.FlushBeforeQueries"><literal>openjpa.FlushBeforeQueries
 </literal></link> properties are set to false, OpenJPA needs to consider
 in-memory dirty instances during queries.  This can sometimes result in OpenJPA
 needing to evaluate the entire extent objects in order to return the correct
 query results, which can have drastic performance consequences.  If it is
 appropriate for your application, configuring <literal>FlushBeforeQueries
 </literal> to automatically flush before queries involving dirty objects will
 ensure that this never happens. Setting <literal>IgnoreChanges</literal> to
 false will result in a small performance hit even if <literal>FlushBeforeQueries
 </literal> is true, as incremental flushing is not as efficient overall as
 delaying all flushing to a single operation during commit.
                         <para>
 Setting <literal>IgnoreChanges</literal> to <literal>true</literal> will help
 performance, since dirty objects can be ignored for queries, meaning that
 incremental flushing or client-side processing is not necessary. It will also
 improve scalability, since overall database server usage is diminished. On the
 other hand, setting <literal>IgnoreChanges</literal> to <literal>false</literal>
 will have a negative impact on scalability, even when using automatic flushing
 before queries, since more operations will be performed on the database server.
                         </para>
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Configure <literal>openjpa.ConnectionRetainMode
                             </literal> appropriately
                         </emphasis>
                         <para>
 <emphasis>performance vs. scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 The <link linkend="openjpa.ConnectionRetainMode"><literal>ConnectionRetainMode
 </literal></link> configuration option controls when OpenJPA will obtain a
 connection, and how long it will hold that connection. The optimal settings for
 this option will vary considerably depending on the particular behavior of
 your application. You may even benefit from using different retain modes for
 different parts of your application.
                         <para>
 The default setting of <literal>on-demand</literal> minimizes the amount of time
 that OpenJPA holds onto a datastore connection. This is generally the best
 option from a scalability standpoind, as database resources are held for a
 minimal amount of time. However, if you are not using connection pooling, or
 if your <classname>DataSource</classname> is not efficient at managing its
 pool, then this default value could cause undesirable pool contention.
                         </para>
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Use flat inheritance
                         </emphasis>
                         <para>
 <emphasis>performance, scalability vs. disk space</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 Mapping inheritance hierarchies to a single database table is faster for most
 operations than other strategies employing multiple tables. If it is
 appropriate for your application, you should use this strategy whenever
 possible.
                         <para>
 However, this strategy will require more disk space on the database side. Disk
 space is relatively inexpensive, but if your object model is particularly large,
 it can become a factor.
                         </para>
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             High sequence increment
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 For applications that perform large bulk inserts, the retrieval of sequence
 numbers can be a bottleneck.  Increasing sequence increments and using
 table-based rather than native database sequences can reduce or eliminate
 this bottleneck. In some cases, implementing your own sequence factory can
 further optimize sequence number retrieval.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Use optimistic transactions
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 Using datastore transactions translates into pessimistic database row locking,
 which can be a performance hit (depending on the database). If appropriate for
 your application, optimistic transactions are typically faster than datastore
 transactions.
                         <para>
 Optimistic transactions provide the same transactional guarantees as datastore
 transactions, except that you must handle a potential optimistic verification
 exception at the end of a transaction instead of assuming that a transaction
 will successfully complete. In many applications, it is unlikely that different
 concurrent transactions will operate on the same set of data at the same time,
 so optimistic verification increases the concurrency, and therefore both the
 performance and scalability characteristics, of the application. A common
 approach to handling optimistic verification exceptions is to simply present the
 end user with the fact that concurrent modifications happened, and require that
 the user redo any work.
                         </para>
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Use query aggregates and projections
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 Using aggregates to compute reporting data on the database server can
 drastically speed up queries.  Similarly, using projections when you are
 interested in specific object fields or relations rather than the entire object
 state can reduce the amount of data OpenJPA must transfer from the database to
 your application.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Always close resources
                         </emphasis>
                         <para>
 <emphasis>scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
                         <para>
 Under certain settings, <classname> EntityManager</classname> s, OpenJPA
 <classname>Extent</classname> iterators, and <classname>Query</classname>
 results may be backed by resources in the database.
                         </para>
                         <para>
 For example, if you have configured OpenJPA to use scrollable cursors and lazy
 object instantiation by default, each query result will hold open a <classname>
 ResultSet</classname> object, which, in turn, will hold open a <classname>
 Statement</classname> object (preventing it from being re-used). Garbage
 collection will clean up these resources, so it is never necessary to explicitly
 close them, but it is always faster if it is done at the application level.
                         </para>
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Use detached state managers
                         </emphasis>
                         <para>
 <emphasis>performance</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
                         <para>
 Attaching and even persisting instances can be more efficient when your detached
 objects use detached state managers. By default, OpenJPA does not use detached
 state managers when serializing an instance across tiers. See
 <xref linkend="ref_guide_detach_graph"/> for how to force OpenJPA to use
 detached state managers across tiers, and for other options for more efficient
 attachment.
                         </para>
                         <para>
 The downside of using a detached state manager across tiers is that your
 enhanced persistent classes and the OpenJPA libraries must be available on the
 client tier.
                         </para>
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Utilize the <classname>EntityManager</classname>
                             cache
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 When possible and appropriate, re-using <classname>EntityManager</classname>s
 and setting the <link linkend="openjpa.RetainState"><literal>RetainState
 </literal></link> configuration option to <literal>true</literal> may result in
 significant performance gains, since the <classname>EntityManager</classname>'s
 built-in object cache will be used.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Enable multithreaded operation only when necessary
                         </emphasis>
                         <para>
 <emphasis>performance</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 OpenJPA respects the <link linkend="openjpa.Multithreaded"><literal>
 openjpa.Multithreaded</literal></link> option in that it does not impose as
 much synchronization overhead for applications that do not set this value to
 <literal>true</literal>. If your application is guaranteed to only use
 single-threaded access to OpenJPA resources and persistent objects, leaving
 this option as <literal>false</literal> will reduce synchronization overhead,
 and may result in a modest performance increase.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Enable large data set handling
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 If you execute queries that return large numbers of objects or have relations
 (collections or maps) that are large, and if you often only access parts of
 these data sets, enabling <link linkend="ref_guide_dbsetup_lrs">large result
 set handling</link> where appropriate can dramatically speed up your
 application, since OpenJPA will bring the data sets into memory from the
 database only as necessary.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Disable large data set handling
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 If you have enabled scrollable result sets and on-demand loading but do you not
 require it, consider disabling it again.  Some JDBC drivers and databases
 (SQLServer for example) are much slower when used with scrolling result sets.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Use the <classname>DynamicSchemaFactory</classname>
                         </emphasis>
                         <para>
 <emphasis>performance, validation</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 If you are using a <link linkend="openjpa.jdbc.SchemaFactory"><literal>
 openjpa.jdbc.SchemaFactory</literal></link> setting of something other than
 the default of <literal>dynamic</literal>, consider switching back.  While other
 factories can ensure that object-relational mapping information is valid when
 a persistent class is first used, this can be a slow process.  Though the
 validation is only performed once for each class, switching back to the
 <classname>DynamicSchemaFactory</classname> can reduce the warm-up time for
 your application.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Do not use XA transactions
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 <link linkend="ref_guide_enterprise_xa">XA transactions</link> can be orders of
 magnitude slower than standard transactions. Unless distributed transaction
 functionality is required by your application, use standard transactions.
                         <para>
 Recall that XA transactions are distinct from managed transactions - managed
 transaction services such as that provided by EJB declarative transactions can
 be used both with XA and non-XA transactions. XA transactions should only be
 used when a given business transaction involves multiple different transactional
 resources (an Oracle database and an IBM transactional message queue, for
 example).
                         </para>
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Use <classname>Set</classname>s instead of
                             <classname>List/Collection</classname>s
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 There is a small amount of extra overhead for OpenJPA to maintain collections
 where each element is not guaranteed to be unique.  If your application does
 not require duplicates for a collection, you should always declare your
 fields to be of type <classname>Set, SortedSet, HashSet,</classname> or
 <classname>TreeSet</classname>.
                     </entry>
                 </row>
 				<row>
 					<entry colname="name">
                         <emphasis role="bold">
                             Use query parameters instead of encoding search
                             data in filter strings
                         </emphasis>
                         <para>
 <emphasis>performance</emphasis>
                         </para>
 					</entry>
 					<entry colname="desc">
 If your queries depend on parameter data only known at runtime, you should use
 query parameters rather than dynamically building different query strings.
 OpenJPA performs aggressive caching of query compilation data, and the
 effectiveness of this cache is diminished if multiple query filters are used
 where a single one could have sufficed.
 					</entry>
 				</row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Tune your fetch groups appropriately
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 The <link linkend="ref_guide_fetch">fetch groups</link> used when loading an
 object control how much data is eagerly loaded, and by extension, which fields
 must be lazily loaded at a future time. The ideal fetch group configuration
 loads all the data that is needed in one fetch, and no extra fields - this
 minimizes both the amount of data transferred from the database, and the
 number of trips to the database.
                         <para>
 If extra fields are specified in the fetch groups (in particular, large fields
 such as binary data, or relations to other persistence-capable objects), then
 network overhead (for the extra data) and database processing (for any necessary
 additional joins) will hurt your application's performance. If too few fields
 are specified in the fetch groups, then OpenJPA will have to make additional
 trips to the database to load additional fields as necessary.
                         </para>
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Use eager fetching
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 Using <link linkend="ref_guide_perfpack_eager">eager fetching</link> when
 loading subclass data or traversing relations for each instance in a large
 collection of results can speed up data loading by orders of magnitude.
                     </entry>
                 </row>
                 <row>
                     <entry colname="name">
                         <emphasis role="bold">
                             Disable BrokerImpl finalization
                         </emphasis>
                         <para>
 <emphasis>performance, scalability</emphasis>
                         </para>
                     </entry>
                     <entry colname="desc">
 Outside of a Java EE 5 application server or other JPA persistence container,
 OpenJPA's EntityManagers use finalizers to ensure that resources
 get cleaned up. If you are properly managing your resources, this finalization
 is not necessary, and will introduce unneeded synchronization, leading to
 scalability problems. You can disable this protective behavior by setting the
 <literal>openjpa.BrokerImpl</literal> property to
 <literal>non-finalizing</literal>. See <xref
 linkend="ref_guide_runtime_broker_finalization"/> for details.
                     </entry>
                 </row>
             </tbody>
         </tgroup>
     </table>
 </chapter>