commit | 8ef274afa50a08d694b6e7c7912727fcffd41100 | |
---|---|---|
author | Arne Bernhardt <arne.bernhardt@soptim.de> | Tue May 23 12:01:16 2023 +0200 |
committer | Arne Bernhardt <arne.bernhardt@soptim.de> | Sun Jun 04 12:19:36 2023 +0200 |
tree | c785a9dea4cb08b10deeb255531111719a3674a0 | |
parent | ba2cdb6211f60d7725d779a8556266a8c0b2d0c5 | |

GH-1279: Improve the implementation of in-memory, general-purpose, non-transactional graphs.

Summary:
- Improved performance of GraphMem for:
  - Graph#find
    - Slightly by tuning the filter predicates
    - Significantly when results are processed via Iterator#forEachRemaining
  - Graph#stream
    - Slightly by tuning the filter predicates
    - Significantly for most stream operations
    - java.util.stream.BaseStream#parallel is now fully supported.
      (Before these changes, parallel execution was even slower than single-threaded execution)
  - Graph#contains
    - Only for non-concrete (fluent) patterns
- GraphMem used a variety of different Iterator implementations, helper classes, and wrappers:
  -> None of them supported #forEachRemaining
  -> It seemed appropriate to almost universally implement #forEachRemaining to avoid a wrapper or helper from breaking the newly gained performance advantages of #forEachRemaining

Details:

Implemented Iterator#forEachRemaining in an optimized way for many iterators throughout the Jena project:
- Replaced hasNext();next(); calls by #forEachRemaining in some promising places (not all)

Optimized NiceIterator:
- NiceIterator#hasNext now avoids redundant calls to current.hasNext()
- NiceIterator#andThen has optimized code to handle expensive hasNext calls of wrapped iterators.

Tuned GraphMem:
- Removed unused classes in the 'mem' namespace.
- Iterators:
  - Tuned Iterator implementations, mainly by adding forEachRemaining implementations
  - Optimized code in BasicKeyIterator. The new iterator works in reverse order, so I had to adapt HashCommon#removeFrom
  - Optimized code in HashedBunchMap#iterator
  - TrackingTripleIterator#forEachRemaining now simply calls super.forEachRemaining() and sets current to null
    -> This had a significant impact on performance
- Spliterators:
  - Created specialized spliterators SparseArraySpliterator and SparseArraySubSpliterator (+unit tests)
  - Replaced Spliterator implementation within HashCommon and HashedBunchMap with SparseArraySpliterator
  - Implemented Spliterators to support fast stream operations, where the SparseSpliterator* implementations were not suitable
- Replaced usage of org.apache.jena.graph.impl.GraphBase#containsByFind with optimized implementations:
  - Introduced org.apache.jena.graph.impl.TripleStore#containsMatch
  - Implemented org.apache.jena.mem.NodeToTriplesMapBase#containsMatch using the spliterator
- Filter operations:
  - Introduced org.apache.jena.graph.Triple.Field#filterOnConcrete to avoid double-checking of Node#isConcrete (+unit tests)
  - Created org.apache.jena.mem.FieldFilter to efficiently build filter predicates only when needed, and only with the required conditions (+unit tests)
  - Used org.apache.jena.mem.FieldFilter#filterOn in NodeToTriplesMap#iterator, NodeToTriplesMapBase#stream, and NodeToTriplesMapBase#containsMatch to only filter when a filter is needed.
    For example: For find(sub, ANY, ANY), there is no need for a filter in the underlying TripleBunch

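
The iterator changes described above can be illustrated with a minimal, self-contained sketch. It is not the Jena code: the class names `SparseArrayIteratorSketch`, `FilteringIteratorSketch`, and `ForEachRemainingSketch`, and the `find` helper, are hypothetical and exist only for illustration. The sketch assumes a hash-style backing array with empty (null) slots, walks it in reverse order, overrides `forEachRemaining` so that a wrapper does not fall back to per-element `hasNext()`/`next()` calls, and applies a filter only when one is actually needed.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch: iterates a sparse array (null = empty slot) in reverse order. */
final class SparseArrayIteratorSketch<E> implements Iterator<E> {
    private final E[] entries;
    private int pos; // index of the next slot to examine, counting down

    SparseArrayIteratorSketch(E[] entries) {
        this.entries = entries;
        this.pos = entries.length - 1;
    }

    @Override
    public boolean hasNext() {
        // Skip empty slots until an element or the start of the array is reached.
        while (pos >= 0 && entries[pos] == null) pos--;
        return pos >= 0;
    }

    @Override
    public E next() {
        if (!hasNext()) throw new NoSuchElementException();
        return entries[pos--];
    }

    @Override
    public void forEachRemaining(Consumer<? super E> action) {
        // A single loop replaces the hasNext()/next() call pair per element.
        for (; pos >= 0; pos--) {
            E e = entries[pos];
            if (e != null) action.accept(e);
        }
    }
}

/** Hypothetical sketch: a filtering wrapper that forwards forEachRemaining to its base. */
final class FilteringIteratorSketch<E> implements Iterator<E> {
    private final Iterator<E> base;
    private final Predicate<? super E> filter;
    private E lookahead;          // element already fetched and accepted by the filter
    private boolean hasLookahead;

    FilteringIteratorSketch(Iterator<E> base, Predicate<? super E> filter) {
        this.base = base;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        while (!hasLookahead && base.hasNext()) {
            E candidate = base.next();
            if (filter.test(candidate)) {
                lookahead = candidate;
                hasLookahead = true;
            }
        }
        return hasLookahead;
    }

    @Override
    public E next() {
        if (!hasNext()) throw new NoSuchElementException();
        hasLookahead = false;
        E result = lookahead;
        lookahead = null;
        return result;
    }

    @Override
    public void forEachRemaining(Consumer<? super E> action) {
        // Emit an element a previous hasNext() call may have buffered.
        if (hasLookahead) {
            hasLookahead = false;
            E pending = lookahead;
            lookahead = null;
            action.accept(pending);
        }
        // Delegate so the base iterator's optimized loop is used.
        base.forEachRemaining(e -> { if (filter.test(e)) action.accept(e); });
    }
}

/** Hypothetical entry point: wraps the base iterator only when a filter is required. */
final class ForEachRemainingSketch {
    static <E> Iterator<E> find(E[] entries, Predicate<? super E> filterOrNull) {
        Iterator<E> all = new SparseArrayIteratorSketch<>(entries);
        if (filterOrNull == null) {
            return all; // no filter needed: avoid the wrapper entirely
        }
        return new FilteringIteratorSketch<>(all, filterOrNull);
    }
}
```

A caller would consume the result with `ForEachRemainingSketch.find(slots, null).forEachRemaining(consumer)`. The point of overriding `forEachRemaining` in the wrapper as well is that the default `Iterator#forEachRemaining` is a plain `hasNext()`/`next()` loop, so a single wrapper without an override would discard the gain from the optimized base iterator.
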
Welcome to Apache Jena, a Java framework for writing Semantic Web applications.
See https://jena.apache.org/ for the project website, including documentation.
The codebase for the active modules is in git: