commit	8ef274afa50a08d694b6e7c7912727fcffd41100
author	Arne Bernhardt <arne.bernhardt@soptim.de>	Tue May 23 12:01:16 2023 +0200
committer	Arne Bernhardt <arne.bernhardt@soptim.de>	Sun Jun 04 12:19:36 2023 +0200
tree	c785a9dea4cb08b10deeb255531111719a3674a0
parent	ba2cdb6211f60d7725d779a8556266a8c0b2d0c5

GH-1279: Improve the implementation of in-memory, general-purpose, non-transactional graphs.

Summary:
- Improved performance of GraphMem for:
  - Graph#find
    - Slightly by tuning the filter predicates
    - Significantly when results are processed via Iterator#forEachRemaining
  - Graph#stream
    - Slightly by tuning the filter predicates
    - Significantly for most stream operations
    - java.util.stream.BaseStream#parallel is now fully supported.
      (Before these changes, parallel execution was even slower than single-threaded execution)
  - Graph#contains
    - Only for non-concrete (fluent) patterns
- GraphMem used a variety of different Iterator implementations, helper classes, and wrappers:
  -> None of them supported #forEachRemaining
  -> It seemed appropriate to implement #forEachRemaining almost universally, so that no wrapper or helper
     negates the newly gained performance advantages of #forEachRemaining
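
To illustrate why a wrapper without its own #forEachRemaining loses the gain: the default
Iterator#forEachRemaining falls back to a hasNext()/next() loop on the wrapper. The sketch below
is a minimal, hypothetical filtering wrapper (FilteringIterator is not a Jena class) that
forwards the bulk call to the wrapped iterator instead.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical filtering wrapper, for illustration only; not part of Jena. */
final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> base;
    private final Predicate<? super T> filter;
    private T pending;          // element found by hasNext() but not yet returned
    private boolean hasPending;

    FilteringIterator(Iterator<T> base, Predicate<? super T> filter) {
        this.base = base;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        // Look ahead until an element passes the filter or the base is exhausted.
        while (!hasPending && base.hasNext()) {
            T candidate = base.next();
            if (filter.test(candidate)) {
                pending = candidate;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T result = pending;
        pending = null;
        hasPending = false;
        return result;
    }

    @Override
    public void forEachRemaining(Consumer<? super T> action) {
        // First emit an element that an earlier hasNext() call may have buffered.
        if (hasPending) {
            T result = pending;
            pending = null;
            hasPending = false;
            action.accept(result);
        }
        // Then hand the rest to the wrapped iterator's own bulk implementation.
        base.forEachRemaining(t -> {
            if (filter.test(t)) action.accept(t);
        });
    }
}
```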

Details:

Implemented Iterator#forEachRemaining in an optimized way for many iterators throughout the Jena project:
- Replaced hasNext()/next() loops with #forEachRemaining in some promising places (not all)

Optimized NiceIterator:
- NiceIterator#hasNext now avoids redundant calls to current.hasNext()
- NiceIterator#andThen has optimized code to handle expensive hasNext calls of wrapped iterators.
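
As a rough illustration of the idea, and not the actual NiceIterator code: a concatenating
iterator can query the active part only once per hasNext() call and can drain each part
through that part's own #forEachRemaining. ConcatIterator and its members are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

/** Hypothetical concatenating iterator, for illustration only; not part of Jena. */
final class ConcatIterator<T> implements Iterator<T> {
    private final Deque<Iterator<? extends T>> pending = new ArrayDeque<>();
    private Iterator<? extends T> current = Collections.emptyIterator();

    @SafeVarargs
    ConcatIterator(Iterator<? extends T>... parts) {
        for (Iterator<? extends T> part : parts) pending.add(part);
    }

    @Override
    public boolean hasNext() {
        // Each inspected part is asked exactly once per call; a positive answer
        // is returned directly instead of being re-checked after the loop.
        while (true) {
            if (current.hasNext()) return true;
            if (pending.isEmpty()) return false;
            current = pending.poll();
        }
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }

    @Override
    public void forEachRemaining(Consumer<? super T> action) {
        // Drain the active part, then every queued part, via their bulk methods.
        current.forEachRemaining(action);
        Iterator<? extends T> next;
        while ((next = pending.poll()) != null) {
            next.forEachRemaining(action);
        }
        current = Collections.emptyIterator();
    }
}
```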

Tuned GraphMem:
- Removed unused classes in the 'mem' namespace.
- Iterators:
  - Tuned Iterator implementations, mainly by adding forEachRemaining implementations
  - Optimized code in BasicKeyIterator. The new iterator works in reverse order, so I had to adapt HashCommon#removeFrom
  - Optimized code in HashedBunchMap#iterator
  - TrackingTripleIterator#forEachRemaining now simply calls super.forEachRemaining() and sets current to null
    -> This had a significant impact on performance
- Spliterators:
  - Created specialized spliterators SparseArraySpliterator and SparseArraySubSpliterator (+unit tests)
  - Replaced Spliterator implementation within HashCommon and HashedBunchMap with SparseArraySpliterator
  - Implemented Spliterators to support fast stream operations, where the SparseArraySpliterator and
    SparseArraySubSpliterator implementations were not suitable
- Replaced usage of org.apache.jena.graph.impl.GraphBase#containsByFind with optimized implementations:
  - Introduced org.apache.jena.graph.impl.TripleStore#containsMatch
  - Implemented org.apache.jena.mem.NodeToTriplesMapBase#containsMatch using the spliterator
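
The spliterator approach can be sketched as follows. This is a simplified illustration under
stated assumptions, not the SparseArraySpliterator from this commit: NullSkippingSpliterator and
SparseArrays are hypothetical names, empty hash slots are assumed to be null, and iteration runs
forward. A contains-style check then becomes a short-circuiting anyMatch over the stream.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.StreamSupport;

/** Hypothetical spliterator over an array with null gaps; not part of Jena. */
final class NullSkippingSpliterator<E> implements Spliterator<E> {
    private final E[] slots;
    private int pos;        // next index to inspect
    private final int end;  // exclusive upper bound

    NullSkippingSpliterator(E[] slots, int from, int to) {
        this.slots = slots;
        this.pos = from;
        this.end = to;
    }

    @Override
    public boolean tryAdvance(Consumer<? super E> action) {
        while (pos < end) {
            E e = slots[pos++];
            if (e != null) { action.accept(e); return true; }
        }
        return false;
    }

    @Override
    public void forEachRemaining(Consumer<? super E> action) {
        // Tight loop without per-element tryAdvance overhead.
        while (pos < end) {
            E e = slots[pos++];
            if (e != null) action.accept(e);
        }
    }

    @Override
    public Spliterator<E> trySplit() {
        int remaining = end - pos;
        if (remaining < 2) return null;
        int mid = pos + remaining / 2;
        // Hand the first half to a new spliterator and keep the second half.
        Spliterator<E> prefix = new NullSkippingSpliterator<>(slots, pos, mid);
        pos = mid;
        return prefix;
    }

    @Override
    public long estimateSize() {
        return end - pos; // upper bound: counts slots, not non-null elements
    }

    @Override
    public int characteristics() {
        return NONNULL; // not SIZED, because empty slots are skipped
    }
}

final class SparseArrays {
    /** Returns true as soon as one non-null slot satisfies the predicate. */
    static <E> boolean containsMatch(E[] slots, Predicate<? super E> predicate) {
        return StreamSupport
                .stream(new NullSkippingSpliterator<>(slots, 0, slots.length), false)
                .anyMatch(predicate);
    }
}
```

Because trySplit divides the index range in half, the same spliterator can also back a
parallel stream.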

- Filter operations:
  - Introduced org.apache.jena.graph.Triple.Field#filterOnConcrete to avoid double-checking of Node#isConcrete (+unit tests)
  - Created org.apache.jena.mem.FieldFilter to efficiently build filter predicates only when needed, and only with
    the required conditions (+unit tests)
  - Used org.apache.jena.mem.FieldFilter#filterOn in NodeToTriplesMap#iterator, NodeToTriplesMapBase#stream,
    and NodeToTriplesMapBase#containsMatch to only filter when a filter is needed.
    For example, find(sub, ANY, ANY) needs no filter in the underlying TripleBunch

74 files changed


README.md

Jena README

Welcome to Apache Jena, a Java framework for writing Semantic Web applications.

See https://jena.apache.org/ for the project website, including documentation.

The codebase for the active modules is in git:

https://github.com/apache/jena