e61bad5e7d11d55b8686900a23f0877f0913969c - daffodil

commit	e61bad5e7d11d55b8686900a23f0877f0913969c
author	Steve Lawrence <slawrence@tresys.com>	Wed Jul 30 08:01:58 2014 -0400
committer	Steve Lawrence <slawrence@tresys.com>	Fri Aug 01 08:50:54 2014 -0400
tree	879a3cc175d764b187b2137100ab49a853f8e0bf
parent	0395a698cb3d0cb7acf3e4b652e085696f546fa1

Optimize longestMatch delimiter algorithm

The longestMatch function is called every time delimiters are found. On some
file types, this function was a noticeable hotspot.

Although unlikely, in the worst case the current longestMatch algorithm could
traverse the entire matches list three times: once to find the location of the
earliest match, once to remove any matches that start later than that location,
and once to find the longest length among those that remain. In addition, the
various Scala collection functions it uses allocate new lists, which slows
things down further.
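
For illustration only, the multi-pass shape described above might look roughly
like the sketch below. This is not the actual Daffodil code: DelimMatch, its
fields, and ThreePassLongestMatch are hypothetical names invented for this
example.

```scala
object ThreePassLongestMatch {
  // Hypothetical stand-in for a delimiter match; not Daffodil's actual type.
  final case class DelimMatch(delim: String, start: Int, length: Int)

  def longestMatch(matches: Seq[DelimMatch]): Option[DelimMatch] =
    if (matches.isEmpty) None
    else {
      // Pass 1: find the earliest start position (map allocates a new Seq).
      val earliest = matches.map(_.start).min
      // Pass 2: drop matches that start later (filter allocates a new Seq).
      val candidates = matches.filter(_.start == earliest)
      // Pass 3: find the longest length among the matches that remain.
      val longest = candidates.map(_.length).max
      // Return the first candidate that has that length.
      candidates.find(_.length == longest)
    }
}
```

Each intermediate collection here is a fresh allocation, which is the kind of
overhead the commit message refers to.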

This modifies the algorithm to traverse the matches list only once, while
keeping track of the first longest match that it has seen. Additionally, all
functional code is replaced with iterative code for better performance. Lastly,
the Seq is replaced with an ArrayBuffer so that we are guaranteed amortized
constant-time append and constant-time random access on the matches sequence.

This also adds an early return for the case where there is only one match,
which is probably the most common case.
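
A minimal sketch of a single-pass, iterative longestMatch with the early
return, under the assumption that a match is described by a start position and
a length. DelimMatch, its fields, and SinglePassLongestMatch are hypothetical
names for this example, not the code in this change.

```scala
import scala.collection.mutable.ArrayBuffer

object SinglePassLongestMatch {
  // Hypothetical stand-in for a delimiter match; not Daffodil's actual type.
  final case class DelimMatch(delim: String, start: Int, length: Int)

  def longestMatch(matches: ArrayBuffer[DelimMatch]): Option[DelimMatch] = {
    val n = matches.length
    if (n == 0) None
    else if (n == 1) Some(matches(0)) // early return: only one match
    else {
      var best = matches(0)
      var i = 1
      // Single pass using constant-time indexed access; no new collections.
      while (i < n) {
        val m = matches(i)
        // An earlier start always wins. On an equal start, only a strictly
        // longer match wins, so the first of several equally long matches
        // is the one that is kept.
        if (m.start < best.start ||
            (m.start == best.start && m.length > best.length)) {
          best = m
        }
        i += 1
      }
      Some(best)
    }
  }
}
```

The strict comparison on length is what keeps the first longest match seen
when several matches share the same start and length.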

DFDL-992

3 files changed
