[tablet] minor optimization on DeltaFileIterator

This patch reduces the number of heap allocations performed
by DeltaFileIterator<Type>::ReadCurrentBlockOntoQueue():
PreparedDeltaBlock and its decoder are no longer allocated on the heap.
This helps reduce contention in tcmalloc under high-intensity workloads.
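
For illustration only, here is a minimal sketch of the idea of holding
an object and its decoder by value instead of behind separately
allocated pointers. The types Decoder, BlockBefore, BlockAfter and the
counting helper are hypothetical stand-ins, not the actual Kudu classes,
and the counter only tracks individual `new` calls for these types (the
container still allocates its own storage in chunks).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>

// Hypothetical counter of individual heap allocations of the types below.
static std::size_t g_individual_allocs = 0;

// Hypothetical helper base: counts every `new T` of a derived type.
struct CountedOnHeap {
  static void* operator new(std::size_t size) {
    ++g_individual_allocs;
    return ::operator new(size);
  }
  static void operator delete(void* ptr) { ::operator delete(ptr); }
};

// Stand-in for a block decoder.
struct Decoder : CountedOnHeap {
  int pos = 0;
};

// "Before" shape: the block and its decoder are separate heap objects.
struct BlockBefore : CountedOnHeap {
  std::unique_ptr<Decoder> decoder = std::make_unique<Decoder>();
};

// "After" shape: the decoder is a plain member of the block.
struct BlockAfter : CountedOnHeap {
  Decoder decoder;
};

// Queues n blocks as individually allocated objects.
std::size_t CountAllocsBefore(int n) {
  g_individual_allocs = 0;
  std::deque<std::unique_ptr<BlockBefore>> queue;
  for (int i = 0; i < n; ++i) {
    queue.push_back(std::make_unique<BlockBefore>());
  }
  return g_individual_allocs;
}

// Queues n blocks by value: they are constructed in the container's storage.
std::size_t CountAllocsAfter(int n) {
  g_individual_allocs = 0;
  std::deque<BlockAfter> queue;
  for (int i = 0; i < n; ++i) {
    queue.emplace_back();
  }
  return g_individual_allocs;
}

// Usage: two individual allocations per block before, none after.
void CheckAllocationSketch() {
  assert(CountAllocsBefore(1000) == 2000);
  assert(CountAllocsAfter(1000) == 0);
}
```

Fewer individual allocations means fewer trips into the allocator, which
is where the contention mentioned above would show up.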

Also, this patch switches DeltaPreparer::prepared_deltas_ from
std::vector to std::deque to better fit the existing pattern of
adding prepared deltas: the number of deltas might be huge, but the
exact number isn't known in advance at the upper level. Since
std::deque grows at the end without relocating existing elements,
this avoids reallocating huge chunks of memory when there are many
deltas to process.
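
As a rough illustration of the difference, the sketch below counts how
often elements get relocated while appending to each container without
knowing the final size in advance. The Delta type and CountRelocations()
helper are hypothetical and exist only for this example; they are not
part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical counter of element relocations (move constructions).
static std::size_t g_relocations = 0;

// Hypothetical stand-in for a prepared delta; counts each time it is moved.
struct Delta {
  Delta() = default;
  Delta(Delta&& other) noexcept : payload(other.payload) { ++g_relocations; }
  long payload = 0;
};

// Appends n elements one by one, as a caller would when the total count
// is not known up front, and returns how many element moves that caused.
template <typename Container>
std::size_t CountRelocations(std::size_t n) {
  g_relocations = 0;
  Container container;
  for (std::size_t i = 0; i < n; ++i) {
    container.emplace_back();  // constructed in place, so no move here
  }
  return g_relocations;
}

// Usage: std::vector moves existing elements each time it outgrows its
// capacity, while std::deque adds new chunks and leaves elements in place.
void CheckGrowthSketch() {
  assert(CountRelocations<std::vector<Delta>>(100000) > 0);
  assert(CountRelocations<std::deque<Delta>>(100000) == 0);
}
```

With std::vector, each growth step also needs the old and the new
buffers to be alive at the same time, which is what makes reallocation
costly when the container is already huge.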

I tested this patch against a scenario with a few rowsets having a very
high number of deltas (about 10M), comparing the timings that
CompactRowSetsOp() reported upon completion of the compaction operation:
  Before:
    Timing: real 131.803s     user 83.797s    sys 41.467s
  After:
    Timing: real 121.764s     user 85.979s    sys 32.401s

I also monitored the total amount of memory allocated by capturing
tcmalloc's stats at the embedded webserver's /memz page every second.
The new code allocated a bit less memory at the peak (~1.3%):
  Before:
    MALLOC:    28882367400 (27544.4 MiB) Bytes in use by application
  After:
    MALLOC:    28492412984 (27172.5 MiB) Bytes in use by application

Change-Id: Ia5edd08ab060074d123d1d05ec4b656be3bfc3c8
Reviewed-on: http://gerrit.cloudera.org:8080/19277
Tested-by: Alexey Serbin <alexey@apache.org>
Reviewed-by: Yifan Zhang <chinazhangyifan@163.com>
5 files changed