ccff860b68a69bfbb0d572a557625ff01530f5ef - incubator-sdap-ingester

commit	ccff860b68a69bfbb0d572a557625ff01530f5ef
author	Riley Kuttruff <72955101+RKuttruff@users.noreply.github.com>	Tue Nov 22 11:19:33 2022 -0800
committer	GitHub <noreply@github.com>	Tue Nov 22 11:19:33 2022 -0800
tree	d3c50a5ccaf21b2ffab0fe82005ab822f5aeaa06
parent	813355dae109ae9f70b4da89d6eaba134bc7e496

SDAP-408 - Improvements to ingestion (#61)

* Writer fault tolerance

Noticed with Solr writes, but applied to all writers. The ingester process hits the underlying store very hard, which, in Solr's case, can cause write operations to fail. The existing implementation treated any failure as a lost connection and failed the ENTIRE pipeline. Now it makes several attempts, with some backoff between attempts.

* Don't use np.ma.filled unless needed

Xarray already handles filling invalid points with NaN, so we just need to grab the underlying np.ndarray from the DataArray. Calling np.ma.filled on an xr.DataArray, which I suspect data_subset is frequently if not always, is equivalent to calling np.array(data_subset).

* Worker init log msg

* Write consolidation

* Removed use of np.ma.filled with xr.DataArrays.

* Elasticsearch writer complies with abstract def but doesn't batch yet

* Updated data subset array creation for all reading processors

* Batching

* Batching of executor tasks & Cassandra writes

Cassandra writes are still individual, but they are started & awaited in batches

* Raised logging level in kelvin to celsius processor to match others

* Logging formatting for time

* Logging formatting for write progress

* Improvements

* Removed commented code

Co-authored-by: rileykk <rileykk@jpl.nasa.gov>
Co-authored-by: skperez <stepheny.k.perez@jpl.nasa.gov>
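
The "several attempts with some backoff" behavior described under writer fault tolerance can be illustrated with a minimal sketch. This is not the code from this commit: the names write_with_retries and WriteFailed, the default attempt count, and the exponential delay schedule are all assumptions made for illustration, since the message does not state which backoff strategy or limits the ingester uses.

```python
import time


class WriteFailed(Exception):
    """Hypothetical error raised once every attempt has failed."""


def write_with_retries(write, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call write() until it succeeds or the attempts run out.

    Assumed policy: wait base_delay * 2**n seconds after the n-th failure
    (n starting at 0), and do not wait after the final failure.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return write()
        except Exception as err:  # a real writer would catch narrower errors
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise WriteFailed(f"write failed after {attempts} attempts") from last_error
```

With a policy like this, a store that is briefly overloaded only delays the write, and the pipeline fails only after the whole attempt budget has been used. The sleep parameter exists so the delay can be replaced in tests.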
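
The note that Cassandra writes stay individual but are "started & awaited in batches" can be sketched as follows. This is an assumption-laden illustration using asyncio from the standard library; write_in_batches, write_one, and the batch size are hypothetical names and values, and the commit's actual writer code is not reproduced here.

```python
import asyncio


async def write_in_batches(items, write_one, batch_size=4):
    """Issue one write per item, but only batch_size writes at a time.

    Each batch is started together and fully awaited before the next
    batch begins, which bounds the number of in-flight writes.
    """
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # gather() starts every write in the batch and waits for all of them
        results.extend(await asyncio.gather(*(write_one(item) for item in batch)))
    return results
```

Awaiting each batch before starting the next keeps the load on the store bounded, at the cost of leaving the store idle while the slowest write in a batch finishes.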