Merge crossdc-wip into main. (#50)

* WIP for Cross DC consumer  (#5)

* CrossDc Consumer wip

* Temp commit

* wip commit

* Draft commit, cleanup code

* WIP commit

* Fix tests, WIP commit

* Refactor MessageProcessor and some cleanup.

Co-authored-by: Anshum Gupta <anshum@apache.org>

* Support for integration tests with SolrCloud. (#6)

I've had a few little issues that I've been chasing, so to get something in, I pulled out Kafka and some other items and set up a minimal set of changes to use SolrCloud in tests. I have a simple test that runs the same message processor test, but against a real cloud instance instead of a mock. Hang on a moment and I'll get Kafka and the rest back in.

* Add back in the EmbeddedKafkaCluster. (#7)

* WIP for UpdateRequestProcessor and other producer related files (#8)

* IntegrationTest WIP (#9)

* Fix a compile error, add solr-core as a dependency as the URP requires it (#10)

* Move cross dc producer code into separate module (#11)

* Move cross dc producer code into a separate module

* Fix package name in javadocs

* Create a commons module for crossdc (#12)

* Integration tests for cross dc consumer with SolrCloud (#14)

* Support for integration tests with SolrCloud.
* Add back in the EmbeddedKafkaCluster.

* Working out remaining issues via manual testing. (#15)

* Tie up

* Fix a variety of issues.

* Work out SolrAndKafkaIntegrationTest (#16)

Each of the two tests still needs to be run separately; running both in the same run still needs to be fixed.

* Test Passing (#17)

* Producer only needs one of the many runtime dependencies. (#18)

* Experiment with Producer in crossdc package. (#19)

* Simplify build artifacts. (#20)

* Revert "Experiment with Producer in crossdc package. (#19)"

This reverts commit 9007aa6b8d51384ab5f146ebdce013633bae4660.

* Simplify build artifacts and allow for future shading if necessary with commons and producer only creating a single uber jar artifact.

* Allow central config from ZK and build updates. (#21)

* Docs and enable flag. (#22)

* Add explicit error logging around CrossDC config. (#23)

* Work around access limitations (what you can do here seems to vary by Java version) (#24)

* Fix startup checks (#26)

This change corrects the `if` statements in the consumer start, so that it
actually checks the config being passed.

Previously it just checked the `bootstrapServers` three times.
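The bug described above is a classic copy-paste error. A minimal sketch of the before/after pattern (the method and parameter names here are illustrative, not the actual CrossDcConsumer code):

```java
// Hypothetical sketch of the startup-check fix; names are illustrative,
// not the actual consumer startup code.
public class StartupCheck {
    // Buggy version: a copy-paste error checks bootstrapServers three times,
    // so a missing topicName or groupId slips through validation.
    static boolean isValidBuggy(String bootstrapServers, String topicName, String groupId) {
        return bootstrapServers != null && !bootstrapServers.isEmpty()
            && bootstrapServers != null && !bootstrapServers.isEmpty()
            && bootstrapServers != null && !bootstrapServers.isEmpty();
    }

    // Fixed version: each config value is actually checked on its own.
    static boolean isValid(String bootstrapServers, String topicName, String groupId) {
        return bootstrapServers != null && !bootstrapServers.isEmpty()
            && topicName != null && !topicName.isEmpty()
            && groupId != null && !groupId.isEmpty();
    }
}
```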

* KafkaCrossDcConf: Add a human readable toString (#27)

This change adds a human readable `toString` function to `KafkaCrossDcConf`.

* Update Solr version, build improvement for some envs, include missing solrj dep (#28)

* Cleanup some of the testing issues. (#25)

* Consumer: Add initial metrics (#29)

This change adds initial support for Dropwizard Metrics to the Consumer.
With this we could allow users to send metrics to a number of reporting
backends.

* Inefficient DBQ (#31)

* Finish setting up the retry queue based on existing implementation. (#30)

* Reindex Test, cover a few more test cases, additional config options. (#32)

* Reindex Test, cover a few more test cases, additional config options.

Adds a basic test focused on reindexing and covers a few more test cases: more replicas and shards, and different shard counts on the primary and secondary DCs. Also allows the consumer group and the crossdc ZK properties file location to be configured.

* Remove TODO, some minor fixes, doc.

* Add maxPolledRecords and some logging cleanup.

* Config improvements and queue bug fixes. (#34)

* Configuration improvements, cleanup, minor fixes. (#35)

* Explicit SSL Config (#36)

* SSL config passed on to Kafka and a variety of general cleanup. (#37)

* Flush producer on close to prevent losing any pending updates. (#38)

* Config override test and cleanup. (#39)

* Flush producer on close to prevent losing any pending updates.

* Add a config override test and some cleanup.

* Cleanup Kafka config class. (#40)

* Improve test to cover a bit more. (#41)

* Beef up config override test, additional config logging. (#42)

* Allow empty string as a config property value. (#43)

* Check docSize against max before mirroring (#45)

* Check docSize against max before mirroring

Prior to this commit, the MirroringUpdateProcessor had no check to
ensure that docs weren't running afoul of the batch size limit set at
the Kafka level.

This commit changes this to ensure that docs exceeding this limit are
not mirrored.  These offending docs may still be indexed, based on the
value of the URP's "indexUnmirrorableDocs" config property (which
defaults to 'false' if not set).
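The per-document check can be sketched as follows. This is a minimal illustration of the idea, assuming a rough UTF-8 byte estimate over a flat document; the class and method names are hypothetical, not the actual MirroringUpdateProcessor API:

```java
// Hypothetical sketch of the per-document size check; not the actual
// MirroringUpdateProcessor implementation.
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class DocSizeCheck {
    // Rough size estimate: sum of UTF-8 bytes of field names and values.
    static long estimateSizeBytes(Map<String, String> doc) {
        long size = 0;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            size += e.getKey().getBytes(StandardCharsets.UTF_8).length;
            size += e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return size;
    }

    // Mirror only when the doc fits under the Kafka batch-size limit.
    // When it doesn't, the caller may still index it locally, depending on
    // the indexUnmirrorableDocs setting (defaults to false).
    static boolean shouldMirror(Map<String, String> doc, long maxMirrorSizeBytes) {
        return estimateSizeBytes(doc) <= maxMirrorSizeBytes;
    }
}
```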

* Fix log message printed when mirroring skipped for large doc (#46)

* Check batch-size against "max" before mirroring (#48)

Prior to this commit, MirroringUpdateProcessor checked that individual
documents didn't exceed the mirroring batch-size limit on their own, but
no check existed to ensure that the entire list of docs didn't exceed
the limit.

This commit adds that check (in MirroringUpdateProcessor.finish()).  If
the configured max batch-size is exceeded, MUP will log out an error
(which includes the IDs of all documents in the batch).
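The batch-level check might look like the sketch below. This is an assumption-laden illustration (the real check lives in `MirroringUpdateProcessor.finish()` and operates on Solr update objects; the names and signature here are hypothetical):

```java
// Hypothetical sketch of the batch-level size check; not the actual
// MirroringUpdateProcessor.finish() implementation.
import java.util.List;

public class BatchSizeCheck {
    // Returns null if the batch fits, otherwise an error message that
    // lists the IDs of every document in the oversized batch.
    static String checkBatch(List<String> docIds, List<Long> docSizes, long maxBatchBytes) {
        long total = 0;
        for (long s : docSizes) total += s;
        if (total <= maxBatchBytes) return null;
        return "Batch size " + total + " exceeds limit " + maxBatchBytes
            + "; doc ids: " + String.join(",", docIds);
    }
}
```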

* Redact properties before string-ifying or logging (#49)

Prior to this commit, certain log messages during Producer/Consumer
startup contained potentially sensitive information, such as SSL
passwords, etc.

This commit amends the string-ification of KafkaCrossDcConf and other
Map/Properties types to "redact" properties whose key/name contains a
block-list of values.  This list currently consists of "password" and
"credentials": any property name containing these substrings (ignoring
case) will be redacted before logging.
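The redaction approach described above can be sketched like this, assuming a case-insensitive substring match against a block-list; the real KafkaCrossDcConf logic may differ in its details:

```java
// Sketch of property redaction before logging; illustrative, not the
// actual KafkaCrossDcConf code.
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class Redactor {
    // Any property whose name contains one of these substrings
    // (ignoring case) is redacted before string-ification.
    private static final String[] BLOCK_LIST = {"password", "credentials"};

    static Map<String, String> redact(Map<String, String> props) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            out.put(e.getKey(), isSensitive(e.getKey()) ? "[redacted]" : e.getValue());
        }
        return out;
    }

    private static boolean isSensitive(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        for (String blocked : BLOCK_LIST) {
            if (lower.contains(blocked)) return true;
        }
        return false;
    }
}
```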

* Remove a couple wip testing tools.

Co-authored-by: Anshum Gupta <anshum@apache.org>
Co-authored-by: Patrik Greco <pgreco@apple.com>
Co-authored-by: Jason Gerlowski <gerlowskija@apache.org>

Solr Sandbox

The Solr sandbox repository serves as a place to host contributions that are not part of core Solr.

This allows contributors to try out new ideas without affecting the main project.

The Lucene sandbox, which provides the same value for Lucene-centric code, is currently a module within the main lucene-solr repository.