The indexing topology takes the data that has been enriched by the enrichment topology and stores it in one or more supported indices.
By default, this topology writes to HDFS and to either Elasticsearch or Solr.
If a message is missing the source.type field, the message tuple is failed and not written, and an appropriate error is indicated in the Storm UI and logs.
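The check described above can be sketched as follows. This is only an illustration of the rule, not Metron's actual implementation: the helper name check_message and the sample messages (including the "bro" sensor value and the ip_src_addr field) are assumptions made for the example.

```python
from typing import Any, Dict, Tuple

SOURCE_TYPE_FIELD = "source.type"


def check_message(message: Dict[str, Any]) -> Tuple[bool, str]:
    """Return (ok, reason) for a message. Hypothetical helper, not part of Metron."""
    if SOURCE_TYPE_FIELD not in message:
        # A message without source.type cannot be indexed, so it is failed
        # and the reason is reported instead of the message being written.
        return False, f"message is missing the '{SOURCE_TYPE_FIELD}' field"
    return True, ""


# Illustrative messages; the field values are made up for this sketch.
good = {"source.type": "bro", "ip_src_addr": "10.0.0.1"}
bad = {"ip_src_addr": "10.0.0.1"}

for msg in (good, bad):
    ok, reason = check_message(msg)
    print("ack" if ok else f"fail: {reason}")
```

The first message would be acknowledged and the second failed with a reason that names the missing field.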
The indexing topology is extremely simple. Data is ingested into Kafka and written out to the indices, with the HDFS output under /apps/metron/enrichment/indexed
By default, errors during indexing are sent back into the indexing Kafka queue so that they can be indexed and archived.
The indexing topology, as started by the $METRON_HOME/bin/start_elasticsearch_topology.sh or $METRON_HOME/bin/start_solr_topology.sh script, uses a default of one executor per bolt. In a real production system, this should be customized by modifying the flux file in $METRON_HOME/flux/indexing/remote.yaml.
Add a parallelism field to the bolts to give Storm a parallelism hint for the various components. Give bolts that appear to be bottlenecks (e.g. the indexing bolt) a larger hint.
Add a parallelism field to the Kafka spout that matches the number of partitions for the enrichment Kafka queue.
Set the topology.workers field for the topology to adjust the number of workers.

Finally, if workers and executors are new to you or you don't know where to modify the flux file, the Apache Storm documentation on topology parallelism and the Flux documentation might be of use to you.
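As one way to apply the tuning steps above, the sketch below checks an already-parsed flux definition against them. It is a hypothetical helper, not something shipped with Metron: the function name check_parallelism, the component ids (kafkaSpout, indexingBolt, hdfsIndexingBolt), and the numbers are assumptions for illustration. The dictionary mirrors only the flux fields discussed here (config, spouts, bolts, parallelism, topology.workers).

```python
from typing import Any, Dict, List


def check_parallelism(flux: Dict[str, Any], kafka_partitions: int) -> List[str]:
    """Return tuning warnings for a parsed flux definition (hypothetical helper)."""
    warnings = []
    for spout in flux.get("spouts", []):
        # The spout hint should match the number of Kafka partitions.
        hint = spout.get("parallelism", 1)
        if hint != kafka_partitions:
            warnings.append(
                f"spout {spout['id']}: parallelism {hint} != {kafka_partitions} partitions"
            )
    for bolt in flux.get("bolts", []):
        # Without a hint, the bolt runs with the default of one executor.
        if "parallelism" not in bolt:
            warnings.append(
                f"bolt {bolt['id']}: no parallelism hint, defaults to one executor"
            )
    if "topology.workers" not in flux.get("config", {}):
        warnings.append("config: topology.workers is not set")
    return warnings


# Assumed shape of a parsed flux file; ids and values are illustrative only.
flux = {
    "config": {"topology.workers": 2},
    "spouts": [{"id": "kafkaSpout", "parallelism": 4}],
    "bolts": [
        {"id": "indexingBolt", "parallelism": 8},
        {"id": "hdfsIndexingBolt"},
    ],
}

for warning in check_parallelism(flux, kafka_partitions=4):
    print(warning)
```

With these sample values, only the bolt without a parallelism field is flagged; changing kafka_partitions so that it no longer matches the spout hint would add a spout warning as well.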
There are REST endpoints available to perform operations like start, stop, activate, and deactivate on the indexing topologies.