Pulsar has a metric that indicates a topic failed to load: topic_load_failed_total.
It is incremented in the following cases:
Adding a label to the topic_load_failed_total metric
would tell us quickly which error occurred, so we can fix the issue sooner.
Add a label named reason to topic_load_failed_total
reason
bundle_unloading
failed_load_policies
failed_load_ml
failed_access_metadata_store
failed_init
timeout
others
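To illustrate how the reason label splits one counter into one series per failure cause, here is a minimal sketch in plain Python. It is not Pulsar's implementation: the class TopicLoadFailedCounter and its methods are hypothetical, and only the metric name and the label values come from the list above.

```python
from collections import Counter
from enum import Enum
from typing import List


class Reason(str, Enum):
    """Values proposed for the `reason` label (taken from the list above)."""
    BUNDLE_UNLOADING = "bundle_unloading"
    FAILED_LOAD_POLICIES = "failed_load_policies"
    FAILED_LOAD_ML = "failed_load_ml"
    FAILED_ACCESS_METADATA_STORE = "failed_access_metadata_store"
    FAILED_INIT = "failed_init"
    TIMEOUT = "timeout"
    OTHERS = "others"


class TopicLoadFailedCounter:
    """Hypothetical stand-in for topic_load_failed_total with a reason label."""

    def __init__(self) -> None:
        # One monotonically increasing count per label value.
        self._counts: Counter = Counter()

    def inc(self, reason: Reason) -> None:
        """Record one failed topic load attributed to `reason`."""
        self._counts[reason] += 1

    def value(self, reason: Reason) -> int:
        """Current count for a single label value (0 if never incremented)."""
        return self._counts[reason]

    def total(self) -> int:
        """Sum over all label values, i.e. what the unlabeled metric reported."""
        return sum(self._counts.values())

    def expose(self) -> List[str]:
        """Render one text line per label value, in Prometheus-like syntax."""
        return [
            f'topic_load_failed_total{{reason="{r.value}"}} {self._counts[r]}'
            for r in Reason
        ]
```

Summing over all label values still yields the original total, so dashboards that ignore the label should keep working under this assumption.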
If reason = bundle_unloading increases for a moment and then stops increasing after a while, everything is fine.
If reason = timeout increases for a moment and then stops increasing after a while, too many topics were loaded at the same time, which may be okay.
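The "increases for a moment, then stops" pattern can be checked mechanically against scraped counter values. The sketch below is only an illustration: the function name still_increasing and the window parameter are assumptions, the proposal does not define how long "a while" is, and counter resets after a broker restart are not handled.

```python
from typing import Sequence


def still_increasing(samples: Sequence[int], window: int = 3) -> bool:
    """Return True if the counter grew during the last `window` scrape intervals.

    `samples` holds cumulative counter values for one reason label,
    oldest first, taken at regular scrape intervals.
    """
    if len(samples) < 2:
        # A single sample cannot show growth.
        return False
    # `window` intervals span `window + 1` samples.
    recent = samples[-(window + 1):]
    return recent[-1] > recent[0]


# A burst of timeouts that has settled: no growth in the recent window.
settled = [0, 5, 9, 9, 9, 9]
# Failures that keep accumulating and likely need attention.
ongoing = [0, 1, 2, 3, 4, 5]
```

With these inputs, still_increasing(settled) is False and still_increasing(ongoing) is True.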