tree 73592d86d42291a093a91badc23712885eb0341e
parent 6a8e7f39aac100a285a2c190186e38b73a5c9d34
author ming <itestmycode@gmail.com> 1659070833 -0400
committer GitHub <noreply@github.com> 1659070833 -0500
gpgsig -----BEGIN PGP SIGNATURE-----
 
 wsBcBAABCAAQBQJi42lxCRBK7hj4Ov3rIwAAockIAHYMoN+AVj9TvYNl6vk5MpoY
 62ObrPf5skW/3bNl+RYsNP1qqecIZQSElH/eLmyDo0m72Peiv3PgXOeS+dU4INt0
 OMFPgnR0qNfFneSEDG35ujXDP+4eFnQz3FF03vSBsceVrG+HLOAfAAnUMRQz0KbM
 1GDGVJ9gky9MHn6PgjRnz3ZOUrApA6YI5rHBWB2fZNJhE3ItwUPr+dGeuTi4d3GT
 qJX4M8YIaQDIwJ+UInGvPhX1HWVFUcm92rnJ4/u7qxYSDBOMsjjKxhgCkzS0d+EM
 z7jdpTR01bXnhjHkIz/9O3cB26lAxF62ZpSz1Ygo6SmrFmxEG/WCgW9XvSg2sFs=
 =hCXY
 -----END PGP SIGNATURE-----
 

[issue 814] consumer and producer reconnect failure metrics counter (#815)

* consumer and producer reconnect failure metrics counter

* increment on every reconnect failure

* producer consumer reconnect max retry counter

Implement #814 

### Motivation
In a Pulsar cluster's Kubernetes deployment, or any deployment with a Proxy/LB in front, we need metrics counters to track reconnection failures of producers and consumers.

When brokers go offline but the Proxy/LB is still functioning, the TCP connection can still be established, but the topic lookup fails. The `pulsar_client_connections_establishment_errors` counter is not incremented in this case, so new counters are required to track these failures.

### Modifications

Two new counter metrics, `pulsar_client_producers_reconnect_failure` and `pulsar_client_consumers_reconnect_failure`, are incremented in the retry-failure code blocks of `producer_partition` and `consumer_partition` on every failed reconnect attempt.
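A minimal sketch of where the failure counter sits, assuming a simplified retry loop. `reconnectWithRetry`, `errLookup`, `failureCount`, `resetCounters`, and the plain `int64` counter are hypothetical stand-ins for the client's reconnect code and its Prometheus counter; the sketch only illustrates that the counter is bumped once per failed attempt, not only on the final one.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errLookup is a hypothetical error for the case where the Proxy/LB accepts
// the TCP connection but the topic lookup fails.
var errLookup = errors.New("topic lookup failed")

// reconnectFailures is a hypothetical stand-in for the
// pulsar_client_producers_reconnect_failure Prometheus counter.
var reconnectFailures int64

func failureCount() int64 { return atomic.LoadInt64(&reconnectFailures) }

func resetCounters() { atomic.StoreInt64(&reconnectFailures, 0) }

// reconnectWithRetry calls grabCnx up to maxAttempts times and increments
// the failure counter once for every failed attempt.
func reconnectWithRetry(grabCnx func() error, maxAttempts int) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = grabCnx(); err == nil {
			return nil
		}
		atomic.AddInt64(&reconnectFailures, 1)
	}
	return err
}

func main() {
	calls := 0
	flaky := func() error {
		calls++
		if calls <= 2 {
			return errLookup // the first two attempts fail
		}
		return nil
	}
	err := reconnectWithRetry(flaky, 5)
	fmt.Println(err, failureCount()) // prints "<nil> 2"
}
```

The atomic counter mirrors the fact that metrics counters may be updated from several producer/consumer goroutines at once.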

Two new counter metrics, `pulsar_client_producers_reconnect_max_retry` and `pulsar_client_consumers_reconnect_max_retry`, are incremented in `producer_partition` and `consumer_partition` when either the maximum number of retries or the maximum backoff is reached.
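A sketch of how the two conditions could drive the max-retry counter, assuming a simplified doubling backoff. The `backoff` type, `reconnect`, `noSleep`, and both plain `int` counters are hypothetical illustrations, not the client's actual backoff policy or metrics objects. A negative `maxRetry` stands for "retry forever", so in that mode only the capped backoff can trigger the max-retry counter.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical stand-ins for the producer reconnect counters.
var (
	reconnectFailure  int // pulsar_client_producers_reconnect_failure
	reconnectMaxRetry int // pulsar_client_producers_reconnect_max_retry
)

var errLookup = errors.New("topic lookup failed")

// backoff is a simplified doubling backoff capped at max.
type backoff struct {
	next, max time.Duration
}

// Next returns the current delay and doubles it (up to max) for the next call.
func (b *backoff) Next() time.Duration {
	d := b.next
	b.next *= 2
	if b.next > b.max {
		b.next = b.max
	}
	return d
}

// IsMaxBackoffReached reports whether the next delay is already capped.
func (b *backoff) IsMaxBackoffReached() bool { return b.next >= b.max }

// noSleep skips real waiting so the sketch runs instantly.
func noSleep(time.Duration) {}

// reconnect retries grabCnx; a negative maxRetry means retry forever.
func reconnect(grabCnx func() error, maxRetry int, bo *backoff, sleep func(time.Duration)) {
	for maxRetry != 0 {
		sleep(bo.Next())
		if grabCnx() == nil {
			return
		}
		if maxRetry > 0 {
			maxRetry--
		}
		reconnectFailure++ // counted on every failed attempt
		if maxRetry == 0 || bo.IsMaxBackoffReached() {
			reconnectMaxRetry++ // retry budget exhausted or backoff capped
		}
	}
}

func main() {
	alwaysFail := func() error { return errLookup }
	reconnect(alwaysFail, 3, &backoff{next: 100 * time.Millisecond, max: time.Second}, noSleep)
	fmt.Println(reconnectFailure, reconnectMaxRetry) // prints "3 1"
}
```

Once the backoff saturates, every further failed attempt also increments the max-retry counter, which makes a stuck reconnect loop visible even when retries are unlimited.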

The existing code logic already handles the case where the topic does not exist: the retry loop exits immediately, so the counters are not incremented.
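A sketch of the topic-not-found short circuit, under the same simplifying assumptions. The sentinel `errTopicNotFound`, `errLookup`, `reconnect`, and the plain `int` counter are hypothetical; the client's actual check on the broker's error may differ. A missing topic returns on the first attempt and leaves the counter at zero, while ordinary lookup failures are counted.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; the real client inspects broker error
// responses rather than these values.
var (
	errTopicNotFound = errors.New("topic not found")
	errLookup        = errors.New("topic lookup failed")
)

// reconnectFailure is a hypothetical stand-in for the failure counter.
var reconnectFailure int

// reconnect gives up at once, without touching the counter, when the
// topic no longer exists; other failures are counted and retried.
func reconnect(grabCnx func() error, maxRetry int) error {
	var err error
	for ; maxRetry > 0; maxRetry-- {
		err = grabCnx()
		switch {
		case err == nil:
			return nil
		case errors.Is(err, errTopicNotFound):
			return err // exit the retry loop immediately; counter untouched
		}
		reconnectFailure++
	}
	return err
}

func main() {
	err := reconnect(func() error { return errTopicNotFound }, 5)
	fmt.Println(err, reconnectFailure) // prints "topic not found 0"
}
```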

### Verifying this change

This has been verified in a Pulsar cluster deployment with a Proxy. CI does not include such a setup, because this scenario cannot be reproduced with Pulsar standalone mode.

### Does this pull request potentially affect one of the following parts:

*If `yes` was chosen, please highlight the changes*

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API: (no)
  - The schema: (no)
  - The default values of configurations: (no)
  - The wire protocol: (no)

### Documentation

  - Does this pull request introduce a new feature? (no)
  - If yes, how is the feature documented? (not applicable / docs / GoDocs / not documented)
  - If a feature is not applicable for documentation, explain why?
  - If a feature is not documented yet in this PR, please create a followup issue for adding the documentation
