
notifications and alerting when the user operator managed certificates are close to expiry

      CA certificates can be renewed manually using the commands below:

      # cluster CA
      oc annotate secret ${KAFKA_CLUSTER}-cluster-ca-cert strimzi.io/force-renew=true

      # clients CA
      oc annotate secret ${KAFKA_CLUSTER}-clients-ca-cert strimzi.io/force-renew=true

       

      However, the customer is looking for notifications and alerting when the certificates managed by the User Operator are close to expiry.

       

            [ENTMQST-2632] notifications and alerting when the user operator managed certificates are close to expiry

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Moderate: Streams for Apache Kafka 2.8.0 release and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:9571

            Jakub Stejskal added a comment -

            The latest released dashboard and the latest examples from the main branch work fine for me. Closing this as done.

            Jakub Scholz added a comment -

            The CA-based metrics are now merged upstream, including a Grafana dashboard and a sample alert.

            Jakub Scholz added a comment (edited) -

            A metric based on CA expiration is being worked on by an outside contributor in Strimzi#9861. Given that we did not get to implement any per-user expiration metric, should we consider the implementation of that metric as fulfilling this issue?

            Jakub Scholz added a comment -

            The problem is that expiration is not a concern of the User Operator. The UO is happy to renew the certificate when the time comes, so alerts in the UO at the level of user certificates make little sense. What do you want to alert about? That in 60 days the certificate will be renewed, and that in 30 days it will be 30 days before expiration and the user certificate will be renewed? That does not make sense; it is completely unactionable. Whoever operates the UO will just ignore the alert for 30 days until the certificate is renewed, at which point the alert disappears and they are done with it.

            The issue is in the Kafka clients, which need to update their certificates before they expire. While I'm not convinced that it makes sense to build this into the Kafka client directly, this is a client concern and needs to be handled on the client side, not in the UO, because the UO does not know which certificate is actually used by the application.

            The only thing the operator can do is provide a days-until-expiration metric, which could be used to alert when a renewal has failed. I think that makes sense for the CAs, but not for hundreds or thousands of user certificates.
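To illustrate the distinction drawn in the comment above, here is a minimal Python sketch of a days-until-expiration value and of an alert that fires only when a renewal appears to have failed. It is not how the operator computes its metric. The function names, the 30-day renewal window, and the 2-day grace period are assumptions made for the example; the input string uses the `notAfter` format printed by `openssl x509 -noout -enddate`.

```python
import ssl
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left until `not_after`, e.g. "Nov 12 10:00:00 2025 GMT"."""
    # ssl.cert_time_to_seconds parses the notAfter format into epoch seconds.
    expiry_epoch = ssl.cert_time_to_seconds(not_after)
    return (expiry_epoch - now.timestamp()) / SECONDS_PER_DAY


def renewal_overdue(days_left: float, renewal_days: int = 30, grace_days: int = 2) -> bool:
    """True when the certificate is well inside the renewal window.

    renewal_days and grace_days are hypothetical values for this sketch;
    use whatever renewal period the cluster is actually configured with.
    """
    return days_left < renewal_days - grace_days


if __name__ == "__main__":
    now = datetime(2025, 10, 1, tzinfo=timezone.utc)
    left = days_until_expiry("Nov 12 10:00:00 2025 GMT", now)
    print(f"{left:.1f} days left, overdue={renewal_overdue(left)}")
```

Alerting on `renewal_overdue` rather than on the raw number of days avoids the unactionable alert described above: a certificate that is merely approaching its scheduled renewal stays quiet, and only one that should already have been replaced raises a signal.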

            Tom Bentley (Inactive) added a comment -

            The broker could provide metrics for the cert(s) in its trust stores (that is, the cluster and clients CA certs in Strimzi) and for the cert (chain) in its key store. That would not include certs issued to users; that part would need to be in the UO. (It could also be added to the AK Java client's metrics, but that would only help Java clients.)
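Since the thread places responsibility for picking up renewed user certificates on the client, the following Python sketch shows one way an application might notice that its mounted certificate file has been replaced. It assumes the certificate from the user's secret is mounted as a file; the class name and the path handling are hypothetical and not part of Strimzi or of any Kafka client.

```python
import hashlib
from pathlib import Path


def fingerprint(cert_path: Path) -> str:
    """SHA-256 of the file's bytes; changes whenever the file is replaced."""
    return hashlib.sha256(cert_path.read_bytes()).hexdigest()


class CertReloadTracker:
    """Remembers the fingerprint of the certificate the client last loaded."""

    def __init__(self, cert_path: Path) -> None:
        self.cert_path = cert_path
        self.loaded = fingerprint(cert_path)

    def needs_reload(self) -> bool:
        """True when the file on disk differs from what the client loaded."""
        return fingerprint(self.cert_path) != self.loaded

    def mark_reloaded(self) -> None:
        """Call after the client has rebuilt its TLS configuration."""
        self.loaded = fingerprint(self.cert_path)
```

A client could call `needs_reload()` periodically and, when it returns true, recreate its producers or consumers with the new certificate before calling `mark_reloaded()`. How the TLS configuration is actually rebuilt depends on the client library and is left out of the sketch.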

              jstejska@redhat.com Jakub Stejskal
              rhn-support-kkakarla kodandaRamu kakarla
              Jakub Stejskal Jakub Stejskal