  1. Distributed Tracing
  2. TRACING-5622

Certificate expiry errors in tempo pods even though the certificates are renewed


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • None
    • rhosdt-3.6
    • Tempo
    • None
    • Security & Compliance
    • False
    • False
    • Tracing Sprint # 279
    • Moderate

      Description of problem:

      The certificates of the tempo components were due to expire on 2025-08-29 (output taken from a previously reported case):
      =================
      $ for i in $(oc get secret | grep tls | awk '{print $1}'); do echo $i; oc get secret $i -o yaml | grep after ; done
      tempo-tracing-compactor-mtls
          tempo.grafana.com/certificate-not-after: "2025-08-29T12:35:31Z"
      tempo-tracing-distributor-mtls
          tempo.grafana.com/certificate-not-after: "2025-08-29T12:35:31Z"
      tempo-tracing-gateway-mtls
          tempo.grafana.com/certificate-not-after: "2025-08-29T12:35:31Z"
      tempo-tracing-ingester-mtls
          tempo.grafana.com/certificate-not-after: "2025-08-29T12:35:32Z"
      tempo-tracing-querier-mtls
          tempo.grafana.com/certificate-not-after: "2025-08-29T12:35:30Z"
      tempo-tracing-query-frontend-mtls
          tempo.grafana.com/certificate-not-after: "2025-08-29T12:35:31Z"
      tempo-tracing-signing-ca
          tempo.grafana.com/certificate-not-after: "2030-01-07T17:51:40Z"
      ============
      
      In the new case, the certificates were rotated before the expiry time shown above, and their start date (not-before) is a few days back (I double-checked the cluster and node name). See below:
      ============
      $ for i in $(oc get secret | grep tls | awk '{print $1}'); do echo $i; oc get secret $i -o yaml | grep not-before ; done
      tempo-tracing-compactor-mtls
          tempo.grafana.com/certificate-not-before: "2025-08-11T12:52:32Z"
      tempo-tracing-distributor-mtls
          tempo.grafana.com/certificate-not-before: "2025-08-11T12:52:32Z"
      tempo-tracing-gateway-mtls
          tempo.grafana.com/certificate-not-before: "2025-08-11T12:52:32Z"
      tempo-tracing-ingester-mtls
          tempo.grafana.com/certificate-not-before: "2025-08-11T12:52:32Z"
      tempo-tracing-querier-mtls
          tempo.grafana.com/certificate-not-before: "2025-08-11T12:52:32Z"
      tempo-tracing-query-frontend-mtls
          tempo.grafana.com/certificate-not-before: "2025-08-11T12:52:32Z"
      tempo-tracing-signing-ca
          tempo.grafana.com/certificate-not-before: "2025-01-07T11:51:39Z"
      ============
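The two loops above can be combined into a single pass that prints both annotations per secret and flags any validity window that has already closed. This is a minimal sketch, not an operator-provided tool: the function names `to_number`, `is_expired`, `annotation` and `report_secrets` are hypothetical, and it assumes the annotations are UTC timestamps with second precision, as in the output above.

```shell
# to_number TIMESTAMP
# Strips everything except digits, so "2025-08-29T12:35:31Z" becomes
# 20250829123531. Valid only for UTC timestamps in this fixed format.
to_number() {
  printf '%s' "$1" | tr -cd '0-9'
}

# is_expired NOT_AFTER NOW
# Succeeds when NOT_AFTER is at or before NOW.
is_expired() {
  [ "$(to_number "$1")" -le "$(to_number "$2")" ]
}

# annotation SECRET KEY
# Reads the tempo.grafana.com/<KEY> annotation of a secret in the current project.
annotation() {
  oc get secret "$1" -o "jsonpath={.metadata.annotations.tempo\.grafana\.com/$2}"
}

# report_secrets
# Prints name, not-before, not-after and a state column for every secret
# matched by the same "grep tls" filter used in the loops above.
report_secrets() {
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  for secret in $(oc get secret | grep tls | awk '{print $1}'); do
    not_before=$(annotation "$secret" certificate-not-before)
    not_after=$(annotation "$secret" certificate-not-after)
    if [ -z "$not_after" ]; then
      state=unknown        # annotation missing on this secret
    elif is_expired "$not_after" "$now"; then
      state=EXPIRED
    else
      state=valid
    fi
    printf '%s\t%s\t%s\t%s\n' "$secret" "$not_before" "$not_after" "$state"
  done
}
```

Running `report_secrets` in the tempo project would show whether the secrets themselves are current. If every row reads `valid` while the pods still report expired certificates, the stale copy is most likely inside the running processes rather than in the secrets.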
      
      Even after the certificates were rotated, the query-frontend and gateway pods kept logging expired-certificate errors until the pods were restarted manually.
      
      

      Version-Release number of selected component (if applicable):

      tempo-operator.v0.16.0-2

      How reproducible:

      100%

      Steps to Reproduce:

      Reproducing this requires waiting for a certificate to expire, so it needs to be tested either with an unmanaged configuration or by having Tempo deploy certificates with a shorter validity period.

      Actual results:

      After the certificates of the tempo components are rotated, the pods still log "expired certificate" errors.

      Expected results:

      Once the operator renews the certificates automatically, the pods must use the updated certificates.

      Additional info:

      To work around the problem, restart the tempo pods:
      $ oc delete pods -l app.kubernetes.io/managed-by=tempo-operator -n <namespace>
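To narrow the restart to pods that are likely still holding the old certificate, one option is to compare each pod's start time with the not-before annotation of the rotated secret. The sketch below is a heuristic only: `started_before` and `stale_pods` are hypothetical names, and it assumes that a pod started before the new certificate was issued has not reloaded it, which is the behaviour reported here.

```shell
# started_before START NOT_BEFORE
# Succeeds when START is strictly earlier than NOT_BEFORE. Both must be UTC
# timestamps such as "2025-08-11T12:52:32Z"; non-digits are stripped so the
# values can be compared as integers.
started_before() {
  [ "$(printf '%s' "$1" | tr -cd '0-9')" -lt "$(printf '%s' "$2" | tr -cd '0-9')" ]
}

# stale_pods NAMESPACE SECRET
# Prints the operator-managed pods in NAMESPACE that started before the
# certificate in SECRET became valid.
stale_pods() {
  not_before=$(oc get secret "$2" -n "$1" \
    -o 'jsonpath={.metadata.annotations.tempo\.grafana\.com/certificate-not-before}')
  oc get pods -n "$1" -l app.kubernetes.io/managed-by=tempo-operator \
    -o 'jsonpath={range .items[*]}{.metadata.name}{" "}{.status.startTime}{"\n"}{end}' |
  while read -r pod started; do
    # Skip pods that have no start time yet (for example, still pending).
    if [ -n "$started" ] && started_before "$started" "$not_before"; then
      echo "$pod"
    fi
  done
}
```

For example, `stale_pods <namespace> tempo-tracing-gateway-mtls` would list candidate pods, and the printed names could then be passed to `oc delete pod -n <namespace>`.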

              Assignee: Unassigned
              Reporter: Dhruv Gautam (rhn-support-dgautam)