Project: Red Hat Advanced Cluster Management / Issue: ACM-18914

ACM Not Syncing Ingress Certificates to Hosted Clusters Leading to Expired Certificates


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical

      Description of problem:

      We have observed multiple cases in which the ACM policy responsible for syncing ingress certificates to data clusters failed to update the certificate.

      As a result, the certificates expired, causing login failures on the affected OpenShift console instances.

      • OHSS-42049
      • OHSS-42076

      Version-Release number of selected component (if applicable):

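      One affected cluster is on 4.14 and the other is on 4.17.

      However, we believe the issue lies in the ACM component used by HCP (Hosted Control Planes) rather than in a specific cluster version.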

      How reproducible:

       

      Steps to Reproduce:

      1. Confirm that the default ingress controller certificate secret on the hosted cluster has expired:
       $ ocm backplane elevate "OHSS-42076" -- get secret -n openshift-ingress 2875r5ve1gnl0r7fckl3po6i32rbeuq8-primary-cert-bundle-secret -ojson | jq -r '.data."tls.crt"' | base64 --decode | openssl x509 -enddate -noout
      notAfter=Mar 12 13:49:20 2025 GMT

      2. Verify that the Service Cluster holds an updated certificate: 

      $ ocm backplane elevate "OHSS-42076" -- get secret  2875r5ve1gnl0r7fckl3po6i32rbeuq8 -n openshift-acm-policies -ojson | jq -r '.data."tls.crt"' | base64 --decode | openssl x509 -enddate -noout
      notAfter=May 11 12:52:22 2025 GMT 

      3. We expected the ACM policy to copy the updated certificate secret from the Service Cluster to the Management Cluster, but this appears not to have happened for the last 3 months.

      Checking the related ACM policy status returns the following:

       

      $ oc get policies -n 2875r5ve1gnl0r7fckl3po6i32rbeuq8 openshift-acm-policies.rosa-ingress-certificate-policies -o yaml
      
      ...omitted...
      
        - compliant: Compliant
          history:
          - eventName: openshift-acm-policies.rosa-ingress-certificate-policies.1810882dbb60ccbf
            lastTimestamp: "2024-12-12T20:33:06Z"
            message: Compliant; notification - secrets [2875r5ve1gnl0r7fckl3po6i32rbeuq8-primary-cert-bundle-secret]
              found as specified in namespace openshift-ingress
          - eventName: openshift-acm-policies.rosa-ingress-certificate-policies.18107be499fd9bf0
            lastTimestamp: "2024-12-12T16:47:58Z"
            message: Compliant; notification - secrets [2875r5ve1gnl0r7fckl3po6i32rbeuq8-primary-cert-bundle-secret]
              found as specified in namespace openshift-ingress
          - eventName: openshift-acm-policies.rosa-ingress-certificate-policies.181075570e0fdba2
            lastTimestamp: "2024-12-12T14:47:53Z"
            message: Compliant; notification - secrets [2875r5ve1gnl0r7fckl3po6i32rbeuq8-primary-cert-bundle-secret]
              was updated successfully in namespace openshift-ingress
          - eventName: openshift-acm-policies.rosa-ingress-certificate-policies.181075570d0963de
            lastTimestamp: "2024-12-12T14:47:53Z"
            message: NonCompliant; violation - secrets [2875r5ve1gnl0r7fckl3po6i32rbeuq8-primary-cert-bundle-secret]
              found but not as specified in namespace openshift-ingress
          - eventName: openshift-acm-policies.rosa-ingress-certificate-policies.180752c49877b0ed
            lastTimestamp: "2024-11-12T20:33:06Z"
            message: Compliant; notification - secrets [2875r5ve1gnl0r7fckl3po6i32rbeuq8-primary-cert-bundle-secret]
              found as specified in namespace openshift-ingress
          - eventName: openshift-acm-policies.rosa-ingress-certificate-policies.17fe1d5c187e73c7
            lastTimestamp: "2024-10-13T20:33:08Z"
            message: Compliant; notification - secrets [2875r5ve1gnl0r7fckl3po6i32rbeuq8-primary-cert-bundle-secret]
              found as specified in namespace openshift-ingress
          - eventName: openshift-acm-policies.rosa-ingress-certificate-policies.17fe143013808924
            lastTimestamp: "2024-10-13T17:45:03Z"
            message: Compliant; notification - secrets [2875r5ve1gnl0r7fckl3po6i32rbeuq8-primary-cert-bundle-secret]
              found as specified in namespace openshift-ingress
          - eventName: openshift-acm-policies.rosa-ingress-certificate-policies.17fe0da27b55f555
            lastTimestamp: "2024-10-13T15:44:58Z"
            message: Compliant; notification - secrets [2875r5ve1gnl0r7fckl3po6i32rbeuq8-primary-cert-bundle-secret]
              was updated successfully in namespace openshift-ingress
          - eventName: openshift-acm-policies.rosa-ingress-certificate-policies.17fe0da27a423840
            lastTimestamp: "2024-10-13T15:44:58Z"
            message: NonCompliant; violation - secrets [2875r5ve1gnl0r7fckl3po6i32rbeuq8-primary-cert-bundle-secret]
              found but not as specified in namespace openshift-ingress
          - eventName: openshift-acm-policies.rosa-ingress-certificate-policies.17f4e7f37b7c7b48
            lastTimestamp: "2024-09-13T20:33:10Z"
            message: Compliant; notification - secrets [2875r5ve1gnl0r7fckl3po6i32rbeuq8-primary-cert-bundle-secret]
              found as specified in namespace openshift-ingress
          templateMeta:
            creationTimestamp: null
            name: rosa-ingress-certificate-policies 

      The event history shows the policy syncing roughly once a month (around the 12th or 13th), but there are no events after "2024-12-12T20:33:06Z".
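      The checks in steps 1 to 3 (comparing the two notAfter dates and noticing that the monthly policy events stopped) can be done mechanically. The following is a minimal sketch, assuming bash and GNU date (as on RHEL/Fedora); the function names notafter_to_epoch, compare_expiry, days_since and newest_timestamp are hypothetical helpers, not part of ocm, oc or ACM.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the checks in the steps above.
# Assumes GNU date; BSD/macOS date parses -d input differently.

# Convert an `openssl x509 -enddate -noout` line such as
# "notAfter=May 11 12:52:22 2025 GMT" into seconds since the epoch.
notafter_to_epoch() {
  date -u -d "${1#notAfter=}" +%s
}

# Arguments: hosted-cluster notAfter line, Service Cluster notAfter line.
# Prints "stale" if the hosted-cluster certificate expires before the
# Service Cluster copy (the newer certificate was not propagated),
# otherwise prints "in-sync".
compare_expiry() {
  local hosted service
  hosted=$(notafter_to_epoch "$1") || return 1
  service=$(notafter_to_epoch "$2") || return 1
  if (( hosted < service )); then
    echo "stale"
  else
    echo "in-sync"
  fi
}

# Print whole days elapsed since an RFC 3339 timestamp (for example a
# policy history "lastTimestamp"). Optional second argument: "now" as
# epoch seconds; defaults to the current time.
days_since() {
  local stamp now
  stamp=$(date -u -d "$1" +%s) || return 1
  now="${2:-$(date -u +%s)}"
  echo $(( (now - stamp) / 86400 ))
}

# Read RFC 3339 UTC timestamps (one per line) on stdin and print the newest.
# Timestamps in the same UTC format sort correctly as plain strings.
newest_timestamp() {
  LC_ALL=C sort | tail -n 1
}
```

      In practice the two notAfter= lines would come from the commands in steps 1 and 2, and the timestamps from the policy status, for example via `oc get policies ... -o json | jq -r '.status.details[].history[].lastTimestamp'`, assuming the history sits under .status.details[] as the YAML excerpt above suggests. A policy whose newest event is well over a month old would then stand out.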

       

      Expected results:

      • ACM should automatically push the updated certificate from the Service Cluster to the Management Cluster and then sync it to the hosted cluster every month.
      • Ingress certificates on hosted clusters should be renewed before they expire.

       

      We are concerned that many other clusters may also be affected, with ingress certificates that have not been renewed. We need help from the HCP ACM team to identify the bug and to provide a fix or a temporary workaround we can apply to avoid severe incidents.
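      To help spot other clusters whose ingress certificate is close to expiry, a per-certificate check could be built on `openssl x509 -checkend`, which exits 0 when the certificate is still valid after the given number of seconds and 1 otherwise. This is a minimal sketch, not a fix or an official workaround; the function name cert_expiry_status and the day threshold are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: classify a PEM certificate read on stdin.
# Usage: cert_expiry_status DAYS < cert.pem
# Prints "ok" if the certificate is still valid DAYS days from now,
# otherwise "expiring" (already-expired certificates count as "expiring").
# Note: unparseable input also makes openssl fail and is reported as
# "expiring"; openssl reads only the first certificate in a bundle.
cert_expiry_status() {
  local days="$1"
  # -checkend prints a human-readable message on stdout; discard it so the
  # function's own output stays machine-readable.
  if openssl x509 -noout -checkend "$(( days * 86400 ))" >/dev/null; then
    echo "ok"
  else
    echo "expiring"
  fi
}
```

      For a hosted cluster, the input would be the decoded tls.crt from step 1, piped through `base64 --decode` into `cert_expiry_status 30` instead of `openssl x509 -enddate -noout`.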

       

      Thank you in advance.

      Additional info:

          Slack: https://redhat-internal.slack.com/archives/C04EUL1DRHC/p1741708441933359

              rokejungrh Roke Jung
              rhn-support-judzhu Jude Zhu