Cert Manager support for Red Hat OpenShift / CM-442

[sts Regression] Failed to issue certs with ACME Route53 dns01 solver in AWS STS env when pod-identity-webhook is not used


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • cert-manager-1.15
    • OAPE Sprint 264
    • Critical

      Original bug title:

      cert-manager [v1.15 Regression] Failed to issue certs with ACME Route53 dns01 solver in AWS STS env

      ^ This bug is a clone of OCPBUGS-41727, which was triaged as a CCO issue and fixed in aws-pod-identity-webhook in the OpenShift Cloud Credential Operator. That fix resolved the authentication regression in STS environments where credentials are injected by the AWS pod-identity-webhook, but a new bug was then found that breaks authorization when credentials are read from a credentials file.

      Ref: https://redhat-internal.slack.com/archives/C045YMPKR3M/p1733916726181759

      Description of problem:

          Certificate issuance fails in both automated and manual tests when Route53 is used as the dns01 solver. For the full log, please refer to the "Actual results" section.

      Version-Release number of selected component (if applicable):

          cert-manager operator v1.15.0 staging build

      How reproducible:

          Always

      Steps to Reproduce: see https://gist.github.com/lunarwhite/c09e3ec450495b7e534decbd80ee8879#file-ocp-65132-sh

      Actual results:

      1. The certificate is not Ready.
      2. The challenge of the cert is stuck in the pending status:
      
      PresentError: Error presenting challenge: failed to change Route 53 record set: operation error Route 53: ChangeResourceRecordSets, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, failed to resolve service endpoint, endpoint rule error, Invalid Configuration: Missing Region  

      Expected results:

      The certificate should be Ready. The challenge should succeed.

      Additional info:

      If I manually append region = <cluster_region> to the credentials generated by ccoctl and apply that secret, the error goes away and certificate issuance works as expected.

      [default]
      sts_regional_endpoints = regional
      role_arn = arn:aws:iam::<id>:role/<token>
      web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
      region = <cluster_region>  // <- appended manually 

      This credentials template comes from: https://github.com/openshift/cloud-credential-operator/blob/2395cbc6767c9b0c403e3662afb3ac087cae8075/pkg/aws/actuator/actuator.go#L62
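
      The manual workaround above can be sketched as a small script. This is only an illustration, not part of ccoctl, CCO, or cert-manager: the function name ensure_region and the sample account ID, role name, and region are hypothetical, and it only edits the credentials text (applying the resulting secret is out of scope). It assumes the credentials use the INI layout shown above.

```python
import configparser
import io


def ensure_region(credentials_text: str, region: str, profile: str = "default") -> str:
    """Return the credentials text with `region` set in `profile` if it is missing.

    Hypothetical helper: mirrors the manual "append region = ..." workaround.
    """
    # Only "=" is treated as a delimiter so that ":" inside the role ARN is kept intact.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        raise ValueError(f"profile [{profile}] not found in credentials")
    # Leave an existing region untouched; only add it when absent.
    if not parser.has_option(profile, "region"):
        parser.set(profile, "region", region)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Sample values are placeholders, not taken from a real cluster.
    sample = (
        "[default]\n"
        "sts_regional_endpoints = regional\n"
        "role_arn = arn:aws:iam::123456789012:role/example-role\n"
        "web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token\n"
    )
    print(ensure_region(sample, "us-east-2"))
```

      The helper is idempotent: running it on credentials that already carry a region leaves the existing value in place.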

      There's also a similar request: https://issues.redhat.com/browse/CCO-625, https://redhat-internal.slack.com/archives/C04TMSTHUHK/p1733946365383189

              swghosh@redhat.com Swarup Ghosh
              rh-ee-yuewu Yuedong Wu