-
Bug
-
Resolution: Unresolved
-
Major
-
4.14, 4.15, 4.16, 4.17, 4.18
This is a clone of issue OCPBUGS-45939. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45938. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-45937. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41727. The following is the description of the original issue:
—
Original bug title:
cert-manager [v1.15 Regression] Failed to issue certs with ACME Route53 dns01 solver in AWS STS env
Description of problem:
When using Route53 as the dns01 solver to create certificates, it fails in both automated and manual tests. For the full log, please refer to the "Actual results" section.
Version-Release number of selected component (if applicable):
cert-manager operator v1.15.0 staging build
How reproducible:
Always
Steps to Reproduce: also documented in gist
1. Install the cert-manager operator 1.15.0 2. Follow the doc to auth operator with AWS STS using ccoctl: https://docs.openshift.com/container-platform/4.16/security/cert_manager_operator/cert-manager-authenticate.html#cert-manager-configure-cloud-credentials-aws-sts_cert-manager-authenticate 3. Create a ACME issuer with Route53 dns01 solver 4. Create a cert using the created issuer
OR:
Refer by running `/pj-rehearse pull-ci-openshift-cert-manager-operator-master-e2e-operator-aws-sts` on https://github.com/openshift/release/pull/59568
Actual results:
1. The certificate is not Ready. 2. The challenge of the cert is stuck in the pending status: PresentError: Error presenting challenge: failed to change Route 53 record set: operation error Route 53: ChangeResourceRecordSets, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, failed to resolve service endpoint, endpoint rule error, Invalid Configuration: Missing Region
Expected results:
The certificate should be Ready. The challenge should succeed.
Additional info:
The only way to get it working again seems to be injecting the "AWS_REGION" environment variable into the controller pod. See upstream discussion/change:
- https://github.com/cert-manager/cert-manager/issues/7102
- https://github.com/cert-manager/cert-manager/pull/7108
- https://cert-manager.io/docs/releases/release-notes/release-notes-1.15/#bug-or-regression-1
I couldn't find a way to inject the env var into our operator-managed operands, so I only verified this workaround using the upstream build v1.15.3. After applying the patch with the following command, the challenge succeeded and the certificate became Ready.
oc patch deployment cert-manager -n cert-manager \ --patch '{"spec": {"template": {"spec": {"containers": [{"name": "cert-manager-controller", "env": [{"name": "AWS_REGION", "value": "aws-global"}]}]}}}}'
- blocks
-
OCPBUGS-46487 aws-sdk-go-v2 fails to authenticate AssumeRoleWithWebIdentity on AWS STS clusters
- Verified
-
OCPBUGS-45941 aws-sdk-go-v2 fails to authenticate AssumeRoleWithWebIdentity on AWS STS clusters
- Closed
- clones
-
OCPBUGS-45939 aws-sdk-go-v2 fails to authenticate AssumeRoleWithWebIdentity on AWS STS clusters
- Verified
- is blocked by
-
OCPBUGS-45939 aws-sdk-go-v2 fails to authenticate AssumeRoleWithWebIdentity on AWS STS clusters
- Verified
- is cloned by
-
OCPBUGS-46487 aws-sdk-go-v2 fails to authenticate AssumeRoleWithWebIdentity on AWS STS clusters
- Verified
-
OCPBUGS-45941 aws-sdk-go-v2 fails to authenticate AssumeRoleWithWebIdentity on AWS STS clusters
- Closed
- links to
-
RHBA-2024:11562 OpenShift Container Platform 4.15.z bug fix update