Bug
Resolution: Done-Errata
Minor
None
4.13
Critical
No
CFE Sprint 234, CFE Sprint 236, CFE Sprint 244
3
Rejected
False
Description of problem:
cert-manager does not work with "Managed Identity Using AAD Pod Identities", i.e. https://cert-manager.io/docs/configuration/acme/dns01/azuredns/#managed-identity-using-aad-pod-identities .
Version-Release number of selected component (if applicable):
cert-manager installed with cert-manager-operator-bundle-container-v1.10.2-18 on ipi-on-azure OCP env of payload 4.13.0-0.nightly-2023-03-04-092801
How reproducible:
Always
Steps to Reproduce:
1. Launch an ipi-on-azure OCP env and install the cert-manager operator from bundle v1.10.2-18 on it.
2. Follow https://cert-manager.io/docs/configuration/acme/dns01/azuredns/#managed-identity-using-aad-pod-identities . The step-by-step details are below.
The steps below are based on the "Example creation using azure-cli and jq" section in https://cert-manager.io/docs/configuration/acme/dns01/azuredns/#managed-identity-using-aad-pod-identities
2.1 Choose a unique identity name and a resource group to create the identity in.
$ IDENTITY_GROUP=xxia<snipped>-rg
$ az group create -l westus -n $IDENTITY_GROUP
$ IDENTITY_NAME=xxia<snipped>-test
$ az identity create --name $IDENTITY_NAME --resource-group $IDENTITY_GROUP --output json > output/az-identity-create--name--resource-group.json
$ IDENTITY="$(cat output/az-identity-create--name--resource-group.json)"
Get the principalId to use for the role assignment:
$ PRINCIPAL_ID=$(echo $IDENTITY | jq -r '.principalId')
Used for the identity binding:
$ CLIENT_ID=$(echo $IDENTITY | jq -r '.clientId')
$ RESOURCE_ID=$(echo $IDENTITY | jq -r '.id')
$ ZONE_NAME=qe1.azure.devcluster.openshift.com
$ ZONE_GROUP=<snipped>
Get the existing DNS zone ID:
$ ZONE_ID=$(az network dns zone show --name $ZONE_NAME --resource-group $ZONE_GROUP --query "id" -o tsv)
Create the role assignment:
$ az role assignment create --role "DNS Zone Contributor" --assignee $PRINCIPAL_ID --scope $ZONE_ID > output/az-role-assignment-create--assignee_for-pod-aad-test.json
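For reference, the three jq extractions in step 2.1 can be sketched in Python. This is only an illustration of which fields of the `az identity create` output feed the later steps. The function name and the sample GUIDs are hypothetical, and the real output contains more fields than shown here.

```python
import json


def identity_fields(identity_json: str) -> dict:
    """Extract the fields the later steps need from `az identity create` JSON output."""
    identity = json.loads(identity_json)
    return {
        "PRINCIPAL_ID": identity["principalId"],  # used for the role assignment
        "CLIENT_ID": identity["clientId"],        # used for the identity binding
        "RESOURCE_ID": identity["id"],            # used for the identity binding
    }


# Hypothetical sample: placeholder GUIDs and the snipped resource ID from this report.
SAMPLE = json.dumps({
    "principalId": "00000000-0000-0000-0000-000000000001",
    "clientId": "00000000-0000-0000-0000-000000000002",
    "id": "/subscriptions/snipped-subscription-id/resourcegroups/xxia-snipped-rg"
          "/providers/Microsoft.ManagedIdentity/userAssignedIdentities/xxia-snipped-test",
})

if __name__ == "__main__":
    for name, value in identity_fields(SAMPLE).items():
        print(f"{name}={value}")
```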
Next we need to ensure AAD Pod Identity is installed. This installs the CRDs and deployment required to assign the identity. Per https://azure.github.io/aad-pod-identity/docs/configure/deploy_in_openshift/ , the /etc/kubernetes/azure.json file that AAD Pod Identity relies on does not exist in an OCP cluster, so AAD Pod Identity needs to be deployed with a managed identity to provide access to Azure. The document for this is https://azure.github.io/aad-pod-identity/docs/configure/pod_identity_in_managed_mode . The steps are below:
2.2 Download the managed-mode deployment manifest:
$ wget https://raw.githubusercontent.com/Azure/aad-pod-identity/master/deploy/infra/managed-mode-deployment.yaml
The document says "This installs NMI in managed mode in the kube-system namespace". To make this succeed on OCP, first run:
$ oc label ns/kube-system pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/audit=privileged pod-security.kubernetes.io/warn=privileged --overwrite
The document also gives a command "to assign the identity to the VM" (https://azure.github.io/aad-pod-identity/docs/getting-started/role-assignment/#user-assigned-managed-identities-for-self-managed-clusters says the same). Run it for every node VM:
$ for i in xxia-05az-f55w4-worker-westus-zdkk6 xxia-05az-f55w4-worker-westus-pblrl xxia-05az-f55w4-worker-westus-bxbvk xxia-05az-f55w4-master-0 xxia-05az-f55w4-master-1 xxia-05az-f55w4-master-2
do
  az vm identity assign -g xxia-05az-f55w4-rg -n $i --identities "/subscriptions/snipped-subscription-id/resourcegroups/xxia-snipped-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/xxia-snipped-test"
done
$ oc create -f managed-mode-deployment.yaml
serviceaccount/aad-pod-id-nmi-service-account created
customresourcedefinition.apiextensions.k8s.io/azureidentities.aadpodidentity.k8s.io created
customresourcedefinition.apiextensions.k8s.io/azureidentitybindings.aadpodidentity.k8s.io created
customresourcedefinition.apiextensions.k8s.io/azurepodidentityexceptions.aadpodidentity.k8s.io created
clusterrole.rbac.authorization.k8s.io/aad-pod-id-nmi-role created
clusterrolebinding.rbac.authorization.k8s.io/aad-pod-id-nmi-binding created
daemonset.apps/nmi created
$ oc adm policy add-scc-to-user privileged -z aad-pod-id-nmi-service-account -n kube-system
$ oc get po -n kube-system
NAME        READY   STATUS    RESTARTS   AGE
nmi-76rnh   1/1     Running   0          15s
nmi-dvw2j   1/1     Running   0          15s
nmi-hkzll   1/1     Running   0          15s
nmi-n7xc6   1/1     Running   0          15s
nmi-p9v9x   1/1     Running   0          15s
nmi-ps2hh   1/1     Running   0          15s
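The per-VM loop in step 2.2 can also be sketched as a small Python helper that only builds the `az vm identity assign` command lines, without running anything. The helper name is hypothetical. The VM names, resource group and identity resource ID are the values from this report.

```python
import shlex

# Node VM names and identity resource ID as used in step 2.2 of this report.
VM_NAMES = [
    "xxia-05az-f55w4-worker-westus-zdkk6",
    "xxia-05az-f55w4-worker-westus-pblrl",
    "xxia-05az-f55w4-worker-westus-bxbvk",
    "xxia-05az-f55w4-master-0",
    "xxia-05az-f55w4-master-1",
    "xxia-05az-f55w4-master-2",
]
IDENTITY_RESOURCE_ID = (
    "/subscriptions/snipped-subscription-id/resourcegroups/xxia-snipped-rg"
    "/providers/Microsoft.ManagedIdentity/userAssignedIdentities/xxia-snipped-test"
)


def assign_identity_commands(resource_group, vm_names, identity_resource_id):
    """Build one `az vm identity assign` argv list per VM (nothing is executed)."""
    return [
        ["az", "vm", "identity", "assign",
         "-g", resource_group, "-n", vm,
         "--identities", identity_resource_id]
        for vm in vm_names
    ]


if __name__ == "__main__":
    for argv in assign_identity_commands("xxia-05az-f55w4-rg", VM_NAMES, IDENTITY_RESOURCE_ID):
        print(shlex.join(argv))  # shell-quoted form, ready for review before running
```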
Now we can create the identity resource and binding using the below manifest (copied from https://cert-manager.io/docs/configuration/acme/dns01/azuredns/#managed-identity-using-aad-pod-identities)
2.3 $ cat azureidentity-resources.yaml
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
  annotations:
    aadpodidentity.k8s.io/Behavior: namespaced
  name: certman-identity
  namespace: cert-manager
spec:
  type: 0
  resourceID: snipped # Resource Id From previous step
  clientID: snipped # Client Id from previous step
---
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: certman-id-binding
  namespace: cert-manager
spec:
  azureIdentity: certman-identity
  selector: certman-label # This is the label that needs to be set on cert-manager pods
$ oc create -f azureidentity-resources.yaml
Next we need to ensure the cert-manager pod has the label that the pod identity binding selects on. This can be done by editing the deployment and adding the lines below to the .spec.template.metadata.labels field (as described in https://cert-manager.io/docs/configuration/acme/dns01/azuredns/#managed-identity-using-aad-pod-identities)
2.4 $ oc edit deployment cert-manager -n cert-manager
spec:
  template:
    metadata:
      labels:
        aadpodidbinding: certman-label # must match previous step's "selector"
Note: because cert-manager is managed by the cert-manager operator, any direct edit to the deployment is automatically reverted. So first run oc edit certmanager cluster and change managementState to "Unmanaged", then edit the deployment. This is reported separately in https://issues.redhat.com/browse/OCPBUGS-8466
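The relationship between steps 2.3 and 2.4 can be sketched as a small check: the pod's aadpodidbinding label has to equal the binding's spec.selector. The namespace comparison is an assumption that mirrors the namespaced behavior annotation used above. The function name is hypothetical, and this is not how NMI itself is implemented.

```python
# AzureIdentityBinding from step 2.3, written as a plain dict.
BINDING = {
    "apiVersion": "aadpodidentity.k8s.io/v1",
    "kind": "AzureIdentityBinding",
    "metadata": {"name": "certman-id-binding", "namespace": "cert-manager"},
    "spec": {"azureIdentity": "certman-identity", "selector": "certman-label"},
}


def binding_selects_pod(binding: dict, pod_namespace: str, pod_labels: dict) -> bool:
    """Return True if the pod carries the label value the binding selects on."""
    # Assumption: in namespaced mode the binding only applies within its own namespace.
    same_namespace = binding["metadata"]["namespace"] == pod_namespace
    label_matches = pod_labels.get("aadpodidbinding") == binding["spec"]["selector"]
    return same_namespace and label_matches


if __name__ == "__main__":
    # Labels after the edit in step 2.4.
    print(binding_selects_pod(BINDING, "cert-manager", {"aadpodidbinding": "certman-label"}))
    # Labels if the operator reverts the edit (label missing).
    print(binding_selects_pod(BINDING, "cert-manager", {"app": "cert-manager"}))
```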
2.5 Wait for the cert-manager pod to be recreated.
$ oc get po -n cert-manager
NAME                            READY   STATUS    RESTARTS   AGE
cert-manager-697967c4b7-5kslq   1/1     Running   1          2m
...
Create clusterissuer (based on https://cert-manager.io/docs/configuration/acme/dns01/azuredns/#managed-identity-using-aad-pod-identities example)
2.6 $ cat clusterissuer-acme-dns01-azuredns-aad-pod-identity.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: use-aad-pod-identity
spec:
  acme:
    preferredChain: ""
    privateKeySecretRef:
      name: letsencrypt
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    solvers:
    - dns01:
        azureDNS:
          subscriptionID: snipped
          resourceGroupName: snipped
          hostedZoneName: qe1.azure.devcluster.openshift.com
          environment: AzurePublicCloud
$ oc create -f clusterissuer-acme-dns01-azuredns-aad-pod-identity.yaml
$ oc get clusterissuer -o wide
NAME                   READY   STATUS                                                 AGE
use-aad-pod-identity   True    The ACME account was registered with the ACME server   3m
Create certificate
3. $ oc login -u snipped -p snipped
$ oc new-project xxia-proj-3
$ cat cert-from-issuer-with-aad-pod-identity.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cert4-from-issuer-with-aad-pod-identity
spec:
  secretName: cert4-from-issuer-with-aad-pod-identity
  issuerRef:
    kind: ClusterIssuer
    name: use-aad-pod-identity
  dnsNames:
  - xxia-test-4.qe1.azure.devcluster.openshift.com
  - '*.xxia-test-4.qe1.azure.devcluster.openshift.com'
$ oc create -f cert-from-issuer-with-aad-pod-identity.yaml
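As a rough illustration of what the DNS-01 solver has to write for this certificate, the sketch below derives the TXT record name (relative to the hosted zone) for each dnsName. It follows the ACME convention of prefixing `_acme-challenge.` and dropping a leading wildcard label. The function name is hypothetical and this is not cert-manager's own code.

```python
ZONE = "qe1.azure.devcluster.openshift.com"
DNS_NAMES = [
    "xxia-test-4.qe1.azure.devcluster.openshift.com",
    "*.xxia-test-4.qe1.azure.devcluster.openshift.com",
]


def challenge_record(dns_name: str, zone: str) -> str:
    """Return the DNS-01 TXT record name for dns_name, relative to zone."""
    # A wildcard name is validated against its base domain.
    name = dns_name[2:] if dns_name.startswith("*.") else dns_name
    if name != zone and not name.endswith("." + zone):
        raise ValueError(f"{dns_name} is not inside zone {zone}")
    fqdn = "_acme-challenge." + name
    # Strip the zone suffix and the dot that separated it.
    return fqdn[: -len(zone)].rstrip(".")


if __name__ == "__main__":
    for dns_name in DNS_NAMES:
        print(dns_name, "->", challenge_record(dns_name, ZONE))
```

Both dnsNames map to the same record name, which would be consistent with the two challenges for this certificate sharing one domain.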
Check the certificate
4. $ oc get cert
NAME                                      READY   SECRET                                    AGE
cert4-from-issuer-with-aad-pod-identity   False   cert4-from-issuer-with-aad-pod-identity   51m
$ oc get challenge
NAME                                                              STATE     DOMAIN                                           AGE
cert4-from-issuer-with-aad-pod-identity-64487-725540-3403810812             xxia-test-4.qe1.azure.devcluster.openshift.com   52m
cert4-from-issuer-with-aad-pod-identity-64487-725540-4176589525   pending   xxia-test-4.qe1.azure.devcluster.openshift.com   52m
$ oc get order -o wide
NAME                                                    STATE     ISSUER                 REASON   AGE
cert4-from-issuer-with-aad-pod-identity-64487-7255405   pending   use-aad-pod-identity            52m
$ oc get challenge cert4-from-issuer-with-aad-pod-identity-64487-725540-4176589525 -o yaml
...
status:
  presented: false
  processing: true
  reason: 'azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/snipped-subscription-id/resourceGroups/snipped-dns-zone-resource-group/providers/Microsoft.Network/dnsZones/qe1.azure.devcluster.openshift.com/TXT/_acme-challenge.xxia-test-4?api-version=2017-10-01: StatusCode=0 -- Original Error: adal: Failed to execute the refresh request. Error = ''Get "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&mi_res_id=%2Fsubscriptions%2Fsnipped-subscription-id%2Fresourcegroups%2Fxxia-snipped-rg%2Fproviders%2FMicrosoft.ManagedIdentity%2FuserAssignedIdentities%2Fxxia-snipped-test&resource=https%3A%2F%2Fmanagement.core.windows.net%2F": dial tcp 169.254.169.254:80: connect: connection refused'''
  state: pending
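To make the failing request in the challenge status easier to read, the URL-encoded token request can be decoded with the Python standard library. This is only a reading aid. The URL is copied from the error above and the function name is hypothetical. It shows that the request targets 169.254.169.254 on port 80 and names the user-assigned identity through mi_res_id.

```python
from urllib.parse import parse_qs, urlsplit

# Token request URL copied from the challenge status in this report.
TOKEN_URL = (
    "http://169.254.169.254/metadata/identity/oauth2/token"
    "?api-version=2018-02-01"
    "&mi_res_id=%2Fsubscriptions%2Fsnipped-subscription-id%2Fresourcegroups%2Fxxia-snipped-rg"
    "%2Fproviders%2FMicrosoft.ManagedIdentity%2FuserAssignedIdentities%2Fxxia-snipped-test"
    "&resource=https%3A%2F%2Fmanagement.core.windows.net%2F"
)


def describe_token_request(url: str) -> dict:
    """Split the token request into endpoint parts and percent-decoded query parameters."""
    parts = urlsplit(url)
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "host": parts.hostname,
        "port": parts.port or 80,  # plain http defaults to port 80
        "path": parts.path,
        **query,
    }


if __name__ == "__main__":
    for key, value in describe_token_request(TOKEN_URL).items():
        print(f"{key}: {value}")
```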
Check logs
5. The nmi pods from the previous step show no errors in their logs:
$ oc logs --timestamps nmi-pod-name-in-previous-step -n kube-system --context admin
2023-03-08T03:04:56.849421039Z I0308 03:04:56.849330 1 main.go:90] starting nmi process. Version: v1.8.14. Build date: 2022-12-13-18:34.
2023-03-08T03:04:56.849421039Z I0308 03:04:56.849385 1 main.go:103] features for scale clusters enabled
2023-03-08T03:04:57.152738814Z I0308 03:04:57.152669 1 crd.go:448] CRD lite informers started
2023-03-08T03:04:57.253938539Z I0308 03:04:57.253810 1 main.go:117] running NMI in namespaced mode: true
2023-03-08T03:04:57.253938539Z I0308 03:04:57.253856 1 nmi.go:53] initializing in managed mode
2023-03-08T03:04:57.253938539Z I0308 03:04:57.253867 1 probes.go:41] initialized health probe on port 8085
2023-03-08T03:04:57.253938539Z I0308 03:04:57.253878 1 probes.go:44] started health probe
2023-03-08T03:04:57.254020639Z I0308 03:04:57.253986 1 metrics.go:341] registered views for metric
2023-03-08T03:04:57.254170339Z I0308 03:04:57.254128 1 prometheus_exporter.go:21] starting Prometheus exporter
2023-03-08T03:04:57.254170339Z I0308 03:04:57.254156 1 metrics.go:347] registered and exported metrics on port 9090
2023-03-08T03:04:57.254425339Z I0308 03:04:57.254314 1 server.go:127] listening on 127.0.0.1:2579
2023-03-08T03:04:57.358128765Z W0308 03:04:57.358081 1 iptables.go:123] flushing iptables to add aad-metadata custom chains
The cert-manager pod has the following logs:
$ oc logs --timestamps cert-manager-697967c4b7-5kslq --context admin -n cert-manager
...
2023-03-08T04:37:55.066624559Z E0308 04:37:55.066544 1 azuredns.go:157] cert-manager/azure-dns "msg"="Error creating TXT:" "error"="azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/snipped-subscription-id/resourceGroups/snipped-dns-zone-resource-group/providers/Microsoft.Network/dnsZones/qe1.azure.devcluster.openshift.com/TXT/_acme-challenge.xxia-test-4?api-version=2017-10-01: StatusCode=0 -- Original Error: adal: Failed to execute the refresh request. Error = 'Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&mi_res_id=%2Fsubscriptions%2Fsnipped-subscription-id%2Fresourcegroups%2Fxxia-snipped-rg%2Fproviders%2FMicrosoft.ManagedIdentity%2FuserAssignedIdentities%2Fxxia-snipped-test&resource=https%3A%2F%2Fmanagement.core.windows.net%2F\": dial tcp 169.254.169.254:80: connect: connection refused'" "qe1.azure.devcluster.openshift.com"="(MISSING)"
2023-03-08T04:37:55.067033060Z E0308 04:37:55.066992 1 controller.go:167] cert-manager/challenges "msg"="re-queuing item due to error processing" "error"="azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/snipped-subscription-id/resourceGroups/snipped-dns-zone-resource-group/providers/Microsoft.Network/dnsZones/qe1.azure.devcluster.openshift.com/TXT/_acme-challenge.xxia-test-4?api-version=2017-10-01: StatusCode=0 -- Original Error: adal: Failed to execute the refresh request. Error = 'Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&mi_res_id=%2Fsubscriptions%2Fsnipped-subscription-id%2Fresourcegroups%2Fxxia-snipped-rg%2Fproviders%2FMicrosoft.ManagedIdentity%2FuserAssignedIdentities%2Fxxia-snipped-test&resource=https%3A%2F%2Fmanagement.core.windows.net%2F\": dial tcp 169.254.169.254:80: connect: connection refused'" "key"="xxia-proj-3/cert4-from-issuer-with-aad-pod-identity-64487-725540-4176589525"
2023-03-08T04:37:55.067163860Z I0308 04:37:55.067108 1 azuredns.go:87] cert-manager "msg"="No ClientID found: authenticating azuredns with managed identity (MSI)"
...
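When scanning logs like the ones in step 5, it can help to split each line into severity, source location and message. The sketch below does this with a regular expression written against the line layout seen in this report (timestamp, severity letter plus date, time, process ID, `file:line]`, message). The pattern and helper names are assumptions for illustration, not an official klog parser.

```python
import re

# Layout as seen in this report: "<ts> <S><MMDD> <time> <pid> <file>:<line>] <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>[\d:.]+)\s+"
    r"(?P<pid>\d+) (?P<src>[^\s:]+:\d+)\] (?P<msg>.*)$"
)

SAMPLE_LINES = [
    "2023-03-08T03:04:57.254425339Z I0308 03:04:57.254314 1 server.go:127] listening on 127.0.0.1:2579",
    "2023-03-08T03:04:57.358128765Z W0308 03:04:57.358081 1 iptables.go:123] "
    "flushing iptables to add aad-metadata custom chains",
    '2023-03-08T04:37:55.067163860Z I0308 04:37:55.067108 1 azuredns.go:87] cert-manager '
    '"msg"="No ClientID found: authenticating azuredns with managed identity (MSI)"',
]


def parse_line(line: str):
    """Return a dict of named fields, or None if the line does not match the layout."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def warnings_and_errors(lines):
    """Keep only parsed entries whose severity is W, E or F."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["sev"] in "WEF"]


if __name__ == "__main__":
    for entry in warnings_and_errors(SAMPLE_LINES):
        print(entry["sev"], entry["src"], entry["msg"])
```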
Actual results:
cert-manager does not work with "Managed Identity Using AAD Pod Identities", as detailed in the steps above.
Expected results:
cert-manager should work with "Managed Identity Using AAD Pod Identities", as detailed in the steps above.
Additional info:
While debugging, I found https://github.com/cert-manager/cert-manager/issues/3148 and then tried adding managedIdentity to the ClusterIssuer as below:
...
    solvers:
    - dns01:
        azureDNS:
          environment: AzurePublicCloud
          hostedZoneName: qe1.azure.devcluster.openshift.com
          managedIdentity:
            resourceID: snipped-RESOURCE_ID-in-previous-step
          resourceGroupName: snipped
          subscriptionID: snipped
This does not fix the problem; it still reproduces.
is blocked by
- CCO-187 Azure Managed Identity (Workload Identity) Support (Closed)
- CCO-232 Implement ccoctl command to create infrastructure required for Azure workload identity (Closed)
links to
- RHEA-2023:120768 [OpenShift v4.13-v4.14] cert-manager Operator for Red Hat OpenShift 1.13.0