-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.14.z, 4.15.z, 4.17.z, 4.16.z, 4.18.z
Description of problem:
When kubernetes.io/kube-apiserver-client-kubelet CSRs are not approved, they are recreated with exponential backoff, causing the old CSRs to be abandoned and left in a pending state. This usually doesn't cause an issue because CSR approvals are typically quick. However, during large machine scale-ups, adding IP addresses to the machine object (which is required for node-to-machine mapping for CSR validation) can be slow, leading to longer CSR approval times. Eventually, the number of pending CSRs may reach the maximum allowed, halting CSR approvals.
The machine approver leaves CSRs it considers invalid in the pending state rather than rejecting them. This behaviour is intentional: it avoids rejecting valid CSRs that another controller should approve, and it leaves room for manual approval by a system administrator.
Version-Release number of selected component (if applicable):
I expect this issue to affect all versions.
How reproducible:
Likely when scaling over 500 machines.
Steps to Reproduce:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/58057/rehearse-58057-pull-ci-openshift-ovn-kubernetes-release-4.16-ovncore-perfscale-aws-ovn-xlarge-cluster-density-v2/1865957686453997568
Actual results:
Machine approver logs:
sr-6mxzf: failed to find machine for node ip-10-0-5-76.ec2.internal, cannot approve
after limit is reached:
csr-zt477: Pending CSRs: 1074; Max pending allowed: 856. Difference between pending CSRs and machines > 100. Ignoring all CSRs as too many recent pending CSR
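The numbers in the log above suggest how the limit is applied. The following is a minimal sketch, not the approver's actual code: it assumes the limit is the machine count plus a fixed slack of 100 (taken from the "Difference between pending CSRs and machines > 100" message). The function names are hypothetical, and the machine count of 756 is inferred as 856 - 100; it is not stated in the report.

```shell
#!/usr/bin/env bash
# Slack taken from the log message "Difference between pending CSRs and machines > 100".
MAX_DIFF=100

# max_pending MACHINES
# Assumed limit: machine count plus the fixed slack (hypothetical helper).
max_pending() {
  echo $(( $1 + MAX_DIFF ))
}

# approvals_halted PENDING MACHINES
# Succeeds (exit status 0) when the pending count exceeds the assumed limit.
approvals_halted() {
  [ "$1" -gt "$(max_pending "$2")" ]
}

# Values from the log: 1074 pending CSRs; 756 machines is inferred from 856 - 100.
if approvals_halted 1074 756; then
  echo "halted: 1074 pending > $(max_pending 756) allowed"
fi
```

Under this assumption, every abandoned CSR counts against the same budget, so a few retries per node during a large scale-up are enough to exceed the slack.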
Expected results:
Cluster machine approver should handle scaling to 750 nodes at once.
Additional info:
Example machine timeline:
machine creationTimestamp: "2024-12-09T04:57:29Z"
last attempt to reconcile one of its CSRs before the CSR limit is reached: 2024-12-09T05:21:39.211911225Z
Addresses added to machine: "2024-12-09T05:29:44Z"
for file in *.yaml; do
  # Creation time of the CSR object
  ts=$(grep "creationTimestamp" "$file" | awk '{print $2}')
  # Decode the embedded request and extract its subject line
  cn=$(grep "request:" "$file" | awk '{print $2}' | base64 -d | openssl req -noout -text -in - | grep CN= || echo "No CN found")
  echo "$cn $ts"
done | sort

Subject: O=system:nodes, CN=system:node:ip-10-0-99-70.ec2.internal "2024-12-09T05:23:48Z"
Subject: O=system:nodes, CN=system:node:ip-10-0-99-70.ec2.internal "2024-12-09T05:23:58Z"
Subject: O=system:nodes, CN=system:node:ip-10-0-99-70.ec2.internal "2024-12-09T05:39:01Z"
Subject: O=system:nodes, CN=system:node:ip-10-0-99-70.ec2.internal "2024-12-09T05:54:05Z"
Subject: O=system:nodes, CN=system:node:ip-10-0-99-70.ec2.internal "2024-12-09T06:09:13Z"
Subject: O=system:nodes, CN=system:node:ip-10-0-99-70.ec2.internal "2024-12-09T06:24:30Z"
Subject: O=system:nodes, CN=system:node:ip-10-0-99-70.ec2.internal "2024-12-09T06:40:02Z"
Subject: O=system:nodes, CN=system:node:ip-10-0-99-70.ec2.internal "2024-12-09T06:56:02Z"
Subject: O=system:nodes, CN=system:node:ip-10-0-99-70.ec2.internal "2024-12-09T07:11:30Z"
Subject: O=system:nodes, CN=system:node:ip-10-0-99-70.ec2.internal "2024-12-09T07:26:58Z"
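To see how often this node's CSR was recreated, the gaps between consecutive creation timestamps in the listing above can be computed. This is a minimal sketch: `csr_gaps` is a hypothetical helper, it assumes GNU `date` (for the `-d` option), and it expects already-sorted timestamps on stdin, one per line.

```shell
#!/usr/bin/env bash
# csr_gaps: read sorted ISO-8601 timestamps (one per line) from stdin and
# print the gap in seconds between each consecutive pair.
# Assumes GNU date; surrounding double quotes on a timestamp are stripped.
csr_gaps() {
  local prev="" ts epoch
  while read -r ts; do
    ts=${ts//\"/}                       # drop the YAML-style quotes
    [ -z "$ts" ] && continue            # skip blank lines
    epoch=$(date -u -d "$ts" +%s) || return 1
    [ -n "$prev" ] && echo $(( epoch - prev ))
    prev=$epoch
  done
}

# First three timestamps from the listing above.
printf '%s\n' \
  '"2024-12-09T05:23:48Z"' \
  '"2024-12-09T05:23:58Z"' \
  '"2024-12-09T05:39:01Z"' | csr_gaps
```

For these three timestamps the gaps are 10 and 903 seconds. The later entries in the listing are likewise roughly 15 to 16 minutes apart, each one leaving another pending CSR behind.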