Story
Resolution: Unresolved
Currently the machine-config daemon uses a Node informer that receives updates about all the nodes in the cluster, even though it only actually needs to know about updates to the Node that is hosting that particular MCD. This ticket is about investigating the MCD informer configuration, to determine if the MCD can be adjusted to filter, ideally server-side, that informer, to reduce network traffic, API-server load, and MCD-processing load, in clusters with many nodes.
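For reference, the Kubernetes API accepts a fieldSelector on list and watch requests, and metadata.name is a supported field, so a per-node watch can in principle be narrowed server-side. The sketch below uses only the Go standard library to show the shape of such a request. The helper nodeWatchURL, the host, and the node name are hypothetical and are not taken from the MCO code.

```go
package main

import (
	"fmt"
	"net/url"
)

// nodeWatchURL is a hypothetical helper that builds a Nodes watch request.
// With an empty nodeName it watches every Node (today's behaviour);
// otherwise it asks the API server to filter on metadata.name.
func nodeWatchURL(apiServer, nodeName string) string {
	q := url.Values{}
	q.Set("watch", "true")
	if nodeName != "" {
		// Server-side filter: only events for this one Node are sent.
		q.Set("fieldSelector", "metadata.name="+nodeName)
	}
	u := url.URL{
		Scheme:   "https",
		Host:     apiServer,
		Path:     "/api/v1/nodes",
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	// Assumed example values, for illustration only.
	fmt.Println(nodeWatchURL("api.example.com:6443", ""))
	fmt.Println(nodeWatchURL("api.example.com:6443", "worker-0"))
}
```

The same selector string is what an informer would need to pass in its list and watch options.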
An alternative, if server-side informer filtering is not possible, would be to adjust the MCD to poll its Node with Gets instead of using informers. But before accepting the latency increase that polling would entail, we should profile to understand the actual CPU/memory/network savings it would deliver to the Kube API server in clusters with roughly 50 nodes.
Example showing the Nodes watch today, in 4.14.0-ec.4 CI aws-ovn-serial:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-serial/1685027676433158144/artifacts/e2e-aws-ovn-serial/gather-audit-logs/artifacts/audit-logs.tar | tar -xz --strip-components=2
$ zgrep -h '"verb":"watch".*"resource":"nodes"' kube-apiserver/*.log.gz | jq -r .user.username | sort | uniq -c | grep -1 machine-config-daemon
246 system:serviceaccount:openshift-machine-config-operator:machine-config-daemon
234 system:serviceaccount:openshift-ovn-kubernetes:ovn-kubernetes-node
$ zgrep -h '"verb":"watch".*"username":"system:serviceaccount:openshift-machine-config-operator:machine-config-daemon".*"resource":"nodes"' kube-apiserver/*.log.gz | jq -r '.stageTimestamp + " " + (.responseStatus.code | tostring) + " " + .stage + " " + (.user.extra | tostring)' | sort
...
2023-07-28T21:35:49.324318Z 200 ResponseComplete {"authentication.kubernetes.io/pod-name":["machine-config-daemon-2mp9p"],"authentication.kubernetes.io/pod-uid":["eb49a2ec-a49a-499e-ae96-64c9ea790741"]}
2023-07-28T21:35:49.328883Z 200 ResponseStarted {"authentication.kubernetes.io/pod-name":["machine-config-daemon-2mp9p"],"authentication.kubernetes.io/pod-uid":["eb49a2ec-a49a-499e-ae96-64c9ea790741"]}
...
Here is the common code setting up the informer. If we wanted to use NewFilteredSharedInformerFactory there, we'd need some kind of conditional to distinguish between "the consuming controller needs to watch all the nodes" (e.g. the MachineConfigPool controller) and "the consumer only needs to watch node $NAME" (the machine-config daemon on that node).
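One possible shape for that conditional, as a sketch only: the shared setup code takes an optional node name and derives the list-options tweak from it. ListOptions and TweakListOptionsFunc below are local stand-ins that mirror the shape of the client-go types (a function taking a pointer to the list options), and nodeTweak is a hypothetical helper; none of this is the existing MCO code.

```go
package main

import "fmt"

// ListOptions is a local stand-in for metav1.ListOptions, reduced to the
// one field this sketch needs (assumption, not the real type).
type ListOptions struct {
	FieldSelector string
}

// TweakListOptionsFunc mirrors the shape of the tweak callback that a
// filtered informer factory accepts.
type TweakListOptionsFunc func(*ListOptions)

// nodeTweak is a hypothetical helper. An empty nodeName means "watch all
// nodes" and yields no tweak; a non-empty name yields a tweak that
// restricts list and watch calls to that single Node.
func nodeTweak(nodeName string) TweakListOptionsFunc {
	if nodeName == "" {
		return nil
	}
	return func(o *ListOptions) {
		o.FieldSelector = "metadata.name=" + nodeName
	}
}

func main() {
	// "" models a pool-level controller, "worker-0" models one MCD.
	for _, name := range []string{"", "worker-0"} {
		opts := ListOptions{}
		if tweak := nodeTweak(name); tweak != nil {
			tweak(&opts)
		}
		fmt.Printf("node=%q selector=%q\n", name, opts.FieldSelector)
	}
}
```

Keeping the decision in one helper means controllers that need every Node pass nothing and keep today's behaviour, while the daemon passes its own node name.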