Bug
Resolution: Unresolved
Normal
None
4.17
Moderate
None
False
Description of problem:
In SNO clusters, when the node is rebooted the MCO controller (MCC) cannot acquire the leader lease, and it needs a few minutes before it starts working properly. This causes unexpected behaviour such as:
- If we reboot the node and then create a MC, no new MC is rendered for several minutes (until the MCC becomes the leader).
- If we apply a MC that reboots the node, even though the MCD shows that the configuration was fully applied, we still have to wait a few minutes until the MCP reports the configuration as applied.
How reproducible:
Always
Steps to Reproduce:
1. Reboot the SNO node
Actual results:
After the reboot we can see that the MCO controller cannot take the lease:
$ oc logs machine-config-controller-6c48c4f6f-kp994
Defaulted container "machine-config-controller" out of: machine-config-controller, kube-rbac-proxy
I0724 10:00:03.242019 1 start.go:61] Version: v4.17.0-202407232214.p0.g3acda20.assembly.stream.el9-dirty (3acda20374986a357b25ef996d504914dcb0ebda)
I0724 10:00:03.242404 1 leaderelection.go:121] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
I0724 10:00:03.532701 1 leaderelection.go:250] attempting to acquire leader lease openshift-machine-config-operator/machine-config-controller...
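For reference, the figures in the leaderelection.go:121 line can be reproduced with simple arithmetic. The sketch below is not taken from the MCO source. The three input values (lease duration 137s, renew deadline 107s, retry period 26s) are assumptions that do not appear in this report; they were chosen only because they yield exactly the numbers logged above. The function name lease_timings is hypothetical.

```python
from datetime import timedelta

# ASSUMED inputs: not stated in the report, picked because they
# reproduce the figures printed by leaderelection.go:121.
LEASE_DURATION = timedelta(seconds=137)
RENEW_DEADLINE = timedelta(seconds=107)
RETRY_PERIOD = timedelta(seconds=26)


def lease_timings(lease_duration, renew_deadline, retry_period):
    """Derive the timing figures that the controller logs at startup."""
    # Whole number of renew attempts that fit inside the renew deadline.
    retries = renew_deadline // retry_period
    return {
        "retries": retries,
        "clock_skew": lease_duration - renew_deadline,
        "apiserver_downtime_tolerance": (retries - 1) * retry_period,
        # The old holder died without releasing the lease.
        "worst_non_graceful": lease_duration + retry_period,
        # The old holder released the lease on shutdown.
        "worst_graceful": retry_period,
    }


timings = lease_timings(LEASE_DURATION, RENEW_DEADLINE, RETRY_PERIOD)
for name, value in timings.items():
    print(f"{name}: {value}")
```

Under these assumed values the non-graceful worst case is 163 seconds (2m43s), which is in line with the "few minutes" of delay described in this report, while a graceful handover would take at most one retry period.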
Expected results:
The MCO controller should be able to take the lease promptly after the node reboots, without waiting several minutes.
Additional info:
These are the MCO controller logs while we reboot the node:
E0724 09:55:34.567730 1 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.MachineOSBuild: failed to list *v1alpha1.MachineOSBuild: the server could not find the requested resource (get machineosbuilds.machineconfiguration.openshift.io)
I0724 09:56:01.666231 1 helpers.go:93] Shutting down due to: terminated
I0724 09:56:01.666347 1 helpers.go:96] Context cancelled
I0724 09:56:01.667056 1 kubelet_config_controller.go:209] Shutting down MachineConfigController-KubeletConfigController
I0724 09:56:01.667139 1 container_runtime_config_controller.go:251] Shutting down MachineConfigController-ContainerRuntimeConfigController
I0724 09:56:01.667174 1 node_controller.go:255] Shutting down MachineConfigController-NodeController
I0724 09:56:01.667208 1 template_controller.go:235] Shutting down MachineConfigController-TemplateController
I0724 09:56:01.667235 1 render_controller.go:135] Shutting down MachineConfigController-RenderController
I0724 09:56:01.667258 1 drain_controller.go:176] Shutting down MachineConfigController-DrainController
I0724 09:56:01.667525 1 metrics.go:123] Metrics listener successfully stopped
I0724 09:56:01.667572 1 simple_featuregate_reader.go:177] Shutting down feature-gate-detector
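To put numbers on the delay, the timestamps in the two log excerpts can be compared. The sketch below parses the klog headers of the last shutdown line and of the first lease attempt after the reboot, then projects when the lease could be taken. Several things here are assumptions rather than facts from this report: the helper name parse_klog_time is hypothetical, the year 2024 is inferred from the build version string (klog headers carry no year), the 137s lease duration is an assumed value, and the projection assumes that a new candidate times the existing lease from the moment it first observes it.

```python
from datetime import datetime, timedelta


def parse_klog_time(line, year=2024):
    """Parse the 'Lmmdd hh:mm:ss.uuuuuu' header of a klog line.

    klog does not record the year, so it is passed in (assumed 2024).
    """
    fields = line.split()
    month_day = fields[0][1:]  # drop the severity letter (I, W, E, F)
    return datetime.strptime(
        f"{year}{month_day} {fields[1]}", "%Y%m%d %H:%M:%S.%f"
    )


# Last line logged by the old controller before the reboot.
shutdown = parse_klog_time(
    "I0724 09:56:01.667572 1 simple_featuregate_reader.go:177] "
    "Shutting down feature-gate-detector"
)
# First lease attempt logged by the new controller after the reboot.
attempt = parse_klog_time(
    "I0724 10:00:03.532701 1 leaderelection.go:250] attempting to "
    "acquire leader lease "
    "openshift-machine-config-operator/machine-config-controller..."
)

gap = attempt - shutdown
print(f"controller absent for: {gap}")

# ASSUMPTION: 137s lease duration, counted from the first observation
# of the stale lease by the new candidate.
ASSUMED_LEASE_DURATION = timedelta(seconds=137)
earliest_acquisition = attempt + ASSUMED_LEASE_DURATION
print(f"projected earliest acquisition: {earliest_acquisition.time()}")
```

If these assumptions hold, the controller would stay idle for more than two minutes after it is already running again, on top of the roughly four minutes it was absent during the reboot. The logs in this report stop before the lease is acquired, so the projection is not confirmed by them.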