Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.17
Component/s: Machine Config Operator
Labels:
- mco-triaged

Severity:
Moderate
Regression:
None
Blocked:
False
Blocked Reason:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

In SNO clusters, when the node is rebooted, the MCO controller (MCC) cannot acquire the leader lease and needs a few minutes before it starts to work properly.

This can cause unexpected behaviours such as:
- If we reboot the node and then create a MC, no new MC is rendered for several minutes (until the MCC becomes the leader).
- If we apply a MC that reboots the node, even if we can see in the MCD that the configuration was fully applied, we still have to wait a few minutes until the MCP reports the configuration as applied.

How reproducible:

Always

Steps to Reproduce:

    1. Reboot the SNO node

Actual results:

After the reboot we can see that the MCO controller cannot take the lease

$ oc logs machine-config-controller-6c48c4f6f-kp994
Defaulted container "machine-config-controller" out of: machine-config-controller, kube-rbac-proxy
I0724 10:00:03.242019       1 start.go:61] Version: v4.17.0-202407232214.p0.g3acda20.assembly.stream.el9-dirty (3acda20374986a357b25ef996d504914dcb0ebda)
I0724 10:00:03.242404       1 leaderelection.go:121] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
I0724 10:00:03.532701       1 leaderelection.go:250] attempting to acquire leader lease openshift-machine-config-operator/machine-config-controller...
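
The figures in the leader election log line above can be reproduced with some simple arithmetic. The sketch below is an illustration only: the lease duration (137s), renew deadline (107s) and retry period (26s) are assumptions chosen because they yield the numbers printed in the log, and the formulas and the function name `lease_timings` are likewise assumptions, not something stated in this report.

```python
def lease_timings(lease_duration, renew_deadline, retry_period):
    """Derive leader election timings (in seconds) from three settings.

    The formulas are assumptions picked to match the log line in this report.
    """
    # Number of renew attempts that fit inside the renew deadline.
    retries = renew_deadline // retry_period
    # Time of the last renew attempt before the deadline expires.
    last_retry = retries * retry_period
    return {
        "retries": retries,
        "clock_skew_tolerance": lease_duration - renew_deadline,
        "apiserver_downtime_tolerance": last_retry - retry_period,
        "worst_non_graceful_acquisition": lease_duration + retry_period,
        "worst_graceful_acquisition": retry_period,
    }


# Assumed settings (not taken from the report itself).
timings = lease_timings(lease_duration=137, renew_deadline=107, retry_period=26)

for name, value in timings.items():
    print(f"{name}: {value}")
```

Under these assumptions, the worst non-graceful acquisition is 163 seconds, that is, the 2m43s shown in the log. If the previous leader does not release the lease before the node goes down, a wait of this order after the reboot would be consistent with the delay described in this report.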

Expected results:

The MCO controller should be able to take the lease without problems.

Additional info:


These are the MCO controller logs while we reboot the node:

E0724 09:55:34.567730       1 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.MachineOSBuild: failed to list *v1alpha1.MachineOSBuild: the server could not find the requested resource (get machineosbuilds.machineconfiguration.openshift.io)
I0724 09:56:01.666231       1 helpers.go:93] Shutting down due to: terminated
I0724 09:56:01.666347       1 helpers.go:96] Context cancelled
I0724 09:56:01.667056       1 kubelet_config_controller.go:209] Shutting down MachineConfigController-KubeletConfigController
I0724 09:56:01.667139       1 container_runtime_config_controller.go:251] Shutting down MachineConfigController-ContainerRuntimeConfigController
I0724 09:56:01.667174       1 node_controller.go:255] Shutting down MachineConfigController-NodeController
I0724 09:56:01.667208       1 template_controller.go:235] Shutting down MachineConfigController-TemplateController
I0724 09:56:01.667235       1 render_controller.go:135] Shutting down MachineConfigController-RenderController
I0724 09:56:01.667258       1 drain_controller.go:176] Shutting down MachineConfigController-DrainController
I0724 09:56:01.667525       1 metrics.go:123] Metrics listener successfully stopped
I0724 09:56:01.667572       1 simple_featuregate_reader.go:177] Shutting down feature-gate-detector
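
To put the timeline in perspective, the gap between the last shutdown message above and the first startup message shown under "Actual results" can be computed from the klog timestamps. This is a minimal sketch, assuming both excerpts come from the same day and the same clock; the helper name `klog_time` is hypothetical. The resulting gap includes the node reboot itself, so it is not the lease wait alone.

```python
from datetime import datetime


def klog_time(line):
    """Parse the timestamp of a klog line such as 'I0724 09:56:01.666231 ...'.

    klog lines carry no year, so the parsed value is only useful for
    computing differences between lines from the same day.
    """
    fields = line.split()
    month_day = fields[0][1:]  # drop the severity letter (I, W, E, F)
    return datetime.strptime(f"{month_day} {fields[1]}", "%m%d %H:%M:%S.%f")


# Last shutdown message and first startup message, copied from this report.
shutdown = "I0724 09:56:01.667572       1 simple_featuregate_reader.go:177] Shutting down feature-gate-detector"
startup = "I0724 10:00:03.242019       1 start.go:61] Version: v4.17.0-202407232214.p0.g3acda20.assembly.stream.el9-dirty"

gap = klog_time(startup) - klog_time(shutdown)
print(f"controller was down for about {gap.total_seconds():.0f} seconds")
```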

Assignee: Team MCO

Reporter: Sergio Regidor de la Rosa

QA Contact: Sergio Regidor de la Rosa

Votes: 0

Watchers: 2

Created: 2024/07/24 10:14 AM

Updated: 2024/07/29 6:27 PM
