OpenShift Bugs / OCPBUGS-37501

In SNO clusters MCC cannot take the lease after the node is rebooted


    • Moderate
    • None
    • False

      Description of problem:

      In SNO clusters, when the node is rebooted, the MCO controller (MCC) cannot acquire the leader lease and needs a few minutes before it starts working properly.
      
      This can cause unexpected behaviour, for example:
      - If we reboot the node and then create a MC, no new MC is rendered for several minutes (until the MCC acquires the lease).
      - If we apply a MC that reboots the node, even though the MCD shows that the configuration was fully applied, we still have to wait a few minutes until the MCP reports the configuration as applied.
      
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Reboot the SNO node
          

      Actual results:

      After the reboot we can see that the MCO controller cannot take the lease
      
      $ oc logs machine-config-controller-6c48c4f6f-kp994
      Defaulted container "machine-config-controller" out of: machine-config-controller, kube-rbac-proxy
      I0724 10:00:03.242019       1 start.go:61] Version: v4.17.0-202407232214.p0.g3acda20.assembly.stream.el9-dirty (3acda20374986a357b25ef996d504914dcb0ebda)
      I0724 10:00:03.242404       1 leaderelection.go:121] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
      I0724 10:00:03.532701       1 leaderelection.go:250] attempting to acquire leader lease openshift-machine-config-operator/machine-config-controller...
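
      The figures in the leaderelection.go:121 line above are consistent with a lease duration of 137s, a renew deadline of 107s and a retry period of 26s. Those three values are not stated in this report; they are inferred from the logged numbers, and the formulas below are an assumption about how such a message could be derived, not a copy of the controller's code. A minimal sketch:

```python
def leader_election_timings(lease_duration, renew_deadline, retry_period):
    """Derive the logged leader-election figures from three settings (seconds).

    The formulas are an assumption chosen to match the log line in this
    report; they are not taken from the controller's source.
    """
    retries = renew_deadline // retry_period
    return {
        "retries": retries,
        # Margin between the holder giving up and the lease expiring.
        "clock_skew": lease_duration - renew_deadline,
        # How long the API server can be down before the holder gives up.
        "downtime_tolerance": (retries - 1) * retry_period,
        # Lease not released: wait for expiry plus one retry interval.
        "worst_non_graceful": lease_duration + retry_period,
        # Lease released on shutdown: wait at most one retry interval.
        "worst_graceful": retry_period,
    }


# Assumed settings (inferred from the log, not stated in the report).
timings = leader_election_timings(lease_duration=137,
                                  renew_deadline=107,
                                  retry_period=26)
for name, value in timings.items():
    print(f"{name}: {value}")
```

      Under these assumptions the worst non-graceful acquisition is 163s, i.e. the 2m43s in the log. If the lease is not released when the node reboots, a wait of that order would match the "few minutes" delay described above.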
      
      
          

      Expected results:

      The MCO controller should be able to take the lease without problems.
          

      Additional info:

      
      These are the MCO controller logs captured while the node was rebooting:
      
      E0724 09:55:34.567730       1 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.MachineOSBuild: failed to list *v1alpha1.MachineOSBuild: the server could not find the requested resource (get machineosbuilds.machineconfiguration.openshift.io)
      I0724 09:56:01.666231       1 helpers.go:93] Shutting down due to: terminated
      I0724 09:56:01.666347       1 helpers.go:96] Context cancelled
      I0724 09:56:01.667056       1 kubelet_config_controller.go:209] Shutting down MachineConfigController-KubeletConfigController
      I0724 09:56:01.667139       1 container_runtime_config_controller.go:251] Shutting down MachineConfigController-ContainerRuntimeConfigController
      I0724 09:56:01.667174       1 node_controller.go:255] Shutting down MachineConfigController-NodeController
      I0724 09:56:01.667208       1 template_controller.go:235] Shutting down MachineConfigController-TemplateController
      I0724 09:56:01.667235       1 render_controller.go:135] Shutting down MachineConfigController-RenderController
      I0724 09:56:01.667258       1 drain_controller.go:176] Shutting down MachineConfigController-DrainController
      I0724 09:56:01.667525       1 metrics.go:123] Metrics listener successfully stopped
      I0724 09:56:01.667572       1 simple_featuregate_reader.go:177] Shutting down feature-gate-detector
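
      To check whether the previous pod still holds the lease after the reboot, one option is to inspect the lock object named in the log (openshift-machine-config-operator/machine-config-controller). The sketch below assumes the lock is a coordination.k8s.io/v1 Lease and that its JSON was obtained with something like `oc get lease machine-config-controller -n openshift-machine-config-operator -o json`. The sample values (holder name, timestamps, duration) are hypothetical and do not come from this cluster.

```python
from datetime import datetime, timedelta, timezone


def lease_remaining(lease, now):
    """Return (holder, time left) for a Lease object given as a dict.

    Approximation: expiry is computed from spec.renewTime, whereas an
    elector typically measures from when it last observed a change.
    """
    spec = lease["spec"]
    renew_time = datetime.fromisoformat(spec["renewTime"].replace("Z", "+00:00"))
    expires = renew_time + timedelta(seconds=spec["leaseDurationSeconds"])
    remaining = max(expires - now, timedelta(0))
    return spec.get("holderIdentity", ""), remaining


# Hypothetical Lease content, for illustration only.
sample_lease = {
    "spec": {
        "holderIdentity": "example-old-controller-pod",
        "leaseDurationSeconds": 137,
        "renewTime": "2024-07-24T09:55:50.000000Z",
    }
}

now = datetime(2024, 7, 24, 9, 56, 30, tzinfo=timezone.utc)
holder, remaining = lease_remaining(sample_lease, now)
print(f"held by {holder}, about {int(remaining.total_seconds())}s until expiry")
```

      If the holder is still the pod that was terminated during the reboot, the new pod has to wait for the remaining time before it can acquire the lease.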
      
          

            team-mco Team MCO
            sregidor@redhat.com Sergio Regidor de la Rosa
            Sergio Regidor de la Rosa Sergio Regidor de la Rosa