OCPBUGS-727 (OpenShift Bugs)

[4.11] Kubelet cannot be started on worker nodes after upgrade to OCP 4.11 (RHCOS 8.6) when custom SELinux policies are applied



    Description

      This is a clone of issue OCPBUGS-595. The following is the description of the original issue:

      Description of problem:

      On an OCP 4.10.26 cluster, if a custom SELinux policy has been applied to the nodes (either by an operator or manually by the admin), then when the cluster is upgraded to 4.11.1 and the nodes pivot to RHCOS 8.6 during the MCP update, the hyperkube and kubelet binaries cannot be started. The cause is a permission error resulting from a wrong SELinux context label on these files.

      Version-Release number of selected component (if applicable):

      OCP 4.10.26 --> 4.11.1

      RHCOS 410.84.202208030316-0 --> 411.86.202208112011-0

      How reproducible:

      Not sure; it has happened in 3 known cases so far.

      Steps to Reproduce:
      1. Start with OCP 4.10.26
      2. Apply custom SELinux policy on worker nodes
      3. Upgrade OCP to 4.11.1

      Actual results:

      After the MCD update, kubelet does not start on the nodes.

      journal log:

      -- Unit kubelet.service has begun starting up.
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[83741]: kubelet.service: Failed to execute command: Permission denied
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[83741]: kubelet.service: Failed at step EXEC spawning /usr/bin/hyperkube: Permission denied
      -- Subject: Process /usr/bin/hyperkube could not be executed
      -- Defined-By: systemd
      -- Support: https://access.redhat.com/support
      -- 
      -- The process /usr/bin/hyperkube could not be executed and failed.
      -- 
      -- The error number returned by this process is 13.
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Main process exited, code=exited, status=203/EXEC
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Failed with result 'exit-code'.
      -- Subject: Unit failed
      -- Defined-By: systemd
      -- Support: https://access.redhat.com/support
      -- 
      -- The unit kubelet.service has entered the 'failed' state with result 'exit-code'.
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: Failed to start Kubernetes Kubelet.
      -- Subject: Unit kubelet.service has failed
      -- Defined-By: systemd
      -- Support: https://access.redhat.com/support
      -- 
      -- Unit kubelet.service has failed.
      -- 
      -- The result is failed.
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Consumed 7ms CPU time
      -- Subject: Resources consumed by unit runtime
      -- Defined-By: systemd
      -- Support: https://access.redhat.com/support
      -- 
      -- The unit kubelet.service completed and consumed the indicated resources.
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 sudo[83722]: pam_unix(sudo:session): session closed for user root 
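The failure signature in the journal above (main process exit status 203/EXEC together with a "Permission denied" spawn error, i.e. errno 13, EACCES) can be checked for with a small filter. This is a minimal sketch, not part of the original report: the function name `kubelet_exec_denied` is hypothetical, and the two patterns are copied from the log lines shown here, so other systemd versions may word them differently.

```shell
#!/bin/bash
# Hypothetical helper (not from the report): reads kubelet journal text on stdin
# and succeeds (exit 0) only if it contains the EXEC / Permission denied signature.
kubelet_exec_denied() {
  local log
  log=$(cat)
  # systemd reports a failed exec of the unit's main process as status=203/EXEC ...
  grep -q 'status=203/EXEC' <<<"$log" || return 1
  # ... and logs the errno text for the spawn step (13 = EACCES, "Permission denied").
  grep -q 'Failed at step EXEC spawning .*: Permission denied' <<<"$log"
}
```

For example, on an affected node: `journalctl -u kubelet.service -b --no-pager | kubelet_exec_denied && echo "kubelet exec blocked"`.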

      The kubelet and hyperkube binaries have the unlabeled_t context label:

      $ ll --context /usr/bin/hyperkube && ll --context /usr/bin/kubelet
      -rwxr-xr-x. 2 root root system_u:object_r:unlabeled_t:s0 945 Jan  1  1970 /usr/bin/hyperkube
      -rwxr-xr-x. 2 root root system_u:object_r:unlabeled_t:s0 116665968 Jan  1  1970 /usr/bin/kubelet 
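To spot files in the same state, `ls -lZ` output like the one above (`ll --context` is `ls -l` with the SELinux context) can be filtered for the `unlabeled_t` type. This is a minimal sketch with a hypothetical helper `find_unlabeled`; it assumes the context is the first field containing at least three colons and that paths contain no spaces, because the path is taken from the last field.

```shell
#!/bin/bash
# Hypothetical helper (not from the report): reads `ls -lZ` / `ls -Z` style lines
# on stdin and prints the path of every entry whose SELinux type is unlabeled_t.
# Assumes paths contain no spaces (the path is taken from the last field).
find_unlabeled() {
  awk '{
    for (i = 1; i <= NF; i++) {
      # An SELinux context looks like user:role:type:level (three or more colons).
      if ($i ~ /^[^:]+:[^:]+:[^:]+:/) {
        split($i, ctx, ":")
        if (ctx[3] == "unlabeled_t") print $NF
        break
      }
    }
  }'
}
```

Usage on the two binaries from this report: `ls -lZ /usr/bin/hyperkube /usr/bin/kubelet | find_unlabeled`. Passing explicit paths keeps the last field a full path; listing a directory would print bare file names instead.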

       

       

      Expected results:

      The policies should be integrated successfully and kubelet should start automatically without intervention.

      Additional info:

      In this state, if the SELinux mode is changed from Enforcing to Permissive (setenforce 0), kubelet starts successfully and the node becomes Ready and Schedulable.

      After that, the context of the binary changes to

      system_u:object_r:kubelet_exec_t:s0

      and SELinux can be put back into Enforcing mode, allowing kubelet to keep running and to restart.
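The manual workaround above can be written down as a shell function. This is only a sketch of the steps in this report, to be run as root on the affected node; the function name, the explicit `systemctl restart`, the roughly 60-second wait, and the expectation that both binaries end up labeled `kubelet_exec_t` are assumptions, not part of the report. It deliberately leaves SELinux in Permissive mode if either check fails, so the admin can investigate before re-enabling enforcement.

```shell
#!/bin/bash
# Sketch of the manual workaround (hypothetical function name; run as root).
# Assumptions: kubelet is restarted explicitly, the ~60 s wait is arbitrary,
# and both binaries are expected to carry kubelet_exec_t afterwards.
workaround_kubelet_selinux() {
  local bin ctx
  setenforce 0                          # Enforcing -> Permissive
  systemctl restart kubelet.service
  for _ in $(seq 1 30); do              # wait up to ~60 s for the unit to come up
    systemctl is-active --quiet kubelet.service && break
    sleep 2
  done
  systemctl is-active --quiet kubelet.service || {
    echo "kubelet did not start; leaving SELinux in Permissive" >&2
    return 1
  }
  for bin in /usr/bin/hyperkube /usr/bin/kubelet; do
    ctx=$(stat -c %C "$bin")            # %C prints the file's SELinux context
    case "$ctx" in
      *:kubelet_exec_t:*) ;;            # label as reported after the workaround
      *) echo "$bin still has $ctx; leaving SELinux in Permissive" >&2; return 1 ;;
    esac
  done
  setenforce 1                          # back to Enforcing
}
```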

      Note:

      It happens when OpenShift Virtualization (CNV) 4.10 is installed on the cluster prior to the upgrade to OCP 4.11.
      CNV adds a custom SELinux policy:
      https://github.com/kubevirt/kubevirt/blob/099be903803556bfe3a85075a7b55f0a711d9ca7/pkg/virt-handler/selinux/labels.go


            People

              Colin Walters (walters@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Michael Nguyen
              Votes: 0
              Watchers: 12
