OpenShift Bugs: OCPBUGS-595

Kubelet cannot be started on worker nodes after upgrade to OCP 4.11 (RHCOS 8.6) when custom SELinux policies are applied


    • Moderate
    • None
    • Sprint 224 - Team Update&Remot
    • 1
    • False
      * Previously, custom SELinux policy modules were not properly supported by `rpm-ostree`, so they were not updated along with the rest of the system upon update. This would surface as failures in unrelated components. Pending SELinux userspace improvements landing in a future {product-title} release, this update provides a workaround to {op-system} that will rebuild and reload the SELinux policy during boot as needed. (link:https://issues.redhat.com/browse/OCPBUGS-595[*OCPBUGS-595*])
    • Bug Fix
    • Done

      Description of problem:

      On an OCP 4.10.26 cluster with a custom SELinux policy applied on the nodes (either by an operator or manually by the admin), upgrading to 4.11.1 breaks the kubelet. When the nodes perform the MCP update and pivot to RHCOS 8.6, the hyperkube and kubelet binaries cannot be started because of a permission error caused by a wrong SELinux context label on these files.

       

      Version-Release number of selected component (if applicable):

      OCP 4.10.26 --> 4.11.1

      RHCOS 410.84.202208030316-0 --> 411.86.202208112011-0

      How reproducible:

      Not sure, happened in 3 known cases so far.

      Steps to Reproduce:
      1. Start with OCP 4.10.26
      2. Apply custom SELinux policy on worker nodes
      3. Upgrade OCP to 4.11.1

      Actual results:

      After the MCD update, kubelet does not start on the nodes.

      journal log:

       

      -- Unit kubelet.service has begun starting up.
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[83741]: kubelet.service: Failed to execute command: Permission denied
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[83741]: kubelet.service: Failed at step EXEC spawning /usr/bin/hyperkube: Permission denied
      -- Subject: Process /usr/bin/hyperkube could not be executed
      -- Defined-By: systemd
      -- Support: https://access.redhat.com/support
      -- 
      -- The process /usr/bin/hyperkube could not be executed and failed.
      -- 
      -- The error number returned by this process is 13.
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Main process exited, code=exited, status=203/EXEC
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Failed with result 'exit-code'.
      -- Subject: Unit failed
      -- Defined-By: systemd
      -- Support: https://access.redhat.com/support
      -- 
      -- The unit kubelet.service has entered the 'failed' state with result 'exit-code'.
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: Failed to start Kubernetes Kubelet.
      -- Subject: Unit kubelet.service has failed
      -- Defined-By: systemd
      -- Support: https://access.redhat.com/support
      -- 
      -- Unit kubelet.service has failed.
      -- 
      -- The result is failed.
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Consumed 7ms CPU time
      -- Subject: Resources consumed by unit runtime
      -- Defined-By: systemd
      -- Support: https://access.redhat.com/support
      -- 
      -- The unit kubelet.service completed and consumed the indicated resources.
      Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 sudo[83722]: pam_unix(sudo:session): session closed for user root 
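
The failure signature in the journal excerpt above can be checked for mechanically. The following is a minimal sketch, not part of the original report: the function name `has_exec_permission_denied` is hypothetical, and the pattern is taken from the "Failed at step EXEC spawning ... Permission denied" line shown in the log.

```shell
# Hypothetical helper: succeeds if the journal text on stdin shows a unit's
# main process failing at the EXEC step with "Permission denied".
has_exec_permission_denied() {
    grep -Eq 'Failed at step EXEC spawning .*: Permission denied'
}

# Example usage on a node (requires access to the journal):
#   journalctl -u kubelet.service --no-pager | has_exec_permission_denied \
#       && echo "kubelet hit EXEC permission denied"
```

A match only indicates that the binary could not be executed; it does not by itself confirm that SELinux labeling is the cause.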

      The kubelet and hyperkube binaries carry the unlabeled_t context label:

      $ ll --context /usr/bin/hyperkube && ll --context /usr/bin/kubelet
      -rwxr-xr-x. 2 root root system_u:object_r:unlabeled_t:s0 945 Jan  1  1970 /usr/bin/hyperkube
      -rwxr-xr-x. 2 root root system_u:object_r:unlabeled_t:s0 116665968 Jan  1  1970 /usr/bin/kubelet 
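
The same check can be scripted so that it flags only the files whose type is unlabeled_t. This is a sketch under assumptions: the function names `selinux_type` and `report_unlabeled` are hypothetical, and it relies on GNU `stat`, whose `%C` format prints a file's SELinux security context.

```shell
# Print the type field (third colon-separated field) of an SELinux context
# such as system_u:object_r:unlabeled_t:s0.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Hypothetical check: report each given file whose context type is unlabeled_t.
# Files whose context cannot be read are skipped silently.
report_unlabeled() {
    for f in "$@"; do
        ctx=$(stat -c '%C' "$f" 2>/dev/null) || continue
        if [ "$(selinux_type "$ctx")" = "unlabeled_t" ]; then
            printf 'unlabeled: %s\n' "$f"
        fi
    done
}

# Example usage on an affected node:
#   report_unlabeled /usr/bin/hyperkube /usr/bin/kubelet
```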

       

       

      Expected results:

      The policies should be integrated successfully and kubelet should start automatically without intervention.

      Additional info:

      In this state, switching SELinux from Enforcing to Permissive mode (setenforce 0) lets the kubelet start successfully, and the node becomes Ready and Schedulable.

      After that, the context of the binary changes to

      system_u:object_r:kubelet_exec_t:s0

      and SELinux can be put back in Enforcing mode, allowing kubelet to keep running and to restart.
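
The manual recovery described above could be scripted roughly as follows. This is a sketch only, not a supported procedure. The helper names `run` and `recover_kubelet` and the `DRY_RUN` switch are hypothetical. The explicit `systemctl restart` is an assumption, since the report only says that kubelet starts once SELinux is Permissive. With `DRY_RUN=1` (the default here) the commands are printed rather than executed.

```shell
# DRY_RUN=1 (the default in this sketch) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Sketch of the recovery steps; intended to be run as root on the affected node.
recover_kubelet() {
    run setenforce 0                       # switch SELinux to Permissive
    run systemctl restart kubelet.service  # assumption: restart explicitly
    run stat -c '%C %n' /usr/bin/kubelet   # show the label for inspection
    run setenforce 1                       # return to Enforcing
}

# Example usage:
#   recover_kubelet              # print the planned commands
#   DRY_RUN=0 recover_kubelet    # actually run them
```

The sketch does not verify the label before re-enabling Enforcing mode. It only prints it, so it is worth confirming that the output shows kubelet_exec_t rather than unlabeled_t.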

      Note:

      This happens when OpenShift Virtualization (CNV) 4.10 is installed on the cluster prior to the upgrade to OCP 4.11.
      CNV adds a custom SELinux policy:
      https://github.com/kubevirt/kubevirt/blob/099be903803556bfe3a85075a7b55f0a711d9ca7/pkg/virt-handler/selinux/labels.go

            walters@redhat.com Colin Walters
            ocohen@redhat.com Oren Cohen
            Michael Nguyen Michael Nguyen
            Jesse Dohmann Jesse Dohmann