Bug
Resolution: Done
Critical
None
4.11
Description of problem:
On an OCP cluster v4.10.26 with a custom SELinux policy applied on the nodes (either by an operator or manually by the admin), when the cluster is upgraded to 4.11.1 and the nodes perform the MCP update pivoting to RHCOS 8.6, the hyperkube and kubelet binaries cannot be started due to a permission error caused by a wrong SELinux context label on these files.
Version-Release number of selected component (if applicable):
OCP 4.10.26 --> 4.11.1
RHCOS 410.84.202208030316-0 --> 411.86.202208112011-0
How reproducible:
Not consistently reproducible; observed in 3 known cases so far.
Steps to Reproduce:
1. Start with OCP 4.10.26
2. Apply custom SELinux policy on worker nodes
3. Upgrade OCP to 4.11.1
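For step 2, a minimal sketch of applying a custom SELinux policy module on a node (illustrative only; the module name `example_custom` is hypothetical, and CNV installs its own module -- see the kubevirt link under Additional info):

```shell
# Build and install a trivial custom SELinux policy module.
# Requires root and the selinux-policy development tools on the node.
cat > example_custom.te <<'EOF'
module example_custom 1.0;
require {
    type unconfined_t;
}
EOF
checkmodule -M -m -o example_custom.mod example_custom.te
semodule_package -o example_custom.pp -m example_custom.mod
semodule -i example_custom.pp

# Confirm the module is loaded
semodule -l | grep example_custom
```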
Actual results:
After the MCD update, kubelet does not start on the affected nodes.
journal log:
-- Unit kubelet.service has begun starting up.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[83741]: kubelet.service: Failed to execute command: Permission denied
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[83741]: kubelet.service: Failed at step EXEC spawning /usr/bin/hyperkube: Permission denied
-- Subject: Process /usr/bin/hyperkube could not be executed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The process /usr/bin/hyperkube could not be executed and failed.
--
-- The error number returned by this process is 13.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Main process exited, code=exited, status=203/EXEC
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit kubelet.service has entered the 'failed' state with result 'exit-code'.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: Failed to start Kubernetes Kubelet.
-- Subject: Unit kubelet.service has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit kubelet.service has failed.
--
-- The result is failed.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Consumed 7ms CPU time
-- Subject: Resources consumed by unit runtime
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit kubelet.service completed and consumed the indicated resources.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 sudo[83722]: pam_unix(sudo:session): session closed for user root
The kubelet and hyperkube binaries carry the unlabeled_t context label:
$ ll --context /usr/bin/hyperkube && ll --context /usr/bin/kubelet
-rwxr-xr-x. 2 root root system_u:object_r:unlabeled_t:s0       945 Jan  1  1970 /usr/bin/hyperkube
-rwxr-xr-x. 2 root root system_u:object_r:unlabeled_t:s0 116665968 Jan  1  1970 /usr/bin/kubelet
Expected results:
The policies should be integrated successfully and kubelet should start automatically without intervention.
Additional info:
In this state, switching SELinux from Enforcing to Permissive mode (setenforce 0) lets kubelet start successfully, and the node becomes Ready and Schedulable.
After that, the context of the binary changes to
system_u:object_r:kubelet_exec_t:s0
and SELinux can be put back into Enforcing mode, allowing kubelet to keep running and to restart.
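The workaround above can be sketched as a short command sequence (a sketch, assuming root access on the affected node; the report observed the label correcting itself once kubelet ran in Permissive mode, while restorecon shown here is the standard way to reapply the label explicitly):

```shell
# Temporarily switch SELinux to Permissive so systemd can exec kubelet
setenforce 0

# Optionally reapply the default file context instead of waiting for it
# to be corrected; then verify the label is kubelet_exec_t
restorecon -v /usr/bin/hyperkube /usr/bin/kubelet
ls -Z /usr/bin/hyperkube /usr/bin/kubelet

# Once the label is correct and kubelet is running, re-enable enforcement
setenforce 1
```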
Note:
It happens when OpenShift Virtualization (CNV) 4.10 is installed on the cluster prior to the upgrade to OCP 4.11.
CNV adds a custom SELinux policy:
https://github.com/kubevirt/kubevirt/blob/099be903803556bfe3a85075a7b55f0a711d9ca7/pkg/virt-handler/selinux/labels.go
blocks:
OCPBUGS-727 [4.11] Kubelet cannot be started on worker nodes after upgrade to OCP 4.11 (RHCOS 8.6) when custom SELinux policies are applied (Closed)
is cloned by:
OCPBUGS-727 [4.11] Kubelet cannot be started on worker nodes after upgrade to OCP 4.11 (RHCOS 8.6) when custom SELinux policies are applied (Closed)
links to: