[OCPBUGS-727] [4.11] Kubelet cannot be started on worker nodes after upgrade to OCP 4.11 (RHCOS 8.6) when custom SELinux policies are applied - Red Hat Issue Tracker

Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.11
Component/s: RHCOS
Labels:
- coreos
- updateremotingteam

Severity: Moderate
Regression: None
Blocked: False
Blocked Reason: None
Target Version: 4.11.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

This is a clone of issue ~~OCPBUGS-595~~. The following is the description of the original issue:
—
Description of problem:

On an OCP cluster running v4.10.26, if a custom SELinux policy has been applied to the nodes (either by an operator or manually by the admin), then when the cluster is upgraded to 4.11.1 and the nodes pivot to RHCOS 8.6 during the MCP update, the hyperkube and kubelet binaries cannot be started. The failure is a permission error caused by a wrong SELinux context label on these files.

Version-Release number of selected component (if applicable):

OCP 4.10.26 --> 4.11.1

RHCOS 410.84.202208030316-0 --> 411.86.202208112011-0

How reproducible:

Not sure; it has happened in 3 known cases so far.

Steps to Reproduce:
1. Start with OCP 4.10.26
2. Apply custom SELinux policy on worker nodes
3. Upgrade OCP to 4.11.1

Actual results:

After the MCD update, kubelet does not start on the nodes.

journal log:

-- Unit kubelet.service has begun starting up.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[83741]: kubelet.service: Failed to execute command: Permission denied
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[83741]: kubelet.service: Failed at step EXEC spawning /usr/bin/hyperkube: Permission denied
-- Subject: Process /usr/bin/hyperkube could not be executed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- The process /usr/bin/hyperkube could not be executed and failed.
-- 
-- The error number returned by this process is 13.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Main process exited, code=exited, status=203/EXEC
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- The unit kubelet.service has entered the 'failed' state with result 'exit-code'.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: Failed to start Kubernetes Kubelet.
-- Subject: Unit kubelet.service has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit kubelet.service has failed.
-- 
-- The result is failed.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Consumed 7ms CPU time
-- Subject: Resources consumed by unit runtime
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- The unit kubelet.service completed and consumed the indicated resources.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 sudo[83722]: pam_unix(sudo:session): session closed for user root
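
As a rough illustration, the failure signature in the journal excerpt above (systemd exit status 203/EXEC together with "Permission denied" at step EXEC) can be detected with a small filter. This is only a sketch: the function name `kubelet_exec_denied` is hypothetical, and it assumes the journal text arrives on standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the journal text on stdin
# contains both markers seen in this report:
#   - kubelet.service exiting with status=203/EXEC
#   - "Permission denied" while systemd spawns the kubelet binary
kubelet_exec_denied() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'kubelet\.service: .*status=203/EXEC' &&
    printf '%s\n' "$log" | grep -q 'kubelet\.service: Failed at step EXEC .*Permission denied'
}
```

On a node, the input could come from `journalctl -u kubelet.service --no-pager | kubelet_exec_denied && echo "kubelet blocked at exec"`. A match only shows that exec was denied; it does not by itself prove that SELinux labeling is the cause.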

The kubelet and hyperkube binaries have the unlabeled_t context label:

$ ll --context /usr/bin/hyperkube && ll --context /usr/bin/kubelet
-rwxr-xr-x. 2 root root system_u:object_r:unlabeled_t:s0 945 Jan  1  1970 /usr/bin/hyperkube
-rwxr-xr-x. 2 root root system_u:object_r:unlabeled_t:s0 116665968 Jan  1  1970 /usr/bin/kubelet
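
To make the label check above scriptable, the type field can be extracted from a context string and compared against unlabeled_t. The sketch below is illustrative only; the function names `selinux_type` and `is_unlabeled` are hypothetical and not part of any tool mentioned in this report.

```shell
#!/bin/sh
# Hypothetical helper: print the type field (third colon-separated field)
# of an SELinux context such as system_u:object_r:unlabeled_t:s0.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Hypothetical helper: succeed when the given context has type unlabeled_t.
is_unlabeled() {
    [ "$(selinux_type "$1")" = "unlabeled_t" ]
}
```

On a node, the context string could be read with GNU `stat`, for example `is_unlabeled "$(stat -c %C /usr/bin/kubelet)" && echo "kubelet is unlabeled"`.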

Expected results:

The policies should be integrated successfully and kubelet should start automatically without intervention.

Additional info:

In this state, if the SELinux mode is changed from Enforcing to Permissive (setenforce 0), the kubelet starts successfully and the node becomes Ready and Schedulable.

After that, the context of the binary changes to

system_u:object_r:kubelet_exec_t:s0

and SELinux can be put back in Enforcing mode, allowing kubelet to keep running and to restart.
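
The manual workaround described above could be written down as a two-step checklist, sketched below. Several things here are assumptions rather than facts from this report: the function names and the `DRY_RUN` switch are hypothetical, and the explicit `systemctl restart kubelet.service` is added for illustration, since the report only says that kubelet starts once SELinux is Permissive. By default the sketch only prints the commands.

```shell
#!/bin/sh
# Print commands by default (DRY_RUN=1); set DRY_RUN=0 on an affected
# node, as root, to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: switch to Permissive so kubelet can be executed.
kubelet_workaround_begin() {
    run setenforce 0
    run systemctl restart kubelet.service  # assumption: explicit restart
}

# Step 2: return to Enforcing. Run this only after confirming that the
# binary's context now shows kubelet_exec_t.
kubelet_workaround_end() {
    run setenforce 1
}
```

Between the two steps, the label can be checked with `stat -c %C /usr/bin/kubelet`. The split into two functions is deliberate: the report does not say how quickly the context changes, so re-enabling Enforcing mode is left as a separate, manual decision.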

Note:

It happens when OpenShift Virtualization (CNV) 4.10 is installed on the cluster prior to the upgrade to OCP 4.11.
CNV adds a custom SELinux policy:
https://github.com/kubevirt/kubevirt/blob/099be903803556bfe3a85075a7b55f0a711d9ca7/pkg/virt-handler/selinux/labels.go

blocks

OCPBUGS-856 [4.10] Kubelet cannot be started on worker nodes after upgrade to OCP 4.11 (RHCOS 8.6) when custom SELinux policies are applied

Closed

clones

OCPBUGS-595 Kubelet cannot be started on worker nodes after upgrade to OCP 4.11 (RHCOS 8.6) when custom SELinux policies are applied

Closed

is blocked by

OCPBUGS-595 Kubelet cannot be started on worker nodes after upgrade to OCP 4.11 (RHCOS 8.6) when custom SELinux policies are applied

Closed

is cloned by

OCPBUGS-856 [4.10] Kubelet cannot be started on worker nodes after upgrade to OCP 4.11 (RHCOS 8.6) when custom SELinux policies are applied

Closed

OCPBUGS-857 [4.10] Kubelet cannot be started on worker nodes after upgrade to OCP 4.11 (RHCOS 8.6) when custom SELinux policies are applied

Closed

links to

openshift/os#966: [release-4.11] OCPBUGS-727: overlay: Add `rhcos-selinux-policy-upgrade.service`

Assignee: Colin Walters

Reporter: OpenShift Prow Bot

QA Contact: Michael Nguyen

Votes: 0

Watchers: 12

Created: 2022/08/30 9:05 PM

Updated: 2022/09/12 9:28 AM

Resolved: 2022/09/07 8:49 PM