Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.11
Component/s: RHCOS
Labels:

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Moderate
Regression:
None

Target Backport Versions:
None
Target Version:

4.12.0
Release Blocker:
None
Sprint:
Sprint 224 - Team Update&Remot
sprint_count:
1

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Priority Data:
PX Impact Score:

Release Note Status:
Done
Release Note Type:
Bug Fix
Release Note Text:

* Previously, custom SELinux policy modules were not properly supported by `rpm-ostree`, so they were not updated along with the rest of the system upon update. This would surface as failures in unrelated components. Pending SELinux userspace improvements landing in a future {product-title} release, this update provides a workaround to {op-system} that will rebuild and reload the SELinux policy during boot as needed. (link:https://issues.redhat.com/browse/OCPBUGS-595[*~~OCPBUGS-595~~*])


Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

On an OCP cluster v4.10.26, if a custom SELinux policy is applied on nodes (either by an operator or manually by the admin), then when the cluster is upgraded to 4.11.1 and the nodes perform the MCP update that pivots to RHCOS 8.6, the hyperkube and kubelet binaries cannot be started. The failure is a permission error caused by a wrong SELinux context label on these files.

Version-Release number of selected component (if applicable):

OCP 4.10.26 --> 4.11.1

RHCOS 410.84.202208030316-0 --> 411.86.202208112011-0
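
To compare the two RHCOS builds above at a glance, the version strings can be split into their parts. The following is a minimal sketch, not an official parser: the field meanings (OCP stream, RHEL release, build timestamp, build counter) are an assumption inferred from how this ticket pairs `410.84...` with OCP 4.10 and `411.86...` with OCP 4.11 / RHCOS 8.6, and the function name `parse_rhcos_version` is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: <ocp major><ocp minor>.<rhel major><rhel minor>.<YYYYMMDDHHMM>-<build>
RHCOS_VERSION = re.compile(
    r"^(?P<ocp_major>\d)(?P<ocp_minor>\d+)\."
    r"(?P<rhel_major>\d)(?P<rhel_minor>\d+)\."
    r"(?P<stamp>\d{12})-(?P<build>\d+)$"
)

def parse_rhcos_version(version):
    """Split an RHCOS version string into its (assumed) components."""
    match = RHCOS_VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized RHCOS version: {version!r}")
    return {
        "ocp": f"{match['ocp_major']}.{match['ocp_minor']}",
        "rhel": f"{match['rhel_major']}.{match['rhel_minor']}",
        "built": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "build": int(match["build"]),
    }

if __name__ == "__main__":
    for version in ("410.84.202208030316-0", "411.86.202208112011-0"):
        print(version, "->", parse_rhcos_version(version))
```

Under that assumption, the upgrade in this report moves the nodes from an 8.4-based image to an 8.6-based one, which matches the "pivoting to RHCOS 8.6" wording in the description.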

How reproducible:

Not sure; it has happened in 3 known cases so far.

Steps to Reproduce:
1. Start with OCP 4.10.26
2. Apply custom SELinux policy on worker nodes
3. Upgrade OCP to 4.11.1

Actual results:

After the MCD update, kubelet on the nodes does not start.

journal log:

-- Unit kubelet.service has begun starting up.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[83741]: kubelet.service: Failed to execute command: Permission denied
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[83741]: kubelet.service: Failed at step EXEC spawning /usr/bin/hyperkube: Permission denied
-- Subject: Process /usr/bin/hyperkube could not be executed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- The process /usr/bin/hyperkube could not be executed and failed.
-- 
-- The error number returned by this process is 13.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Main process exited, code=exited, status=203/EXEC
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- The unit kubelet.service has entered the 'failed' state with result 'exit-code'.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: Failed to start Kubernetes Kubelet.
-- Subject: Unit kubelet.service has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit kubelet.service has failed.
-- 
-- The result is failed.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: kubelet.service: Consumed 7ms CPU time
-- Subject: Resources consumed by unit runtime
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- The unit kubelet.service completed and consumed the indicated resources.
Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 sudo[83722]: pam_unix(sudo:session): session closed for user root
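
When triaging several nodes, it can help to pull just the failing unit and binary out of a journal dump like the one above. This is a minimal sketch that matches the exact systemd message text seen in this log; the function name `find_exec_denials` is hypothetical, and other systemd versions may word the message differently.

```python
import re

# Matches the systemd message seen in the journal excerpt of this ticket.
EXEC_DENIED = re.compile(
    r"(?P<unit>\S+\.service): Failed at step EXEC spawning "
    r"(?P<path>\S+): Permission denied"
)

def find_exec_denials(journal_lines):
    """Return (unit, path) pairs for every EXEC permission failure found."""
    hits = []
    for line in journal_lines:
        match = EXEC_DENIED.search(line)
        if match:
            hits.append((match.group("unit"), match.group("path")))
    return hits

if __name__ == "__main__":
    sample = [
        "Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[83741]: "
        "kubelet.service: Failed to execute command: Permission denied",
        "Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[83741]: "
        "kubelet.service: Failed at step EXEC spawning /usr/bin/hyperkube: "
        "Permission denied",
        "Aug 25 08:37:21 cnv-x4lcc-worker-0-w57k9 systemd[1]: "
        "kubelet.service: Main process exited, code=exited, status=203/EXEC",
    ]
    print(find_exec_denials(sample))
```

A hit only shows that the exec call was denied. Whether SELinux labeling is the cause still has to be confirmed by checking the file context, as done in the next step of this report.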

The kubelet and hyperkube binaries carry the unlabeled_t context label:

$ ll --context /usr/bin/hyperkube && ll --context /usr/bin/kubelet
-rwxr-xr-x. 2 root root system_u:object_r:unlabeled_t:s0 945 Jan  1  1970 /usr/bin/hyperkube
-rwxr-xr-x. 2 root root system_u:object_r:unlabeled_t:s0 116665968 Jan  1  1970 /usr/bin/kubelet
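
The same check can be scripted over captured listing output to flag every file that ended up with `unlabeled_t`. This is a minimal sketch that parses text in the format shown above; the function names are hypothetical, and it assumes paths contain no spaces and that the context is the only whitespace-separated field with at least three colons.

```python
def parse_context_line(line):
    """Split one long-listing line with a context column into (path, type).

    Returns (path, None) if no SELinux context field is found.
    """
    fields = line.split()
    # A context looks like user:role:type:level, so it has >= 3 colons.
    context = next((f for f in fields if f.count(":") >= 3), None)
    path = fields[-1]
    if context is None:
        return path, None
    return path, context.split(":")[2]

def unlabeled_files(lines):
    """Return the paths whose SELinux type is unlabeled_t."""
    result = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in captured output
        path, selinux_type = parse_context_line(line)
        if selinux_type == "unlabeled_t":
            result.append(path)
    return result

if __name__ == "__main__":
    listing = [
        "-rwxr-xr-x. 2 root root system_u:object_r:unlabeled_t:s0 945 "
        "Jan  1  1970 /usr/bin/hyperkube",
        "-rwxr-xr-x. 2 root root system_u:object_r:unlabeled_t:s0 116665968 "
        "Jan  1  1970 /usr/bin/kubelet",
    ]
    print(unlabeled_files(listing))
```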

Expected results:

The policies should be integrated successfully and kubelet should start automatically without intervention.

Additional info:

In this state, if the SELinux mode is changed from Enforcing to Permissive (setenforce 0), the kubelet starts successfully and the node becomes Ready and Schedulable.

After that, the context of the binary changes to

system_u:object_r:kubelet_exec_t:s0

and SELinux can be put back in Enforcing mode, allowing kubelet to keep running and to restart.

Note:

It happens when OpenShift Virtualization (CNV) 4.10 is installed on the cluster prior to the upgrade to OCP 4.11.
CNV adds a custom SELinux policy:
https://github.com/kubevirt/kubevirt/blob/099be903803556bfe3a85075a7b55f0a711d9ca7/pkg/virt-handler/selinux/labels.go

blocks

OCPBUGS-727 [4.11] Kubelet cannot be started on worker nodes after upgrade to OCP 4.11 (RHCOS 8.6) when custom SELinux policies are applied

Closed

is cloned by

OCPBUGS-727 [4.11] Kubelet cannot be started on worker nodes after upgrade to OCP 4.11 (RHCOS 8.6) when custom SELinux policies are applied

Closed

links to

openshift/os#962: OCPBUGS-595: overlay: Add `rhcos-selinux-policy-upgrade.service`

Assignee: Colin Walters

Reporter: Oren Cohen

Need Info From: None

Contributors: None

QA Contact: Michael Nguyen

Doc Contact: Jesse Dohmann

Votes: 0

Watchers: 20

Created: 2022/08/25 4:33 PM

Updated: 2025/12/26 2:36 PM

Resolved: 2023/01/17 7:39 PM
