[OCPBUGS-19537] OCB pools with yum based RHEL nodes are degraded - Red Hat Issue Tracker

Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.14.0
Component/s: Machine Config Operator
Labels:

Severity:
Moderate
Regression:
No
Epic Link:
On Cluster Layering Tech Preview
Sprint:
MCO Sprint 251, MCO Sprint 252, MCO Sprint 255, MCO Sprint 256, MCO Sprint 257
sprint_count:
5
Blocked:
False
Blocked Reason:

None
Release Note Text:

* Previously, if you attempted to configure on-cluster {op-system-first} image layering on a non-{op-system} node, the node would become degraded. With this fix, an error message is written to the node logs in this situation, but the node is not degraded. (link:https://issues.redhat.com/browse/OCPBUGS-19537[*OCPBUGS-19537*])
Target Version:

4.17.0
Target Backport Versions:

4.14.z, 4.15.z, 4.16.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

When we activate the on-cluster-build functionality in a pool with yum-based RHEL nodes, the pool becomes degraded and reports this error:

  - lastTransitionTime: "2023-09-20T15:14:44Z"
    message: 'Node ip-10-0-57-169.us-east-2.compute.internal is reporting: "error
      running rpm-ostree --version: exec: \"rpm-ostree\": executable file not found
      in $PATH"'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-15-233408

How reproducible:

Always

Steps to Reproduce:

1. Create a cluster and add a yum-based RHEL node to the worker pool (we used RHEL 8).

2. Create the resources needed to enable the OCB functionality: the pull and push secrets and the on-cluster-build-config ConfigMap.

For example, to use the internal registry:

cat << EOF | oc create -f -
apiVersion: v1
data:
  baseImagePullSecretName: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
  finalImagePushSecretName: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
  finalImagePullspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image"
  imageBuilderType: ""
kind: ConfigMap
metadata:
  name: on-cluster-build-config
  namespace: openshift-machine-config-operator
EOF

The exact configuration doesn't matter as long as the OCB functionality works.

3. Label the worker pool so that the OCB functionality is enabled

$ oc label mcp/worker machineconfiguration.openshift.io/layering-enabled=
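
To confirm that the labeled pool really contains a non-CoreOS node, the OS image reported by each worker can be inspected. The following is a minimal sketch, not part of the original report. The helper names `classify_os_image` and `list_worker_os` are hypothetical, and the sketch assumes that RHCOS nodes report an `osImage` string containing "CoreOS" while yum-based RHEL nodes do not.

```shell
#!/usr/bin/env bash

# Classify a node's osImage string (taken from .status.nodeInfo.osImage).
# Assumption: RHCOS nodes report an osImage that contains "CoreOS".
classify_os_image() {
  case "$1" in
    *CoreOS*) echo "coreos" ;;
    *)        echo "non-coreos" ;;
  esac
}

# Print "<node> <classification> <osImage>" for every worker node.
# Requires a logged-in `oc` session with permission to list nodes.
list_worker_os() {
  oc get nodes -l node-role.kubernetes.io/worker \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.osImage}{"\n"}{end}' |
  while IFS=$'\t' read -r name image; do
    printf '%s\t%s\t%s\n' "$name" "$(classify_os_image "$image")" "$image"
  done
}
```

After sourcing the script, running `list_worker_os` against the cluster should list the RHEL node as `non-coreos` if the assumption about the `osImage` string holds.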

Actual results:

The RHEL node shows this log:


I0920 15:14:42.852742    1979 daemon.go:760] Preflight config drift check successful (took 17.527225ms)
I0920 15:14:42.852763    1979 daemon.go:2150] Performing layered OS update
I0920 15:14:42.868723    1979 update.go:1970] Starting transition to "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/tc-67566@sha256:24ea4b12acf93095732ba457fc3e8c7f1287b669f2aceec65a33a41f7e8ceb01"
I0920 15:14:42.871625    1979 update.go:1970] drain is already completed on this node
I0920 15:14:42.874305    1979 rpm-ostree.go:307] Running captured: rpm-ostree --version
E0920 15:14:42.874388    1979 writer.go:226] Marking Degraded due to: error running rpm-ostree --version: exec: "rpm-ostree": executable file not found in $PATH
I0920 15:15:37.570503    1979 daemon.go:670] Transitioned from state: Working -> Degraded
I0920 15:15:37.570529    1979 daemon.go:673] Transitioned from degraded/unreconcilable reason  -> error running rpm-ostree --version: exec: "rpm-ostree": executable file not found in $PATH
I0920 15:15:37.574942    1979 daemon.go:2300] Not booted into a CoreOS variant, ignoring target OSImageURL quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e3128a8e42fb70ab6fc276f7005e3c0839795e4455823c8ff3eca9b1050798b9
I0920 15:15:37.591529    1979 daemon.go:760] Preflight config drift check successful (took 16.588912ms)
I0920 15:15:37.591549    1979 daemon.go:2150] Performing layered OS update
I0920 15:15:37.591562    1979 update.go:1970] Starting transition to "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/tc-67566@sha256:24ea4b12acf93095732ba457fc3e8c7f1287b669f2aceec65a33a41f7e8ceb01"
I0920 15:15:37.594534    1979 update.go:1970] drain is already completed on this node
I0920 15:15:37.597261    1979 rpm-ostree.go:307] Running captured: rpm-ostree --version
E0920 15:15:37.597315    1979 writer.go:226] Marking Degraded due to: error running rpm-ostree --version: exec: "rpm-ostree": executable file not found in $PATH
I0920 15:16:37.613270    1979 daemon.go:2300] Not booted into a CoreOS variant, ignoring target OSImageURL quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e3128a8e42fb70ab6fc276f7005e3c0839795e4455823c8ff3eca9b1050798b9



And the worker pool is degraded with this error:

  - lastTransitionTime: "2023-09-20T15:14:44Z"
    message: 'Node ip-10-0-57-169.us-east-2.compute.internal is reporting: "error
      running rpm-ostree --version: exec: \"rpm-ostree\": executable file not found
      in $PATH"'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded
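
The error above means the daemon tried to execute `rpm-ostree` on a host that has no such binary. The same condition can be checked by hand from a shell on the node. This is only an illustrative sketch with a hypothetical helper name (`has_cmd`); it is not how the Machine Config Operator performs the check.

```shell
#!/usr/bin/env bash

# Return success if the named command can be found in $PATH.
has_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report whether this host could run the command that failed in the log.
report_rpm_ostree() {
  if has_cmd rpm-ostree; then
    echo "rpm-ostree found: $(command -v rpm-ostree)"
  else
    echo "rpm-ostree not found in \$PATH"
  fi
}
```

On a yum-based RHEL node, `report_rpm_ostree` would be expected to print the "not found" message, which matches the failure recorded in the daemon log.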

Expected results:

The pool should not be degraded.

Additional info:

links to

openshift/machine-config-operator#4442: OCPBUGS-19537: OCB should fail if node is not coreos based

RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update

Assignee: Urvashi Mohnani

Reporter: Sergio Regidor de la Rosa

QA Contact: Prachiti Talgulkar

Votes: 0

Watchers: 5

Created: 2023/09/21 9:35 AM

Updated: 2024/10/01 5:39 PM

Resolved: 2024/10/01 5:39 PM
