  1. OpenShift Bugs
  2. OCPBUGS-19537

OCB pools with yum based RHEL nodes are degraded


    • Moderate
    • No
    • MCO Sprint 251, MCO Sprint 252, MCO Sprint 255, MCO Sprint 256, MCO Sprint 257
    • 5
    • False


      * Previously, if you attempted to configure on-cluster {op-system-first} image layering on a non-{op-system} node, the node became degraded. With this fix, you receive an error message in the node logs in this situation, but the node is not degraded. (link:https://issues.redhat.com/browse/OCPBUGS-19537[*OCPBUGS-19537*])

      Description of problem:

      When we activate the on-cluster-build functionality in a pool with yum-based RHEL nodes, the pool becomes degraded and reports this error:
        - lastTransitionTime: "2023-09-20T15:14:44Z"
          message: 'Node ip-10-0-57-169.us-east-2.compute.internal is reporting: "error
            running rpm-ostree --version: exec: \"rpm-ostree\": executable file not found
            in $PATH"'
          reason: 1 nodes are reporting degraded status on sync
          status: "True"
          type: NodeDegraded
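
The message above says that the `rpm-ostree` executable is not in `$PATH` on the node. A minimal sketch for confirming this directly on a node is shown below; the helper name `has_command` is hypothetical and not part of the MCO, and the sketch only checks `$PATH`, nothing else about the host.

```shell
# Hypothetical helper: succeed if the named command can be found in $PATH.
has_command() {
    command -v "$1" > /dev/null 2>&1
}

# rpm-ostree is the binary the daemon tried to run in the error above.
if has_command rpm-ostree; then
    echo "rpm-ostree found in PATH"
else
    echo "rpm-ostree not found in PATH (expected on a yum-based RHEL node)"
fi
```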

      Version-Release number of selected component (if applicable):


      How reproducible:


      Steps to Reproduce:

      1. Create a cluster and add a yum-based RHEL node to the worker pool (we used RHEL 8).
      2. Create the resources needed to enable the OCB functionality: the pull and push secrets and the on-cluster-build-config ConfigMap.
      For example, to use the internal registry:
      cat << EOF | oc create -f -
      apiVersion: v1
      data:
        baseImagePullSecretName: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
        finalImagePushSecretName: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
        finalImagePullspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image"
        imageBuilderType: ""
      kind: ConfigMap
      metadata:
        name: on-cluster-build-config
        namespace: openshift-machine-config-operator
      EOF
      The exact configuration doesn't matter as long as the OCB functionality works.
      3. Label the worker pool so that the OCB functionality is enabled
      $ oc label mcp/worker machineconfiguration.openshift.io/layering-enabled=
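
After labeling the pool, its conditions can be polled to see whether it becomes degraded. The following is a minimal sketch, assuming `oc` is logged in to the cluster; the function name `pool_condition_status` is hypothetical, and the JSONPath filter simply selects one entry of `.status.conditions` by its `type`.

```shell
# Hypothetical helper: print the status ("True" or "False") of one
# condition type on a MachineConfigPool.
#   $1 = pool name (for example: worker)
#   $2 = condition type (for example: NodeDegraded)
pool_condition_status() {
    oc get mcp "$1" -o jsonpath="{.status.conditions[?(@.type==\"$2\")].status}"
}
```

For the pool in this report, `pool_condition_status worker NodeDegraded` would be the call to make; the condition shown in the description corresponds to a result of `True`.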

      Actual results:

      The RHEL node shows this log:
      I0920 15:14:42.852742    1979 daemon.go:760] Preflight config drift check successful (took 17.527225ms)
      I0920 15:14:42.852763    1979 daemon.go:2150] Performing layered OS update
      I0920 15:14:42.868723    1979 update.go:1970] Starting transition to "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/tc-67566@sha256:24ea4b12acf93095732ba457fc3e8c7f1287b669f2aceec65a33a41f7e8ceb01"
      I0920 15:14:42.871625    1979 update.go:1970] drain is already completed on this node
      I0920 15:14:42.874305    1979 rpm-ostree.go:307] Running captured: rpm-ostree --version
      E0920 15:14:42.874388    1979 writer.go:226] Marking Degraded due to: error running rpm-ostree --version: exec: "rpm-ostree": executable file not found in $PATH
      I0920 15:15:37.570503    1979 daemon.go:670] Transitioned from state: Working -> Degraded
      I0920 15:15:37.570529    1979 daemon.go:673] Transitioned from degraded/unreconcilable reason  -> error running rpm-ostree --version: exec: "rpm-ostree": executable file not found in $PATH
      I0920 15:15:37.574942    1979 daemon.go:2300] Not booted into a CoreOS variant, ignoring target OSImageURL quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e3128a8e42fb70ab6fc276f7005e3c0839795e4455823c8ff3eca9b1050798b9
      I0920 15:15:37.591529    1979 daemon.go:760] Preflight config drift check successful (took 16.588912ms)
      I0920 15:15:37.591549    1979 daemon.go:2150] Performing layered OS update
      I0920 15:15:37.591562    1979 update.go:1970] Starting transition to "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/tc-67566@sha256:24ea4b12acf93095732ba457fc3e8c7f1287b669f2aceec65a33a41f7e8ceb01"
      I0920 15:15:37.594534    1979 update.go:1970] drain is already completed on this node
      I0920 15:15:37.597261    1979 rpm-ostree.go:307] Running captured: rpm-ostree --version
      E0920 15:15:37.597315    1979 writer.go:226] Marking Degraded due to: error running rpm-ostree --version: exec: "rpm-ostree": executable file not found in $PATH
      I0920 15:16:37.613270    1979 daemon.go:2300] Not booted into a CoreOS variant, ignoring target OSImageURL quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e3128a8e42fb70ab6fc276f7005e3c0839795e4455823c8ff3eca9b1050798b9
      And the worker pool is degraded with this error:
        - lastTransitionTime: "2023-09-20T15:14:44Z"
          message: 'Node ip-10-0-57-169.us-east-2.compute.internal is reporting: "error
            running rpm-ostree --version: exec: \"rpm-ostree\": executable file not found
            in $PATH"'
          reason: 1 nodes are reporting degraded status on sync
          status: "True"
          type: NodeDegraded
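
In a longer daemon log, the repeated "Marking Degraded due to:" lines can be condensed into one line per distinct reason. This is a rough sketch that assumes the log is piped in on stdin and that the message prefix matches the lines shown above; the function name `summarize_degraded_reasons` is hypothetical.

```shell
# Hypothetical helper: read machine-config-daemon log lines on stdin and
# print each distinct degradation reason with its occurrence count.
summarize_degraded_reasons() {
    grep -F 'Marking Degraded due to: ' \
        | sed 's/^.*Marking Degraded due to: //' \
        | sort \
        | uniq -c
}
```

The log can be saved to a file first and then passed in with `summarize_degraded_reasons < mcd.log`, where `mcd.log` is a placeholder file name.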

      Expected results:

      The pool should not be degraded.

      Additional info:


            umohnani Urvashi Mohnani
            sregidor@redhat.com Sergio Regidor de la Rosa
            Prachiti Talgulkar Prachiti Talgulkar
