OpenShift Bugs / OCPBUGS-42081

In OCB, "enforcing=0" kernel argument is degrading the MachineConfigPool


    • Moderate
    • None
    • MCO Sprint 259
    • 1
    • False
    • None
    • Previously, if you enabled on-cluster layering (OCL) for your cluster and you attempted to configure kernel arguments in the machine configuration, machine config pools (MCPs) and nodes entered a degraded state. This happened because of a configuration mismatch. With this release, a check for kernel arguments on a cluster with OCL enabled ensures that the arguments are configured and applied to nodes in the cluster. This update prevents the mismatch that previously occurred between the machine configuration and the node configuration. (link:https://issues.redhat.com/browse/OCPBUGS-42081[*OCPBUGS-42081*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-34647. The following is the description of the original issue:

      Description of problem:

      When we enable OCB functionality and create a MachineConfig (MC) that configures the enforcing=0 kernel argument, the MCP becomes degraded and reports this message:
      
                    {
                        "lastTransitionTime": "2024-05-30T09:37:06Z",
                        "message": "Node ip-10-0-29-166.us-east-2.compute.internal is reporting: \"unexpected on-disk state validating against quay.io/mcoqe/layering@sha256:654149c7e25a1ada80acb8eedc3ecf9966a8d29e9738b39fcbedad44ddd15ed5: missing expected kernel arguments: [enforcing=0]\"",
                        "reason": "1 nodes are reporting degraded status on sync",
                        "status": "True",
                        "type": "NodeDegraded"
                    },
      
      
          

      Version-Release number of selected component (if applicable):

      IPI on AWS
      
      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.16.0-0.nightly-2024-05-30-021120   True        False         97m     Error while reconciling 4.16.0-0.nightly-2024-05-30-021120: the cluster operator olm is not available
      
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Enable the TechPreviewNoUpgrade feature set
      $ oc patch featuregate cluster --type=merge -p '{"spec":{"featureSet": "TechPreviewNoUpgrade"}}'
      
          2. Configure a MachineOSConfig (MOSC) resource to enable OCB functionality in the worker pool
      
      When we hit this problem, we were using the mcoqe Quay repository.
      A copy of the pull secret was used for both baseImagePullSecret and renderedImagePushSecret, and no currentImagePullSecret was configured.
      
      apiVersion: machineconfiguration.openshift.io/v1alpha1
      kind: MachineOSConfig
      metadata:
        name: worker
      spec:
        machineConfigPool:
          name: worker
      #  buildOutputs:
      #    currentImagePullSecret:
      #      name: ""
        buildInputs:
          imageBuilder:
            imageBuilderType: PodImageBuilder
          baseImagePullSecret:
            name: pull-copy 
          renderedImagePushSecret:
            name: pull-copy 
          renderedImagePushspec: "quay.io/mcoqe/layering:latest"
      
          3. Create an MC that sets the enforcing=0 kernel argument
      
      {
          "kind": "List",
          "apiVersion": "v1",
          "metadata": {},
          "items": [
              {
                  "apiVersion": "machineconfiguration.openshift.io/v1",
                  "kind": "MachineConfig",
                  "metadata": {
                      "labels": {
                          "machineconfiguration.openshift.io/role": "worker"
                      },
                      "name": "change-worker-kernel-selinux-gvr393x2"
                  },
                  "spec": {
                      "config": {
                          "ignition": {
                              "version": "3.2.0"
                          }
                      },
                      "kernelArguments": [
                          "enforcing=0"
                      ]
                  }
              }
          ]
      }
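
The manifest in step 3 could also be generated for other kernel arguments. The following is a minimal Python sketch and was not part of the original reproduction. The helper name `kernel_args_machine_config` and the object name used in the example are hypothetical; the field layout is copied from the manifest above.

```python
import json


def kernel_args_machine_config(name, role, kernel_args):
    """Build a MachineConfig manifest (as a dict) that sets kernel arguments.

    The structure mirrors the manifest shown in this report; the function
    itself is a hypothetical helper used only for illustration.
    """
    if not kernel_args:
        raise ValueError("at least one kernel argument is required")
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            # This label ties the MachineConfig to a pool role (here: worker).
            "labels": {"machineconfiguration.openshift.io/role": role},
            "name": name,
        },
        "spec": {
            "config": {"ignition": {"version": "3.2.0"}},
            "kernelArguments": list(kernel_args),
        },
    }


if __name__ == "__main__":
    # Hypothetical object name; the report used a generated suffix.
    mc = kernel_args_machine_config(
        "change-worker-kernel-selinux", "worker", ["enforcing=0"]
    )
    print(json.dumps(mc, indent=4))
```

The printed JSON can be saved to a file and created in the cluster with `oc create -f <file>`.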
      
          

      Actual results:

      The worker MCP is degraded reporting this message:
      
      $ oc get mcp worker -oyaml
      ....
      
                    {
                        "lastTransitionTime": "2024-05-30T09:37:06Z",
                        "message": "Node ip-10-0-29-166.us-east-2.compute.internal is reporting: \"unexpected on-disk state validating against quay.io/mcoqe/layering@sha256:654149c7e25a1ada80acb8eedc3ecf9966a8d29e9738b39fcbedad44ddd15ed5: missing expected kernel arguments: [enforcing=0]\"",
                        "reason": "1 nodes are reporting degraded status on sync",
                        "status": "True",
                        "type": "NodeDegraded"
                    },
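
When triaging this kind of failure, it can help to pull the missing kernel arguments out of the pool conditions programmatically. The following Python sketch is an illustration only: the function name `missing_kernel_args` is hypothetical, and it assumes that multiple arguments would appear space-separated inside the square brackets, which this report does not confirm because it shows only a single argument.

```python
import re

# Matches the "missing expected kernel arguments: [...]" fragment seen in the
# condition message in this report.
_MISSING_KARGS = re.compile(r"missing expected kernel arguments: \[([^\]]*)\]")


def missing_kernel_args(conditions):
    """Return kernel arguments reported missing by true NodeDegraded conditions."""
    missing = []
    for cond in conditions:
        if cond.get("type") != "NodeDegraded" or cond.get("status") != "True":
            continue
        match = _MISSING_KARGS.search(cond.get("message", ""))
        if match:
            # Assumption: arguments inside the brackets are whitespace-separated.
            missing.extend(match.group(1).split())
    return missing


if __name__ == "__main__":
    # Condition copied from the report above.
    sample = [{
        "lastTransitionTime": "2024-05-30T09:37:06Z",
        "message": (
            "Node ip-10-0-29-166.us-east-2.compute.internal is reporting: "
            '"unexpected on-disk state validating against '
            "quay.io/mcoqe/layering@sha256:"
            "654149c7e25a1ada80acb8eedc3ecf9966a8d29e9738b39fcbedad44ddd15ed5: "
            'missing expected kernel arguments: [enforcing=0]"'
        ),
        "reason": "1 nodes are reporting degraded status on sync",
        "status": "True",
        "type": "NodeDegraded",
    }]
    print(missing_kernel_args(sample))
```

For the condition in this report, the function returns a list containing only `enforcing=0`.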
      
          

      Expected results:

      The MC should be applied without problems, and SELinux should be running with enforcing=0 (permissive mode).
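
One way to confirm the expected result on a node is to compare the expected kernel arguments with the node's kernel command line (`/proc/cmdline` on Linux). The following Python sketch illustrates that comparison. It is not the MCO's actual validation code; the function name `find_missing_kargs` and the sample command line are hypothetical.

```python
def find_missing_kargs(expected, cmdline):
    """Return the expected kernel arguments that are absent from cmdline.

    Arguments are compared as whole whitespace-separated tokens, so
    "enforcing=0" does not match "enforcing=01" or "xenforcing=0".
    Quoted arguments that contain spaces are not handled by this sketch.
    """
    present = set(cmdline.split())
    return [arg for arg in expected if arg not in present]


if __name__ == "__main__":
    # Hypothetical command line; on a real node, read /proc/cmdline instead.
    sample_cmdline = "BOOT_IMAGE=/vmlinuz root=UUID=0000 rw enforcing=0"
    print(find_missing_kargs(["enforcing=0"], sample_cmdline))
```

An empty list means every expected argument is present on the command line; a non-empty list corresponds to the "missing expected kernel arguments" situation described in this report.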
          

      Additional info:

      
          

            umohnani Urvashi Mohnani
            openshift-crt-jira-prow OpenShift Prow Bot
            Sergio Regidor de la Rosa Sergio Regidor de la Rosa