
Update to 4.13.0-ec.3 stuck on leaked MachineConfig


    • Moderate
    • No
    • False
    • Previously, a regression in behavior caused Machine Config Operator (MCO) to create a duplicate `MachineConfig` object in the `kubeletconfig` or `containerruntimeconfig` custom resource (CR). The duplicate object degraded and the cluster failed to upgrade. With this update, the `kubeletconfig` and `containerruntimeconfig` controllers can detect any duplicate objects and then delete them. This action removes the degraded `MachineConfig` object error and does not impact a cluster upgrade operation. (link:https://issues.redhat.com/browse/OCPBUGS-7719[*OCPBUGS-7719*])
    • Bug Fix
    • Done

      Description of problem:

      An update from 4.13.0-ec.2 to 4.13.0-ec.3 got stuck on:

      $ oc get clusteroperator machine-config
      NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      machine-config   4.13.0-ec.2   True        True          True       30h     Unable to apply 4.13.0-ec.3: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool worker is not ready, retrying. Status: (pool degraded: true total: 105, ready 105, updated: 105, unavailable: 0)]
      

      The worker MachineConfigPool status included:

            type: NodeDegraded
          - lastTransitionTime: "2023-02-16T14:29:21Z"
            message: 'Failed to render configuration for pool worker: Ignoring MC 99-worker-generated-containerruntime
              generated by older version 8276d9c1f574481043d3661a1ace1f36cd8c3b62 (my version:
              c06601510c0917a48912cc2dda095d8414cc5182)'
      
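      When triaging this condition it can help to pull the MachineConfig name and the two controller version hashes out of the message. The following is a minimal sketch, assuming the message keeps the wording shown above; the function and type names are hypothetical and are not part of the MCO.

```python
import re
from typing import NamedTuple, Optional


class StaleMachineConfig(NamedTuple):
    """Fields extracted from the render-failure message (hypothetical type)."""
    name: str          # MachineConfig that the render controller ignored
    generated_by: str  # controller version hash that generated it
    controller: str    # controller version hash that is running now


# Assumption: the message wording matches the one quoted in this report.
_PATTERN = re.compile(
    r"Ignoring MC (?P<name>\S+) generated by older version "
    r"(?P<old>[0-9a-f]+) \(my version: (?P<new>[0-9a-f]+)\)"
)


def parse_render_failure(message: str) -> Optional[StaleMachineConfig]:
    """Return the parsed fields, or None if the message has another shape."""
    # YAML folding wraps the message across lines, so collapse whitespace first.
    flat = " ".join(message.split())
    match = _PATTERN.search(flat)
    if match is None:
        return None
    return StaleMachineConfig(match["name"], match["old"], match["new"])
```

      Passing the `message` value from the condition above would yield the name `99-worker-generated-containerruntime` together with both hashes.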

      Version-Release number of selected component (if applicable):

      4.13.0-ec.3. The behavior was apparently introduced as part of OCPBUGS-6018, which has been backported, so the following update targets are expected to be vulnerable: 4.10.52+, 4.11.26+, 4.12.2+, and 4.13.0-ec.3.
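
      The list of vulnerable targets can be encoded as a small check. This is only a sketch of the ranges named in this report: it does not know which later releases carry the fix, and the function and constant names are hypothetical.

```python
import re

# First vulnerable patch release per (major, minor), as listed in this report.
FIRST_VULNERABLE_PATCH = {(4, 10): 52, (4, 11): 26, (4, 12): 2}

# Pre-releases are matched by exact name; the report names only this one.
VULNERABLE_PRERELEASES = {"4.13.0-ec.3"}


def is_listed_vulnerable_target(version: str) -> bool:
    """Return True if `version` falls in a range this report calls vulnerable.

    Assumption: no upper bound is applied, because the report does not say
    which releases contain the fix.
    """
    if version in VULNERABLE_PRERELEASES:
        return True
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        # Other pre-release or unrecognized strings are not covered here.
        return False
    major, minor, patch = (int(part) for part in match.groups())
    first = FIRST_VULNERABLE_PATCH.get((major, minor))
    return first is not None and patch >= first
```

      For example, `is_listed_vulnerable_target("4.12.1")` is false while `is_listed_vulnerable_target("4.12.2")` is true.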

      How reproducible:

      100% when updating into a vulnerable release, if the cluster has a leaked MachineConfig.

      Steps to Reproduce:

      1. 4.12.0-ec.1 dropped cleanUpDuplicatedMC. Run a later release, like 4.13.0-ec.2.
      2. Create more than one KubeletConfig or ContainerRuntimeConfig targeting the worker pool (or any pool other than master). The number of clusters that have had redundant configuration objects like this is expected to be small.
      3. (Optionally?) delete the extra KubeletConfig and ContainerRuntimeConfig.
      4. Update to 4.13.0-ec.3.
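
      Before step 4, a cluster could be checked for leaked objects by scanning the generated MachineConfigs for a controller version that differs from the running one. The following is a rough sketch, assuming the input is the `items` list from `oc get machineconfig -o json`, that the controller version is recorded in the annotation named below, and that generated objects carry the name fragments shown; all three are assumptions, and the function name is hypothetical.

```python
from typing import Iterable, List, Mapping, Sequence

# Assumption: annotation in which the MCO records the generating controller version.
VERSION_ANNOTATION = "machineconfiguration.openshift.io/generated-by-controller-version"

# Assumption: name fragments of MachineConfigs generated from KubeletConfig and
# ContainerRuntimeConfig objects (the second appears in the status above).
GENERATED_MARKERS = ("-generated-kubelet", "-generated-containerruntime")


def find_stale_generated(
    machine_configs: Iterable[Mapping],
    controller_version: str,
    markers: Sequence[str] = GENERATED_MARKERS,
) -> List[str]:
    """Return names of generated MachineConfigs whose recorded controller
    version differs from `controller_version`."""
    stale = []
    for mc in machine_configs:
        metadata = mc.get("metadata", {})
        name = metadata.get("name", "")
        if not any(marker in name for marker in markers):
            continue  # not generated by the two controllers discussed here
        recorded = metadata.get("annotations", {}).get(VERSION_ANNOTATION)
        # This compares for inequality only; it cannot tell older from newer.
        if recorded is not None and recorded != controller_version:
            stale.append(name)
    return sorted(stale)
```

      Any name this returns is a candidate for the leaked MachineConfig described above and would need manual confirmation before deletion.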

      Actual results:

      Update sticks on the machine-config ClusterOperator, as described above.

      Expected results:

      Update completes without issues.

            qiwan233 Qi Wang
            trking W. Trevor King
            Rio Liu Rio Liu
            Darragh Fitzmaurice Darragh Fitzmaurice