OpenShift Bugs / OCPBUGS-7719

Update to 4.13.0-ec.3 stuck on leaked MachineConfig

Details

    • Moderate
    • Previously, a regression caused the Machine Config Operator (MCO) to create a duplicate `MachineConfig` object for a `kubeletconfig` or `containerruntimeconfig` custom resource (CR). The duplicate object degraded the pool, and the cluster failed to upgrade. With this update, the `kubeletconfig` and `containerruntimeconfig` controllers detect duplicate objects and delete them. This clears the degraded `MachineConfig` error and does not impact cluster upgrade operations. (link:https://issues.redhat.com/browse/OCPBUGS-7719[*OCPBUGS-7719*])
    • Bug Fix
    • Done

    Description

      Description of problem:

      An update from 4.13.0-ec.2 to 4.13.0-ec.3 got stuck on:

      $ oc get clusteroperator machine-config
      NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      machine-config   4.13.0-ec.2   True        True          True       30h     Unable to apply 4.13.0-ec.3: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool worker is not ready, retrying. Status: (pool degraded: true total: 105, ready 105, updated: 105, unavailable: 0)]
      

      The worker MachineConfigPool status included:

            type: NodeDegraded
          - lastTransitionTime: "2023-02-16T14:29:21Z"
            message: 'Failed to render configuration for pool worker: Ignoring MC 99-worker-generated-containerruntime
              generated by older version 8276d9c1f574481043d3661a1ace1f36cd8c3b62 (my version:
              c06601510c0917a48912cc2dda095d8414cc5182)'
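
      The version strings in that message can be checked directly on the leaked object: the MCO stamps generated MachineConfigs with a `machineconfiguration.openshift.io/generated-by-controller-version` annotation, which is what the render check compares. A quick sketch (the object name is taken from the error message above):

      $ oc get machineconfig 99-worker-generated-containerruntime \
          -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/generated-by-controller-version}'

      On a leaked object this should print the older controller hash from the message above rather than the current controller's version.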
      

      Version-Release number of selected component (if applicable):

      4.13.0-ec.3. The behavior was apparently introduced as part of OCPBUGS-6018, which has been backported, so the following update targets are expected to be vulnerable: 4.10.52+, 4.11.26+, 4.12.2+, and 4.13.0-ec.3.

      How reproducible:

      100%, when updating into a vulnerable release, if the cluster happens to have a leaked `MachineConfig`.

      Steps to Reproduce:

      1. 4.12.0-ec.1 dropped cleanUpDuplicatedMC. Run a later release, like 4.13.0-ec.2.
      2. Create more than one KubeletConfig or ContainerRuntimeConfig targeting the worker pool (or any pool other than master). The number of clusters that have had redundant configuration objects like this is expected to be small.
      3. (Optionally?) delete the extra KubeletConfig and ContainerRuntimeConfig.
      4. Update to 4.13.0-ec.3.
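
      For step 2, the overlapping objects might look like the following sketch (the names and the `maxPods` value are illustrative, not taken from this report; the pool selector is what makes them target the same worker pool):

      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: worker-kubelet-a   # illustrative name
      spec:
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/worker: ""
        kubeletConfig:
          maxPods: 250
      ---
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: worker-kubelet-b   # illustrative name, duplicates the selector above
      spec:
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/worker: ""
        kubeletConfig:
          maxPods: 250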

      Actual results:

      The update sticks on the machine-config ClusterOperator, as described above.

      Expected results:

      The update completes without issues.

      People

        qiwan233 Qi Wang
        trking W. Trevor King
        Rio Liu Rio Liu
        Darragh Fitzmaurice Darragh Fitzmaurice