OCPBUGS-8260 (OpenShift Bugs)

Update to 4.13.0-ec.3 stuck on leaked MachineConfig


Details

    • Bug
    • Resolution: Done
    • Undefined
    • None
    • 4.13
    • Node / Kubelet
    • Moderate
    • No
    • False

    Description

      This is a clone of issue OCPBUGS-8261. The following is the description of the original issue:

      Description of problem:

      An update from 4.13.0-ec.2 to 4.13.0-ec.3 got stuck on:

      $ oc get clusteroperator machine-config
      NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      machine-config   4.13.0-ec.2   True        True          True       30h     Unable to apply 4.13.0-ec.3: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool worker is not ready, retrying. Status: (pool degraded: true total: 105, ready 105, updated: 105, unavailable: 0)]
      

      The worker MachineConfigPool status included:

            type: NodeDegraded
          - lastTransitionTime: "2023-02-16T14:29:21Z"
            message: 'Failed to render configuration for pool worker: Ignoring MC 99-worker-generated-containerruntime
              generated by older version 8276d9c1f574481043d3661a1ace1f36cd8c3b62 (my version:
              c06601510c0917a48912cc2dda095d8414cc5182)'
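      The condition message means the render controller skipped a controller-generated MachineConfig whose recorded controller version differs from its own. The following minimal Python sketch mirrors that check to list candidate leaked MachineConfigs. It assumes the generating controller's version is stored in the `machineconfiguration.openshift.io/generated-by-controller-version` annotation and that the input is the `items` list from `oc get machineconfigs -o json`; the function name and sample MachineConfig names are hypothetical. Because commit hashes cannot be ordered, any mismatch is reported, not only an older version.

```python
"""Sketch: flag controller-generated MachineConfigs whose recorded
controller version differs from the running controller's version."""

from typing import Any, Dict, List

# Assumption: the Machine Config Operator records the version of the
# controller that generated a MachineConfig in this annotation.
GENERATED_BY_ANNOTATION = (
    "machineconfiguration.openshift.io/generated-by-controller-version"
)


def stale_generated_mcs(
    machine_configs: List[Dict[str, Any]], controller_version: str
) -> List[str]:
    """Return sorted names of MachineConfigs generated by another controller version.

    `machine_configs` is assumed to be the `items` list from
    `oc get machineconfigs -o json`. MachineConfigs without the annotation
    (for example, user-created ones) are skipped. Commit hashes cannot be
    ordered, so any mismatch is reported, not only older versions.
    """
    stale = []
    for mc in machine_configs:
        annotations = mc.get("metadata", {}).get("annotations") or {}
        version = annotations.get(GENERATED_BY_ANNOTATION)
        if version and version != controller_version:
            stale.append(mc["metadata"]["name"])
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical sample data; the hashes are the two versions quoted in the
    # pool condition message above.
    old = "8276d9c1f574481043d3661a1ace1f36cd8c3b62"
    new = "c06601510c0917a48912cc2dda095d8414cc5182"
    items = [
        {"metadata": {"name": "99-worker-generated-containerruntime",
                      "annotations": {GENERATED_BY_ANNOTATION: old}}},
        {"metadata": {"name": "99-worker-generated-kubelet",
                      "annotations": {GENERATED_BY_ANNOTATION: new}}},
        {"metadata": {"name": "99-worker-custom"}},  # no annotation
    ]
    print(stale_generated_mcs(items, new))
    # prints ['99-worker-generated-containerruntime']
```

      Any name it returns is only a candidate; whether a KubeletConfig or ContainerRuntimeConfig still owns that MachineConfig is not checked by this sketch.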
      

      Version-Release number of selected component (if applicable):

      4.13.0-ec.3. The behavior was apparently introduced as part of OCPBUGS-6018, which has been backported, so the following update targets are expected to be vulnerable: 4.10.52+, 4.11.26+, 4.12.2+, and 4.13.0-ec.3.

      How reproducible:

      100% when updating to a vulnerable release, if the cluster has leaked MachineConfigs.

      Steps to Reproduce:

      1. Run a release later than 4.12.0-ec.1, which dropped cleanUpDuplicatedMC, such as 4.13.0-ec.2.
      2. Create more than one KubeletConfig or ContainerRuntimeConfig targeting the worker pool (or any pool other than master). The number of clusters that have had redundant configuration objects like this is expected to be small.
      3. (Optionally?) delete the extra KubeletConfig and ContainerRuntimeConfig.
      4. Update to 4.13.0-ec.3.
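      Step 2 is the precondition that leaks MachineConfigs. The following Python sketch reports pools selected by more than one config object of a given kind; it reads step 2 as "more than one object of the same kind per pool" (an interpretation), so run it once on KubeletConfigs and once on ContainerRuntimeConfigs. It assumes the inputs are the `items` lists from `oc get machineconfigpools -o json` and `oc get kubeletconfigs -o json` (or `containerruntimeconfigs`), and it evaluates only `spec.machineConfigPoolSelector.matchLabels`, ignoring `matchExpressions`. The function names and sample objects are hypothetical, and the sample pool labels are assumed to be the default `pools.operator.machineconfiguration.openshift.io/<pool>` labels.

```python
"""Sketch: find MachineConfigPools selected by more than one config object
of the same kind (KubeletConfig or ContainerRuntimeConfig)."""

from collections import defaultdict
from typing import Any, Dict, List


def selector_matches(match_labels: Dict[str, str], pool_labels: Dict[str, str]) -> bool:
    """True when every matchLabels pair is present on the pool.

    Simplification: an empty selector is treated as matching nothing, and
    matchExpressions are not evaluated.
    """
    return bool(match_labels) and all(
        pool_labels.get(key) == value for key, value in match_labels.items()
    )


def pools_with_multiple_configs(
    pools: List[Dict[str, Any]], configs: List[Dict[str, Any]]
) -> Dict[str, List[str]]:
    """Map pool name to the sorted config names selecting it, keeping only
    pools selected by more than one config."""
    hits: Dict[str, List[str]] = defaultdict(list)
    for cfg in configs:
        selector = cfg.get("spec", {}).get("machineConfigPoolSelector") or {}
        match_labels = selector.get("matchLabels") or {}
        for pool in pools:
            pool_labels = pool["metadata"].get("labels") or {}
            if selector_matches(match_labels, pool_labels):
                hits[pool["metadata"]["name"]].append(cfg["metadata"]["name"])
    return {name: sorted(cfgs) for name, cfgs in hits.items() if len(cfgs) > 1}


if __name__ == "__main__":
    # Hypothetical sample objects: two KubeletConfigs selecting the worker pool.
    worker_label = {"pools.operator.machineconfiguration.openshift.io/worker": ""}
    master_label = {"pools.operator.machineconfiguration.openshift.io/master": ""}
    pools = [
        {"metadata": {"name": "master", "labels": master_label}},
        {"metadata": {"name": "worker", "labels": worker_label}},
    ]
    kubelet_configs = [
        {"metadata": {"name": "set-max-pods"},
         "spec": {"machineConfigPoolSelector": {"matchLabels": worker_label}}},
        {"metadata": {"name": "set-log-level"},
         "spec": {"machineConfigPoolSelector": {"matchLabels": worker_label}}},
    ]
    print(pools_with_multiple_configs(pools, kubelet_configs))
    # prints {'worker': ['set-log-level', 'set-max-pods']}
```

      Because step 3 may already have deleted the extra objects, an empty result does not rule out leaked MachineConfigs on the cluster.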

      Actual results:

      The update gets stuck on the machine-config ClusterOperator, as described above.

      Expected results:

      The update completes without issues.


    People

      Qi Wang (qiwan233)
      OpenShift Prow Bot (openshift-crt-jira-prow)
      Weinan Liu
      Votes: 1
      Watchers: 16
