OpenShift Bugs: OCPBUGS-14161

Machine config degraded after upgrade 4.12 -> 4.13. Failed to resync 4.13.0.

Details

    • No
    • MCO Sprint 236, MCO Sprint 237
    • 2
    • False

    Description

      Description of problem:

      Upgrading a scaled, loaded GCP-SDN-IPI cluster failed: the machine-config operator became degraded with reason MachineConfigDaemonFailed.

      Version-Release number of selected component (if applicable):

      4.12.17 -> 4.13.0

      How reproducible:

      100% (2 out of 2 attempts).

      Steps to Reproduce:

      1. Install a 4.12.17 GCP-SDN-IPI cluster.
      2. Scale up to 50 worker nodes.
      3. Load the cluster with 250 projects.
      4. Upgrade to 4.13.0.
      

      Actual results:

      $ oc get co machine-config
      NAME                                      VERSION  AVAILABLE  PROGRESSING  DEGRADED  SINCE
      machine-config                            4.13.0   False      False        True      4h55m
      
      $ omg get co machine-config -o yaml
        conditions:
        - lastTransitionTime: '2023-05-17T22:51:48Z'
          message: Cluster version is 4.13.0
          status: 'False'
          type: Progressing
        - lastTransitionTime: '2023-05-18T01:19:53Z'
          message: 'Failed to resync 4.13.0 because: failed to apply machine config daemon
            manifests: error during waitForDaemonsetRollout: [timed out waiting for the
            condition, daemonset machine-config-daemon is not ready. status: (desired: 53,
            updated: 53, ready: 51, unavailable: 2)]'
          reason: MachineConfigDaemonFailed
          status: 'True'
          type: Degraded
        - lastTransitionTime: '2023-05-18T01:19:53Z'
          message: 'Cluster not available for [{operator 4.13.0}]: failed to apply machine
            config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting
            for the condition, daemonset machine-config-daemon is not ready. status: (desired:
            53, updated: 53, ready: 51, unavailable: 2)]'
          reason: MachineConfigDaemonFailed
          status: 'False'
          type: Available 
      
      $ omg logs machine-config-operator-6c54669644-tpq9s -c machine-config-operator -n openshift-machine-config-operator
      ...
      2023-05-18T07:30:54.200148168Z I0518 07:30:54.200086       1 sync.go:580] Performing safety controllerconfig sync
      2023-05-18T07:30:54.221945728Z W0518 07:30:54.221893       1 warnings.go:70] unknown field "spec.dns.metadata.creationTimestamp"
      2023-05-18T07:30:54.221945728Z W0518 07:30:54.221912       1 warnings.go:70] unknown field "spec.dns.metadata.generation"
      2023-05-18T07:30:54.221945728Z W0518 07:30:54.221916       1 warnings.go:70] unknown field "spec.dns.metadata.managedFields"
      2023-05-18T07:30:54.221945728Z W0518 07:30:54.221919       1 warnings.go:70] unknown field "spec.dns.metadata.name"
      2023-05-18T07:30:54.221945728Z W0518 07:30:54.221922       1 warnings.go:70] unknown field "spec.dns.metadata.resourceVersion"
      2023-05-18T07:30:54.221945728Z W0518 07:30:54.221927       1 warnings.go:70] unknown field "spec.dns.metadata.uid"
      2023-05-18T07:30:54.221945728Z W0518 07:30:54.221930       1 warnings.go:70] unknown field "spec.infra.metadata.creationTimestamp"
      2023-05-18T07:30:54.221945728Z W0518 07:30:54.221933       1 warnings.go:70] unknown field "spec.infra.metadata.generation"
      2023-05-18T07:30:54.221945728Z W0518 07:30:54.221937       1 warnings.go:70] unknown field "spec.infra.metadata.managedFields"
      2023-05-18T07:30:54.221945728Z W0518 07:30:54.221940       1 warnings.go:70] unknown field "spec.infra.metadata.name"
      2023-05-18T07:30:54.221997595Z W0518 07:30:54.221944       1 warnings.go:70] unknown field "spec.infra.metadata.resourceVersion"
      2023-05-18T07:30:54.221997595Z W0518 07:30:54.221947       1 warnings.go:70] unknown field "spec.infra.metadata.uid"
      2023-05-18T07:30:55.228791808Z I0518 07:30:55.228738       1 event.go:285] Event(v1.ObjectReference{Kind:"", Namespace:"", Name:"machine-config", UID:"5b1da060-01e4-46e9-a333-3416cd7c5547", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'OperatorDegraded: MachineConfigDaemonFailed' Failed to resync 4.13.0 because: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. status: (desired: 53, updated: 53, ready: 51, unavailable: 2)]
      2023-05-18T07:30:55.245811310Z I0518 07:30:55.245749       1 event.go:285] Event(v1.ObjectReference{Kind:"", Namespace:"", Name:"machine-config", UID:"5b1da060-01e4-46e9-a333-3416cd7c5547", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MachineConfigDaemonFailed' Cluster not available for [{operator 4.13.0}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. status: (desired: 53, updated: 53, ready: 51, unavailable: 2)]
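
For triage, the counters in the waitForDaemonsetRollout message above can be pulled out with a small shell helper. This is only a sketch: the function names `rollout_field` and `rollout_summary` are hypothetical, and the parsing assumes the message keeps the `(desired: N, updated: N, ready: N, unavailable: N)` layout shown in the operator condition.

```shell
#!/bin/sh
# Hypothetical helpers (not part of oc or omg): parse the status counters
# out of a waitForDaemonsetRollout error message.

# Print the numeric value of one counter.
# $1 = counter name (desired, updated, ready, unavailable), $2 = message text
rollout_field() {
    printf '%s\n' "$2" | sed -n "s/.*$1: \([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Print a one-line summary; return non-zero when the rollout is incomplete.
rollout_summary() {
    msg=$1
    desired=$(rollout_field desired "$msg")
    ready=$(rollout_field ready "$msg")
    unavailable=$(rollout_field unavailable "$msg")
    echo "desired=$desired ready=$ready unavailable=$unavailable missing=$((desired - ready))"
    [ "$ready" -eq "$desired" ] && [ "$unavailable" -eq 0 ]
}

# Message text copied from the Degraded condition in this report.
msg='daemonset machine-config-daemon is not ready. status: (desired: 53, updated: 53, ready: 51, unavailable: 2)'
rollout_summary "$msg" || echo "rollout incomplete"
```

For the message in this report, the summary shows two daemon pods short of the desired 53 (50 workers plus, presumably, 3 control-plane nodes). The next step would be to identify the two nodes whose machine-config-daemon pods are not ready and check their pod logs.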
      

          People

            jkyros@redhat.com John Kyros
            skordas Simon Kordas
            Sergio Regidor de la Rosa Sergio Regidor de la Rosa