OpenShift Bugs / OCPBUGS-46053

MCD pod deletions repeat pathologically when OCL is enabled


    • Quality / Stability / Reliability

      Description of problem:

      During upgrade testing with OCL enabled, MCD pods were found to be deleted and recreated pathologically often. I originally suspected https://issues.redhat.com/browse/OCPBUGS-42695 as the source, since that bug was causing unnecessary restarts of the MCD deployment. However, the issue persists even though the PR for that bug has landed.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Always in the e2e-aws-ovn-upgrade-ocb job that we intend to use to test OCL upgrade flows. Here's a rehearsal run where this failure occurs: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/58241/rehearse-58241-pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade-ocb/1851035793171156992

      Steps to Reproduce:

      Run the aforementioned CI job.

      Actual results:

      The test "[sig-arch] events should not repeat pathologically for ns/openshift-machine-config-operator" fails with the following output:

          {  8 events happened too frequently
          event happened 32 times, something is wrong: namespace/openshift-machine-config-operator daemonset/machine-config-daemon hmsg/d8663bab95 - reason/SuccessfulDelete (combined from similar events): Deleted pod: machine-config-daemon-hqrq7 (00:44:33Z) result=reject
          event happened 74 times, something is wrong: namespace/openshift-machine-config-operator daemonset/machine-config-daemon hmsg/d705389d90 - reason/SuccessfulDelete (combined from similar events): Deleted pod: machine-config-daemon-njl7w (00:49:22Z) result=reject
          event happened 116 times, something is wrong: namespace/openshift-machine-config-operator daemonset/machine-config-daemon hmsg/86735cbecb - reason/SuccessfulDelete (combined from similar events): Deleted pod: machine-config-daemon-fbfmv (00:54:27Z) result=reject
          event happened 188 times, something is wrong: namespace/openshift-machine-config-operator daemonset/machine-config-daemon hmsg/dc2da581d9 - reason/SuccessfulDelete (combined from similar events): Deleted pod: machine-config-daemon-4sczr (00:59:43Z) result=reject
          event happened 32 times, something is wrong: namespace/openshift-machine-config-operator daemonset/machine-config-daemon hmsg/de1673a806 - reason/SuccessfulCreate (combined from similar events): Created pod: machine-config-daemon-klxjf (01:07:37Z) result=reject
          event happened 81 times, something is wrong: namespace/openshift-machine-config-operator daemonset/machine-config-daemon hmsg/f9637dcf20 - reason/SuccessfulDelete (combined from similar events): Deleted pod: machine-config-daemon-6vbtg (01:12:37Z) result=reject
          event happened 152 times, something is wrong: namespace/openshift-machine-config-operator daemonset/machine-config-daemon hmsg/ad46a2def0 - reason/SuccessfulDelete (combined from similar events): Deleted pod: machine-config-daemon-kbb8f (01:17:50Z) result=reject
          event happened 188 times, something is wrong: namespace/openshift-machine-config-operator daemonset/machine-config-daemon hmsg/9ca5e98d9c - reason/SuccessfulDelete (combined from similar events): Deleted pod: machine-config-daemon-cp2xl (01:22:40Z) result=reject }
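
For triage, the failure message above can be summarized with a small script. This is only a sketch: the regular expression is inferred from the entries quoted in this report rather than from the test's source, and the names parse_events and max_count_by_reason are hypothetical helpers, not part of any OpenShift tooling. SAMPLE holds three of the entries copied from the output above.

```python
import re

# Pattern inferred from the entries quoted in this report (an assumption,
# not taken from the test's source code).
EVENT_RE = re.compile(
    r"event happened (?P<count>\d+) times, something is wrong: "
    r"namespace/(?P<namespace>\S+) daemonset/(?P<daemonset>\S+) "
    r"hmsg/(?P<hmsg>\S+) - reason/(?P<reason>\S+) .*?"
    r"(?P<action>Deleted|Created) pod: (?P<pod>\S+) "
    r"\((?P<time>\d{2}:\d{2}:\d{2})Z\)"
)


def parse_events(text):
    """Return one dict per 'event happened N times' entry found in text."""
    events = []
    for match in EVENT_RE.finditer(text):
        event = match.groupdict()
        event["count"] = int(event["count"])
        events.append(event)
    return events


def max_count_by_reason(events):
    """Map each event reason to the highest repeat count reported for it."""
    highest = {}
    for event in events:
        reason = event["reason"]
        highest[reason] = max(highest.get(reason, 0), event["count"])
    return highest


# Three entries copied from the failure output in this report.
SAMPLE = """{  8 events happened too frequentlyevent happened 32 times, something is wrong: namespace/openshift-machine-config-operator daemonset/machine-config-daemon hmsg/d8663bab95 - reason/SuccessfulDelete (combined from similar events): Deleted pod: machine-config-daemon-hqrq7 (00:44:33Z) result=reject
event happened 188 times, something is wrong: namespace/openshift-machine-config-operator daemonset/machine-config-daemon hmsg/dc2da581d9 - reason/SuccessfulDelete (combined from similar events): Deleted pod: machine-config-daemon-4sczr (00:59:43Z) result=reject
event happened 32 times, something is wrong: namespace/openshift-machine-config-operator daemonset/machine-config-daemon hmsg/de1673a806 - reason/SuccessfulCreate (combined from similar events): Created pod: machine-config-daemon-klxjf (01:07:37Z) result=reject }"""

if __name__ == "__main__":
    parsed = parse_events(SAMPLE)
    for event in parsed:
        print(event["time"], event["reason"], event["pod"], event["count"])
    print(max_count_by_reason(parsed))
```

Running the same helpers over the full output makes it easier to see which reason dominates (here SuccessfulDelete) and how the reported counts change over the course of the run.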

      Expected results:

      The aforementioned test should pass.

       

      Additional info:

          

              zzlotnik@redhat.com Zack Zlotnik
              Sergio Regidor de la Rosa