OpenShift Bugs: OCPBUGS-45071

SNO upgrade can fail on [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers


    • Moderate
    • No
    • 1
    • OCPEDGE Sprint 263, OCPEDGE Sprint 264
    • 2
    • Proposed
    • False

      Description of problem:

      We're seeing another issue related to the DNS restart that happens during the SNO reboot.
      
      [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers

      This payload failed because two of its jobs (runs 1 and 2) hit this excessive-restart error.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      event [namespace/openshift-catalogd node/ip-10-0-105-161.us-west-1.compute.internal pod/catalogd-controller-manager-59d9788859-n5sf7 hmsg/e0d473c239 - Back-off restarting failed container manager in pod catalogd-controller-manager-59d9788859-n5sf7_openshift-catalogd(78fff808-fe29-48eb-a031-0ba3650a3d84)] happened 89 times
      event [namespace/openshift-operator-controller node/ip-10-0-105-161.us-west-1.compute.internal pod/operator-controller-controller-manager-7b46748475-v9d7m hmsg/96a44e679e - Back-off restarting failed container manager in pod operator-controller-controller-manager-7b46748475-v9d7m_openshift-operator-controller(242e6d7e-cedc-42c2-9606-94848608f244)] happened 88 times
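      A minimal sketch of how event summaries like the two above can be parsed and checked against a repeat-count limit, useful when triaging many failed runs. The threshold value (20), the regular expression, and the names parse_events and excessive_backoffs are assumptions for illustration only; this is not the openshift/origin implementation of the pathological-event test, whose actual limit may differ.

```python
import re
from typing import NamedTuple

# ASSUMED limit, for illustration only; the real test's threshold may differ.
ASSUMED_THRESHOLD = 20

# Matches the "event [...] happened N times" summary format shown above.
EVENT_RE = re.compile(
    r"event \[namespace/(?P<namespace>\S+) node/(?P<node>\S+) pod/(?P<pod>\S+) "
    r"hmsg/(?P<hmsg>\S+) - (?P<message>[^\]]*)\] happened (?P<count>\d+) times"
)


class Event(NamedTuple):
    namespace: str
    pod: str
    message: str
    count: int


def parse_events(text: str) -> list[Event]:
    """Return one Event per summary line found in the test output."""
    return [
        Event(m["namespace"], m["pod"], m["message"], int(m["count"]))
        for m in EVENT_RE.finditer(text)
    ]


def excessive_backoffs(events: list[Event], threshold: int = ASSUMED_THRESHOLD) -> list[Event]:
    """Keep only back-off events whose repeat count is above the threshold."""
    return [
        e for e in events
        if e.message.startswith("Back-off restarting failed container")
        and e.count > threshold
    ]


# The two summaries from "Actual results", copied verbatim.
SAMPLE = (
    "event [namespace/openshift-catalogd node/ip-10-0-105-161.us-west-1.compute.internal "
    "pod/catalogd-controller-manager-59d9788859-n5sf7 hmsg/e0d473c239 - Back-off restarting "
    "failed container manager in pod catalogd-controller-manager-59d9788859-n5sf7_openshift-catalogd"
    "(78fff808-fe29-48eb-a031-0ba3650a3d84)] happened 89 times\n"
    "event [namespace/openshift-operator-controller node/ip-10-0-105-161.us-west-1.compute.internal "
    "pod/operator-controller-controller-manager-7b46748475-v9d7m hmsg/96a44e679e - Back-off restarting "
    "failed container manager in pod operator-controller-controller-manager-7b46748475-v9d7m"
    "_openshift-operator-controller(242e6d7e-cedc-42c2-9606-94848608f244)] happened 88 times\n"
)

if __name__ == "__main__":
    for e in excessive_backoffs(parse_events(SAMPLE)):
        print(f"{e.namespace}/{e.pod}: {e.count}")
```

      Filtering on the message prefix keeps the check focused on back-off events, so other repeated events in the same output are not counted against this symptom.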

      Expected results:

      Test passes    

      Additional info:

      Unpacking the logs in Loki shows a spike of errors during the DNS outage caused by rolling out the new DNS pod during the upgrade. This is the case for both the catalogd-controller-manager-59d9788859-n5sf7_openshift-catalogd pod in ns/openshift-catalogd and the operator-controller-controller-manager-7b46748475-v9d7m_openshift-operator-controller pod in ns/openshift-operator-controller.

              Jeff Roche (rh-ee-jeroche)
              Jeremy Poulin (jpoulin)
              Neil Hamza
