OpenShift Bugs / OCPBUGS-5843

4.12 to 4.13 upgrade job failing frequently on node not ready

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • 4.13
    • Quality / Stability / Reliability
    • False
    • Rejected

      Description of problem:

      Per the discussion on Slack, the 4.12 to 4.13 upgrade job has been failing frequently. The failures appear to be caused by a node going NotReady at an unexpected time, and the logs show failures when calling api-int. This may point to a problem with the internal load balancer.

      Actual results:

      Node goes NotReady at an unexpected time

      Expected results:

      Node only goes NotReady when it reboots for upgrade

      Additional info:

      This is difficult to debug because the on-prem service logs are not persisted across reboots, and the bogus NotReady state occurs before the node reboots. The first step toward a fix is to persist those logs so that we have usable data from before the reboot.

              bnemec@redhat.com Benjamin Nemec
              Zhanqi Zhao