Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-45846

[sig-node] node-lifecycle detects unexpected not ready node - test firing during apiserver gracefulshutdown

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Icon: Normal Normal
    • None
    • 4.18.z
    • kube-apiserver
    • Quality / Stability / Reliability
    • False
    • Hide

      None

      Show
      None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      During an investigate of the latest round of unexpected node not ready failures on metal upgrade jobs, I found two jobs that have similar issues. These are new failures to me. It seems that both are failing during the process of a gracefulshutdown of the kube-apiserver.

      https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?Aggregation=none&Architecture=amd64&FeatureSet=default&Installer=ipi&LayeredProduct=none&Network=ovn&NetworkAccess=default&Platform=aws&Procedure=none&Scheduler=default&SecurityMode=default&Suite=serial&Topology=ha&Upgrade=none&baseEndTime=2024-10-01%2023%3A59%3A59&baseRelease=4.17&baseStartTime=2024-09-01%2000%3A00%3A00&capability=Other&columnGroupBy=Architecture%2CNetwork%2CPlatform%2CTopology&component=Node%20%2F%20Kubelet&confidence=95&dbGroupBy=Platform%2CArchitecture%2CNetwork%2CTopology%2CFeatureSet%2CUpgrade%2CSuite%2CInstaller&environment=amd64%20default%20ipi%20ovn%20aws%20serial%20ha%20none&ignoreDisruption=true&ignoreMissing=false&includeVariant=Architecture%3Aamd64&includeVariant=CGroupMode%3Av2&includeVariant=ContainerRuntime%3Arunc&includeVariant=FeatureSet%3Adefault&includeVariant=Installer%3Aipi&includeVariant=Installer%3Aupi&includeVariant=Network%3Aovn&includeVariant=Owner%3Aeng&includeVariant=Platform%3Aaws&includeVariant=Platform%3Aazure&includeVariant=Platform%3Agcp&includeVariant=Platform%3Ametal&includeVariant=Platform%3Avsphere&includeVariant=Topology%3Aha&includeVariant=Topology%3Amicroshift&minFail=3&passRateAllTests=0&passRateNewTests=95&pity=5&sampleEndTime=2024-11-05%2023%3A59%3A59&sampleRelease=4.18&sampleStartTime=2024-10-29%2000%3A00%3A00&testId=openshift-tests%3A1f3a2a9f8d7b6e8deb502468746bc363&testName=%5Bsig-node%5D%20node-lifecycle%20detects%20unexpected%20not%20ready%20node

      Job run 1: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-upgrade-from-stable-4.17-e2e-metal-ipi-ovn-upgrade/1864621506026278912

      Job run 2: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-upgrade-from-stable-4.17-e2e-metal-ipi-ovn-upgrade/1864621506026278912

      Drilling into job run 1 (2 has similar problem):

      I see that the UnexpectedNodeNotReady happens during a graceful shutdown of the apiserver.

      There are interesting events in etcd, kubeapiserver pods being killed at this time.

      Could apiserver look deeper into these issues?

       

       

              aguclu@redhat.com Arda Guclu
              rh-ee-kehannon Kevin Hannon
              None
              None
              Ke Wang Ke Wang
              None
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Created:
                Updated:
                Resolved: