OCPBUGS-37624

HCP Worker nodes stuck at NotReady state for more than 3 hours


      Description of problem:

      Upgrading a HyperShift machine pool on a cluster with 501 worker nodes, without autoscaling, leaves worker nodes stuck in the NotReady state for more than 3 hours.

      Version-Release number of selected component (if applicable):

      Upgrading to 4.15.22 from 4.14.31

       

      Steps to Reproduce:

      1.  kube-burner-ocp cluster-density-v2 --iterations=4509 --churn=false --gc=false

      2.  rosa upgrade machinepool ${mp_id} -y -c ${CLUSTER_ID} --version 4.15.22

       

      Actual results:
      New nodes are stuck in the NotReady state for more than 3 hours.

      oc describe output for the ovnkube-controller container on one of the affected nodes:

        ovnkube-controller:
          Container ID:  cri-o://563e8bd4610295bec6077c06961ce0d9ae2e1faed4ddf56dfc97b64f82172f85
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:646a41c7ddae9d1ca35df9de2d80ce24d918176d8b562136e747e51f4ebdc5ad
          Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:646a41c7ddae9d1ca35df9de2d80ce24d918176d8b562136e747e51f4ebdc5ad
          Port:          29105/TCP
          Host Port:     29105/TCP
          Command:
            /bin/bash
            -c
            set -xe
            . /ovnkube-lib/ovnkube-lib.sh || exit 1
            start-ovnkube-node ${OVN_KUBE_LOG_LEVEL} 29103 29105
            
          State:      Terminated
            Reason:   Error
            Message:  ller/pkg/crd/adminpolicybasedroute/v1/apis/informers/externalversions/factory.go:140
      I0726 14:50:51.320339   41188 reflector.go:295] Stopping reflector *v1.EgressService (0s) from github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/egressservice/v1/apis/informers/externalversions/factory.go:140
      I0726 14:50:51.320225   41188 reflector.go:295] Stopping reflector *v1.EgressFirewall (0s) from github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/egressfirewall/v1/apis/informers/externalversions/factory.go:140
      I0726 14:50:51.320263   41188 reflector.go:295] Stopping reflector *v1.EgressQoS (0s) from github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/egressqos/v1/apis/informers/externalversions/factory.go:140
      I0726 14:50:51.320376   41188 reflector.go:295] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:159
      I0726 14:50:51.320269   41188 reflector.go:295] Stopping reflector *v1.EndpointSlice (0s) from k8s.io/client-go/informers/factory.go:159
      I0726 14:50:51.320296   41188 reflector.go:295] Stopping reflector *v1.NetworkPolicy (0s) from k8s.io/client-go/informers/factory.go:159
      I0726 14:50:51.320314   41188 metrics.go:506] Stopping metrics server 127.0.0.1:29103
      I0726 14:50:51.320401   41188 reflector.go:295] Stopping reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:159
      I0726 14:50:51.320351   41188 reflector.go:295] Stopping reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:159
      I0726 14:50:51.320333   41188 reflector.go:295] Stoppin
            Exit Code:  1
            Started:    Fri, 26 Jul 2024 20:15:41 +0530
            Finished:   Fri, 26 Jul 2024 20:20:51 +0530
          Last State:   Terminated
            Reason:     Error
            Message:    s/v1/apis/informers/externalversions/factory.go:140
      I0726 14:45:14.102565   39096 reflector.go:295] Stopping reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:159
      I0726 14:45:14.102645   39096 reflector.go:295] Stopping reflector *v1.NetworkPolicy (0s) from k8s.io/client-go/informers/factory.go:159
      I0726 14:45:14.102505   39096 reflector.go:295] Stopping reflector *v1.EgressService (0s) from github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/egressservice/v1/apis/informers/externalversions/factory.go:140
      I0726 14:45:14.102506   39096 reflector.go:295] Stopping reflector *v1.EgressIP (0s) from github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/egressip/v1/apis/informers/externalversions/factory.go:140
      I0726 14:45:14.102533   39096 reflector.go:295] Stopping reflector *v1.AdminPolicyBasedExternalRoute (0s) from github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/adminpolicybasedroute/v1/apis/informers/externalversions/factory.go:140
      I0726 14:45:14.102565   39096 metrics.go:506] Stopping metrics server 127.0.0.1:29103
      I0726 14:45:14.102477   39096 reflector.go:295] Stopping reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:159
      I0726 14:45:14.102602   39096 reflector.go:295] Stopping reflector *v1.EgressFirewall (0s) from github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/egressfirewall/v1/apis/informers/externalversions/factory.go:140
      I0726 14:45:14.102609   39096 reflector.go:295] Stopping reflector *v1.EndpointSlice (0s) from k8s.io/client-go/info
            Exit Code:    1

       

      =========================

      
      $ oc get no | grep NotReady
      ip-10-0-128-161.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-128-175.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-128-180.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-128-50.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-128-91.us-east-2.compute.internal    NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-129-42.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-130-143.us-east-2.compute.internal   NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-130-181.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-130-203.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-130-91.us-east-2.compute.internal    NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-131-246.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-131-77.us-east-2.compute.internal    NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-132-120.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-132-123.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-132-39.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-133-122.us-east-2.compute.internal   NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-133-209.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-133-231.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-134-66.us-east-2.compute.internal    NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-134-84.us-east-2.compute.internal    NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-135-130.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-135-189.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-135-249.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-135-92.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-136-209.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-136-32.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-137-242.us-east-2.compute.internal   NotReady,SchedulingDisabled   worker         10h     v1.27.14+7852426
      ip-10-0-138-32.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-139-11.us-east-2.compute.internal    NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-139-66.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-139-77.us-east-2.compute.internal    NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-140-116.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-140-158.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-140-246.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-140-60.us-east-2.compute.internal    NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-140-70.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-141-121.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-141-21.us-east-2.compute.internal    NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-141-91.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-142-115.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-142-227.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-142-96.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-143-107.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-144-180.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-145-25.us-east-2.compute.internal    NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-145-4.us-east-2.compute.internal     NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-146-147.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-146-248.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-146-252.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-147-163.us-east-2.compute.internal   NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-147-254.us-east-2.compute.internal   NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-147-43.us-east-2.compute.internal    NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-147-51.us-east-2.compute.internal    NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-147-60.us-east-2.compute.internal    NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-148-66.us-east-2.compute.internal    NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-149-114.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-151-137.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-152-173.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-152-62.us-east-2.compute.internal    NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-152-98.us-east-2.compute.internal    NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-153-155.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-153-212.us-east-2.compute.internal   NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-153-5.us-east-2.compute.internal     NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-154-38.us-east-2.compute.internal    NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-154-86.us-east-2.compute.internal    NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-155-208.us-east-2.compute.internal   NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-155-255.us-east-2.compute.internal   NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-155-96.us-east-2.compute.internal    NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-156-105.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-156-139.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-156-220.us-east-2.compute.internal   NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-156-245.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-156-48.us-east-2.compute.internal    NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-157-2.us-east-2.compute.internal     NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-157-204.us-east-2.compute.internal   NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-157-246.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-157-56.us-east-2.compute.internal    NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-157-89.us-east-2.compute.internal    NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-158-112.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-158-115.us-east-2.compute.internal   NotReady                      worker         3h16m   v1.28.11+add48d0
      ip-10-0-158-211.us-east-2.compute.internal   NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-158-92.us-east-2.compute.internal    NotReady,SchedulingDisabled   worker         10h     v1.27.14+7852426
      ip-10-0-159-139.us-east-2.compute.internal   NotReady                      worker         3h14m   v1.28.11+add48d0
      ip-10-0-159-163.us-east-2.compute.internal   NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-159-202.us-east-2.compute.internal   NotReady                      worker         3h15m   v1.28.11+add48d0
      ip-10-0-160-102.us-east-2.compute.internal   NotReady                      worker         21m     v1.28.11+add48d0
      ip-10-0-160-42.us-east-2.compute.internal    NotReady                      worker         3h19m   v1.28.11+add48d0
      ip-10-0-160-54.us-east-2.compute.internal    NotReady                      worker         19m     v1.28.11+add48d0
      ip-10-0-161-170.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-161-237.us-east-2.compute.internal   NotReady                      worker         22m     v1.28.11+add48d0
      ip-10-0-162-113.us-east-2.compute.internal   NotReady                      worker         3h19m   v1.28.11+add48d0
      ip-10-0-162-147.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-162-176.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-162-39.us-east-2.compute.internal    NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-162-93.us-east-2.compute.internal    NotReady                      worker         3h20m   v1.28.11+add48d0
      ip-10-0-163-151.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-163-193.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-163-42.us-east-2.compute.internal    NotReady                      worker         22m     v1.28.11+add48d0
      ip-10-0-164-217.us-east-2.compute.internal   NotReady                      worker         3h19m   v1.28.11+add48d0
      ip-10-0-164-242.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-165-96.us-east-2.compute.internal    NotReady                      worker         22m     v1.28.11+add48d0
      ip-10-0-166-57.us-east-2.compute.internal    NotReady                      worker         3h19m   v1.28.11+add48d0
      ip-10-0-167-49.us-east-2.compute.internal    NotReady                      worker         3h19m   v1.28.11+add48d0
      ip-10-0-168-108.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-168-15.us-east-2.compute.internal    NotReady                      worker         3h19m   v1.28.11+add48d0
      ip-10-0-169-147.us-east-2.compute.internal   NotReady                      worker         3h17m   v1.28.11+add48d0
      ip-10-0-169-36.us-east-2.compute.internal    NotReady                      worker         3h20m   v1.28.11+add48d0
      ip-10-0-170-34.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-170-49.us-east-2.compute.internal    NotReady                      worker         3h20m   v1.28.11+add48d0
      ip-10-0-171-12.us-east-2.compute.internal    NotReady                      worker         3h19m   v1.28.11+add48d0
      ip-10-0-171-122.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-171-9.us-east-2.compute.internal     NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-172-118.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-172-131.us-east-2.compute.internal   NotReady                      worker         24m     v1.28.11+add48d0
      ip-10-0-173-100.us-east-2.compute.internal   NotReady                      worker         3h19m   v1.28.11+add48d0
      ip-10-0-173-102.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-173-116.us-east-2.compute.internal   NotReady                      worker         3h19m   v1.28.11+add48d0
      ip-10-0-173-14.us-east-2.compute.internal    NotReady                      worker         3h19m   v1.28.11+add48d0
      ip-10-0-173-37.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-173-69.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-174-73.us-east-2.compute.internal    NotReady                      worker         3h20m   v1.28.11+add48d0
      ip-10-0-175-108.us-east-2.compute.internal   NotReady                      worker         3h18m   v1.28.11+add48d0
      ip-10-0-175-123.us-east-2.compute.internal   NotReady                      worker         3h19m   v1.28.11+add48d0
      ip-10-0-175-144.us-east-2.compute.internal   NotReady                      worker         2m55s   v1.28.11+add48d0
      ip-10-0-175-38.us-east-2.compute.internal    NotReady                      worker         3h19m   v1.28.11+add48d0
      ip-10-0-175-46.us-east-2.compute.internal    NotReady                      worker         3h18m   v1.28.11+add48d0 
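
A listing this long is easier to compare between runs when it is reduced to a few counts. The helper below is a minimal sketch, not part of the original report: it assumes the five-column layout shown above (NAME, STATUS, ROLES, AGE, VERSION) with no header line, and the function name `summarize_notready` and its output keys are made up for illustration. It reads the listing on stdin and prints the number of NotReady nodes, how many of them are at least a given number of minutes old (default 180), and a per-kubelet-version breakdown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize NotReady nodes from a node listing on stdin.
# Assumed columns (no header): NAME STATUS ROLES AGE VERSION.
summarize_notready() {
  local min_minutes="${1:-180}"   # age threshold in minutes, defaults to 3 hours
  awk -v min="$min_minutes" '
    # Convert an AGE value such as "3h17m", "10h", "21m" or "2m55s" to whole
    # minutes. Seconds are ignored; extra parameters act as local variables.
    function age_minutes(age,    total, num, unit) {
      total = 0
      while (match(age, /^[0-9]+[dhms]/)) {
        num  = substr(age, 1, RLENGTH - 1) + 0
        unit = substr(age, RLENGTH, 1)
        if (unit == "d")      total += num * 1440
        else if (unit == "h") total += num * 60
        else if (unit == "m") total += num
        age = substr(age, RLENGTH + 1)
      }
      return total
    }
    # STATUS may carry suffixes such as "NotReady,SchedulingDisabled".
    $2 ~ /^NotReady/ {
      total_nodes++
      by_version[$5]++
      if (age_minutes($4) >= min) stuck++
    }
    END {
      printf "total_notready=%d\n", total_nodes
      printf "stuck_ge_%dm=%d\n", min, stuck
      # Iteration order over awk arrays is unspecified, so these lines
      # may appear in any order.
      for (v in by_version) printf "version %s %d\n", v, by_version[v]
    }
  '
}
```

Against a live cluster it could be fed with `oc get nodes --no-headers | summarize_notready 180`; a saved copy of the listing can be piped in the same way once the prompt line is removed.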

      Expected results:

      The machine pool upgrade completes successfully and all worker nodes reach the Ready state.

       

      More info:

       

      Environment: Staging
      Shard_ID: 75a07024-d4ca-11ee-b50c-0a580a810268
      

       

       

       

              pepalani@redhat.com Periyasamy Palanisamy
              krvoora-ocm Harsha Voora (Inactive)
              Anurag Saxena Anurag Saxena