OpenShift Bugs / OCPBUGS-19892

Some ovnkube-nodes unable to become fully healthy at startup

    • Critical
    • No
    • Proposed
    • False

      Description of problem:

      A 135-node cluster was created (its must-gather is attached). 7 of the nodes remain in the NotReady state because their ovnkube-node pods have only 7 of 8 containers healthy.

      Version-Release number of selected component (if applicable):

      4.14.0-rc.2

      How reproducible:

       

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      ❯ k get po -n openshift-ovn-kubernetes -owide | grep '7/8'                                                  
      ovnkube-node-bbgpd                      7/8     Running   1 (2m11s ago)   7m55s   10.0.117.145   ip-10-0-117-145.us-east-2.compute.internal   <none>           <none>
      ovnkube-node-bx7bk                      7/8     Running   1 (2m1s ago)    7m43s   10.0.38.242    ip-10-0-38-242.us-east-2.compute.internal    <none>           <none>
      ovnkube-node-gqhrt                      7/8     Running   1 (2m3s ago)    7m45s   10.0.6.220     ip-10-0-6-220.us-east-2.compute.internal     <none>           <none>
      ovnkube-node-hlnzm                      7/8     Running   1 (2m3s ago)    7m47s   10.0.37.244    ip-10-0-37-244.us-east-2.compute.internal    <none>           <none>
      ovnkube-node-q7n5q                      7/8     Running   1 (4m11s ago)   9m54s   10.0.114.61    ip-10-0-114-61.us-east-2.compute.internal    <none>           <none>
      ovnkube-node-sltd9                      7/8     Running   1 (2m5s ago)    7m48s   10.0.118.224   ip-10-0-118-224.us-east-2.compute.internal   <none>           <none>
      ovnkube-node-v8p5b                      7/8     Running   1 (2m18s ago)   8m2s    10.0.86.112    ip-10-0-86-112.us-east-2.compute.internal    <none>           <none>
      ❯ k get no | grep NotReady                                
      ip-10-0-114-61.us-east-2.compute.internal    NotReady,SchedulingDisabled   worker                 10m     v1.27.6+1648878
      ip-10-0-117-145.us-east-2.compute.internal   NotReady,SchedulingDisabled   worker                 8m8s    v1.27.6+1648878
      ip-10-0-118-224.us-east-2.compute.internal   NotReady,SchedulingDisabled   worker                 8m1s    v1.27.6+1648878
      ip-10-0-37-244.us-east-2.compute.internal    NotReady,SchedulingDisabled   worker                 8m      v1.27.6+1648878
      ip-10-0-38-242.us-east-2.compute.internal    NotReady,SchedulingDisabled   worker                 7m56s   v1.27.6+1648878
      ip-10-0-6-220.us-east-2.compute.internal     NotReady,SchedulingDisabled   worker                 7m58s   v1.27.6+1648878
      ip-10-0-86-112.us-east-2.compute.internal    NotReady,SchedulingDisabled   worker                 8m15s   v1.27.6+1648878

      Expected results:

      All nodes join the cluster and all ovnkube-node pods reach 8/8 Running.

      Additional info:

      ovnkube-controller is the container that cannot start, with logs like:
      
      I0928 12:38:05.130975   17439 default_node_network_controller.go:755] Waiting for node ip-10-0-117-145.us-east-2.compute.internal to start, no annotation found on node for subnet: could not find "k8s.ovn.org/node-subnets" annotation
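
The log line above suggests that the affected nodes have no "k8s.ovn.org/node-subnets" annotation. As a minimal triage sketch (not part of the original report), the following Python function lists the nodes that lack it, given the parsed output of `kubectl get nodes -o json`. The function name and the way the JSON is supplied are assumptions made for illustration; only the annotation key comes from the log.

```python
from typing import List

# Annotation key taken from the ovnkube-controller log message above.
SUBNET_ANNOTATION = "k8s.ovn.org/node-subnets"


def nodes_missing_subnet_annotation(node_list: dict) -> List[str]:
    """Return the sorted names of nodes that lack the node-subnets annotation.

    `node_list` is assumed to be the parsed JSON of `kubectl get nodes -o json`,
    i.e. a dict with an "items" list of Node objects.
    """
    missing = []
    for node in node_list.get("items", []):
        metadata = node.get("metadata", {})
        # "annotations" may be absent or null on a freshly registered node.
        annotations = metadata.get("annotations") or {}
        if SUBNET_ANNOTATION not in annotations:
            missing.append(metadata.get("name", "<unknown>"))
    return sorted(missing)
```

One hypothetical way to use it is to save the node list with `kubectl get nodes -o json > nodes.json`, load that file with `json.load`, and pass the result to the function. If the returned names match the NotReady nodes listed under "Actual results", that would be consistent with the log message.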

              trozet@redhat.com Tim Rozet
              mshen.openshift Michael Shen
              Anurag Saxena Anurag Saxena