OpenShift Bugs / OCPBUGS-55912

ARO Node Does Not Go NotReady When Communication Is Blocked


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • 4.15
    • Node / Kubelet
    • Quality / Stability / Reliability
    • False
    • OCP Node Sprint 272 (Blue)
    • 1

      Description of problem:

      During a chaos scenario on an ARO cluster in which we block all inbound and outbound traffic to a worker node, the worker node never reports NotReady status. The master nodes cannot ping or otherwise connect to the worker node, yet its status never changes.
      
          

      Version-Release number of selected component (if applicable):

      4.15.35
          

      How reproducible:

      100%
          

      Steps to Reproduce:

          1. Create an ARO cluster.
          2. Find the private IP address of one of the worker nodes.
          3. Create a "chaos" network security group (NSG) that blocks all traffic to and from that node, with these rules:
             - Inbound: Deny, source 0.0.0.0 port *, destination the worker node's IP address port *, any protocol
             - Outbound: Deny, source the worker node's IP address port *, destination 0.0.0.0 port *, any protocol
          4. Find the virtual network for the set of nodes and set the worker subnet's network security group to the chaos NSG just created.
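Steps 3 and 4 can be sketched as Azure CLI invocations. The Python helper below only builds and prints the commands (it does not run them). The resource group, NSG, VNet, subnet, and rule names are hypothetical placeholders, the rule priorities are arbitrary, and writing the report's "0.0.0.0" as 0.0.0.0/0 (that is, "any address") is an assumption about what the reporter intended.

```python
import shlex


def chaos_nsg_commands(resource_group, nsg, vnet, subnet, worker_ip,
                       any_prefix="0.0.0.0/0"):
    """Build (but do not run) az CLI commands for a deny-all NSG around one node.

    All names are hypothetical placeholders. any_prefix assumes the report's
    "0.0.0.0" was meant as "any address".
    """
    # Options shared by both deny rules: any protocol, any source/destination port.
    base_rule = ["az", "network", "nsg", "rule", "create",
                 "--resource-group", resource_group, "--nsg-name", nsg,
                 "--access", "Deny", "--protocol", "*",
                 "--source-port-ranges", "*", "--destination-port-ranges", "*"]
    commands = [
        # Step 3: create the empty "chaos" NSG.
        ["az", "network", "nsg", "create",
         "--resource-group", resource_group, "--name", nsg],
        # Inbound rule: deny anything -> worker node.
        base_rule + ["--name", "chaos-deny-inbound", "--priority", "100",
                     "--direction", "Inbound",
                     "--source-address-prefixes", any_prefix,
                     "--destination-address-prefixes", worker_ip],
        # Outbound rule: deny worker node -> anything.
        base_rule + ["--name", "chaos-deny-outbound", "--priority", "101",
                     "--direction", "Outbound",
                     "--source-address-prefixes", worker_ip,
                     "--destination-address-prefixes", any_prefix],
        # Step 4: attach the chaos NSG to the worker subnet.
        ["az", "network", "vnet", "subnet", "update",
         "--resource-group", resource_group, "--vnet-name", vnet,
         "--name", subnet, "--network-security-group", nsg],
    ]
    # shlex.join quotes "*" so the commands are safe to paste into a shell.
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    # 10.0.2.5 is the blocked worker's address from the transcripts in this report.
    for line in chaos_nsg_commands("my-rg", "chaos-nsg", "my-vnet",
                                   "worker-subnet", "10.0.2.5"):
        print(line)
```

Printing rather than executing keeps the sketch safe to run and lets the commands be reviewed before they are applied to a real cluster.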
          

      Actual results:

      All nodes remain Ready throughout the chaos run, even though communication with the blocked node fails: oc debug does not work against it, and pods cannot be run on it.
      
      
       % oc debug node/prubenda-aro5-jg26m-worker-northcentralus-tkgn6   
      Starting pod/prubenda-aro5-jg26m-worker-northcentralus-tkgn6-debug-8gt5p ...
      To use host binaries, run `chroot /host`
      Pod IP: 10.0.2.5
      If you don't see a command prompt, try pressing enter.
          

      Expected results:

      When communication to the node is blocked, the node transitions to NotReady.
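For context, the expected result relies on the Kubernetes node lifecycle controller: if the kubelet stops sending heartbeats (node Lease renewals and status updates) for longer than the controller's node-monitor-grace-period, the controller sets the node's Ready condition to Unknown, which oc get nodes displays as NotReady. The sketch below is a simplified model of that rule, not the controller's actual code. The 40-second value is the upstream kube-controller-manager default for Kubernetes 1.28; the value actually configured on ARO is an assumption here.

```python
from datetime import datetime, timedelta, timezone

# Assumed grace period: upstream kube-controller-manager default for
# --node-monitor-grace-period in Kubernetes 1.28. Not confirmed for ARO.
NODE_MONITOR_GRACE_PERIOD = timedelta(seconds=40)


def expected_ready_status(last_heartbeat, now, grace=NODE_MONITOR_GRACE_PERIOD):
    """Simplified model of the Ready condition the controller should report.

    Ready stays "True" while heartbeats are recent and becomes "Unknown"
    once no heartbeat has arrived within the grace period.
    """
    if now - last_heartbeat > grace:
        return "Unknown"
    return "True"


def status_column(ready_status):
    """Mirror the STATUS column: any Ready condition other than "True" prints NotReady."""
    return "Ready" if ready_status == "True" else "NotReady"


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)  # hypothetical last heartbeat
    for elapsed in (10, 39, 41, 300):
        now = t0 + timedelta(seconds=elapsed)
        print(elapsed, status_column(expected_ready_status(t0, now)))
```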
          

      Additional info:

      % oc get nodes
      NAME                                              STATUS   ROLES                  AGE   VERSION
      prubenda-aro5-jg26m-master-0                      Ready    control-plane,master   8h    v1.28.13+2ca1a23
      prubenda-aro5-jg26m-master-1                      Ready    control-plane,master   8h    v1.28.13+2ca1a23
      prubenda-aro5-jg26m-master-2                      Ready    control-plane,master   8h    v1.28.13+2ca1a23
      prubenda-aro5-jg26m-worker-northcentralus-84znv   Ready    worker                 8h    v1.28.13+2ca1a23
      prubenda-aro5-jg26m-worker-northcentralus-tkgn6   Ready    worker                 8h    v1.28.13+2ca1a23
      prubenda-aro5-jg26m-worker-northcentralus-xszjq   Ready    worker                 8h    v1.28.13+2ca1a23
      
      
      // test connection from master to worker
       % oc debug node/prubenda-aro5-jg26m-master-0 
      Starting pod/prubenda-aro5-jg26m-master-0-debug-c8clg ...
      To use host binaries, run `chroot /host`
      chroot /hos
      Pod IP: 10.0.0.9
      If you don't see a command prompt, try pressing enter.
      sh-4.4# chroot /host
      sh-5.1# ping 10.0.2.5  // try communicating with the blocked worker IP
      PING 10.0.2.5 (10.0.2.5) 56(84) bytes of data.
      ^C
      --- 10.0.2.5 ping statistics ---
      25 packets transmitted, 0 received, 100% packet loss, time 24563ms
      
      sh-5.1# ^C
      sh-5.1# ping 10.0.2.4 // communication to a second worker 
      PING 10.0.2.4 (10.0.2.4) 56(84) bytes of data.
      64 bytes from 10.0.2.4: icmp_seq=1 ttl=64 time=2.12 ms
      64 bytes from 10.0.2.4: icmp_seq=2 ttl=64 time=1.28 ms
      64 bytes from 10.0.2.4: icmp_seq=3 ttl=64 time=1.15 ms
      64 bytes from 10.0.2.4: icmp_seq=4 ttl=64 time=1.15 ms
      ^C
      --- 10.0.2.4 ping statistics ---
      4 packets transmitted, 4 received, 0% packet loss, time 3004ms
      rtt min/avg/max/mdev = 1.152/1.425/2.119/0.403 ms
      sh-5.1# exit
      exit
      sh-4.4# exit
      exit
      
      Removing debug pod ...
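The reachability checks in these transcripts can be summarized by parsing the ping statistics line. The helper below is a hypothetical convenience, not part of any OpenShift tooling; it assumes the "N packets transmitted, M received, P% packet loss" summary format shown here.

```python
import re

# Matches the ping summary line, e.g.
# "25 packets transmitted, 0 received, 100% packet loss, time 24563ms".
# The lazy ".*?" also tolerates extra fields such as "+25 errors,".
SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) received,.*?([\d.]+)% packet loss")


def ping_loss(output):
    """Return (transmitted, received, loss_percent) from ping output, or None."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    sent, received, loss = match.groups()
    return int(sent), int(received), float(loss)


def reachable(output):
    """Treat a host as reachable if at least one echo reply came back."""
    stats = ping_loss(output)
    return stats is not None and stats[1] > 0


if __name__ == "__main__":
    blocked = "25 packets transmitted, 0 received, 100% packet loss, time 24563ms"
    healthy = "4 packets transmitted, 4 received, 0% packet loss, time 3004ms"
    print(ping_loss(blocked), reachable(blocked))
    print(ping_loss(healthy), reachable(healthy))
```

Applied to the master-to-worker session, the blocked node 10.0.2.5 yields (25, 0, 100.0) and is unreachable, while 10.0.2.4 yields (4, 4, 0.0) and is reachable.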
      
      // test connection from worker node to worker node
       % oc debug node/prubenda-aro5-jg26m-worker-northcentralus-84znv
      Starting pod/prubenda-aro5-jg26m-worker-northcentralus-84znv-debug-4z9l5 ...
      To use host binaries, run `chroot /host`
      chroot /host 
      Pod IP: 10.0.2.6
      If you don't see a command prompt, try pressing enter.
      sh-4.4# chroot /host 
      sh-5.1# ping 10.0.2.4 // ping the other worker node
      PING 10.0.2.4 (10.0.2.4) 56(84) bytes of data.
      64 bytes from 10.0.2.4: icmp_seq=1 ttl=64 time=2.13 ms
      64 bytes from 10.0.2.4: icmp_seq=2 ttl=64 time=6.10 ms
      64 bytes from 10.0.2.4: icmp_seq=3 ttl=64 time=0.720 ms
      64 bytes from 10.0.2.4: icmp_seq=4 ttl=64 time=1.64 ms
      ^C
      --- 10.0.2.4 ping statistics ---
      4 packets transmitted, 4 received, 0% packet loss, time 3020ms
      rtt min/avg/max/mdev = 0.720/2.647/6.100/2.056 ms
      sh-5.1# ^C
      sh-5.1# 10.0.2.5
      sh: 10.0.2.5: command not found
      sh-5.1# ping 10.0.2.5
      PING 10.0.2.5 (10.0.2.5) 56(84) bytes of data.
      ^C
      --- 10.0.2.5 ping statistics ---
      6 packets transmitted, 0 received, 100% packet loss, time 5111ms
      
      sh-5.1# ^C
      
      // Get nodes during chaos
       % oc get nodes
      NAME                                              STATUS   ROLES                  AGE   VERSION
      prubenda-aro5-jg26m-master-0                      Ready    control-plane,master   8h    v1.28.13+2ca1a23
      prubenda-aro5-jg26m-master-1                      Ready    control-plane,master   8h    v1.28.13+2ca1a23
      prubenda-aro5-jg26m-master-2                      Ready    control-plane,master   8h    v1.28.13+2ca1a23
      prubenda-aro5-jg26m-worker-northcentralus-84znv   Ready    worker                 8h    v1.28.13+2ca1a23
      prubenda-aro5-jg26m-worker-northcentralus-tkgn6   Ready    worker                 8h    v1.28.13+2ca1a23
      prubenda-aro5-jg26m-worker-northcentralus-xszjq   Ready    worker                 8h    v1.28.13+2ca1a23
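To watch for the expected transition during a chaos run, the oc get nodes output can be checked programmatically. This sketch parses the default table format shown above (NAME, STATUS, ROLES, AGE, VERSION) from text passed in; it does not call oc itself, and the function name is hypothetical.

```python
# The "during chaos" table from this report.
REPORT_TABLE = """
NAME                                              STATUS   ROLES                  AGE   VERSION
prubenda-aro5-jg26m-master-0                      Ready    control-plane,master   8h    v1.28.13+2ca1a23
prubenda-aro5-jg26m-master-1                      Ready    control-plane,master   8h    v1.28.13+2ca1a23
prubenda-aro5-jg26m-master-2                      Ready    control-plane,master   8h    v1.28.13+2ca1a23
prubenda-aro5-jg26m-worker-northcentralus-84znv   Ready    worker                 8h    v1.28.13+2ca1a23
prubenda-aro5-jg26m-worker-northcentralus-tkgn6   Ready    worker                 8h    v1.28.13+2ca1a23
prubenda-aro5-jg26m-worker-northcentralus-xszjq   Ready    worker                 8h    v1.28.13+2ca1a23
"""


def not_ready_nodes(table):
    """Return names of nodes whose STATUS is not Ready.

    Expects the default `oc get nodes` table. A status such as
    "Ready,SchedulingDisabled" counts as Ready; "NotReady,..." does not.
    """
    lines = [line for line in table.strip().splitlines() if line.strip()]
    header = lines[0].split()
    name_col, status_col = header.index("NAME"), header.index("STATUS")
    flagged = []
    for line in lines[1:]:
        fields = line.split()  # no column in this table contains spaces
        status = fields[status_col].split(",")[0]
        if status != "Ready":
            flagged.append(fields[name_col])
    return flagged


if __name__ == "__main__":
    print(not_ready_nodes(REPORT_TABLE))
```

For both tables in this report the check returns an empty list, which is precisely the unexpected result: the blocked node ...tkgn6 is never flagged.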
          

              rh-ee-ngopalak Neeraj Krishna Gopalakrishna
              prubenda Paige Patton
              Min Li