- Bug
- Resolution: Not a Bug
- Normal
- 4.15
- Quality / Stability / Reliability
- False
- OCP Node Sprint 272 (Blue)
- 1
Description of problem:
During a chaos scenario on an ARO cluster in which we block all inbound and outbound traffic to a worker node, the node never reports NotReady status. The master nodes cannot ping or open a connection to the worker node, yet its status never changes.
Version-Release number of selected component (if applicable):
4.15.35
How reproducible:
100%
Steps to Reproduce:
1. Create ARO cluster
2. Find Private IP address of one of the worker nodes
3. Create a "chaos" network security group that blocks all traffic to and from the worker node, with these rules:
Inbound: Deny from source 0.0.0.0, port *, to destination IP address of the worker node, port *, any protocol
Outbound: Deny from source IP address of the worker node, port *, to destination 0.0.0.0, port *, any protocol
4. Find the virtual network for the set of nodes and set the worker subnet's security group to the chaos security group just created
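For anyone reproducing steps 3 and 4 from a terminal, the following is a minimal sketch using the Azure CLI. It is an assumption that the CLI is used at all; the report does not say how the rules were created. The resource group, virtual network, subnet, rule names, and priorities are hypothetical placeholders, and `*` (Any) stands in for the 0.0.0.0 address given in the rules above. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of steps 3 and 4 with the Azure CLI.
# RG, VNET, SUBNET, NSG, the rule names and the priorities are hypothetical
# placeholders; NODE_IP defaults to the blocked worker's IP from this report.
set -eu

RG="${RG:-my-aro-vnet-rg}"
VNET="${VNET:-my-aro-vnet}"
SUBNET="${SUBNET:-worker-subnet}"
NSG="${NSG:-chaos}"
NODE_IP="${NODE_IP:-10.0.2.5}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode (for reading, not copy-paste), else run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 3: create the NSG with one inbound and one outbound deny rule.
create_chaos_nsg() {
  run az network nsg create --resource-group "$RG" --name "$NSG"
  run az network nsg rule create --resource-group "$RG" --nsg-name "$NSG" \
    --name deny-inbound --priority 100 --direction Inbound --access Deny \
    --protocol '*' --source-address-prefixes '*' --source-port-ranges '*' \
    --destination-address-prefixes "$NODE_IP" --destination-port-ranges '*'
  run az network nsg rule create --resource-group "$RG" --nsg-name "$NSG" \
    --name deny-outbound --priority 100 --direction Outbound --access Deny \
    --protocol '*' --source-address-prefixes "$NODE_IP" --source-port-ranges '*' \
    --destination-address-prefixes '*' --destination-port-ranges '*'
}

# Step 4: point the worker subnet at the chaos NSG.
attach_chaos_nsg() {
  run az network vnet subnet update --resource-group "$RG" --vnet-name "$VNET" \
    --name "$SUBNET" --network-security-group "$NSG"
}

create_chaos_nsg
attach_chaos_nsg
```

Replacing the subnet's security group removes whatever rules were attached before, so it is worth recording the original group name before running this with DRY_RUN=0.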
Actual results:
All nodes stay Ready while the chaos security group is blocking communication. We can't run oc debug against the blocked node or get the pods running on it
% oc debug node/prubenda-aro5-jg26m-worker-northcentralus-tkgn6
Starting pod/prubenda-aro5-jg26m-worker-northcentralus-tkgn6-debug-8gt5p ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.2.5
If you don't see a command prompt, try pressing enter.
Expected results:
When communication to the node is blocked, the node should go NotReady
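To check the expectation above more precisely than the STATUS column allows, the node's Ready condition and its last heartbeat can be read directly. The sketch below assumes a logged-in `oc` client; the helper function names are illustrative, and NODE defaults to the blocked worker from this report.

```shell
#!/bin/sh
# Helpers for inspecting a node's Ready condition.
# Function names are illustrative; NODE defaults to the worker in this report.
set -eu

NODE="${NODE:-prubenda-aro5-jg26m-worker-northcentralus-tkgn6}"

# Print the status of the node's Ready condition: True, False, or Unknown.
node_ready_status() {
  oc get node "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Print when the Ready condition last received a heartbeat.
node_last_heartbeat() {
  oc get node "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastHeartbeatTime}'
}

# Map the raw condition status to a readable label.
describe_ready() {
  case "$1" in
    True) echo "Ready" ;;
    False|Unknown) echo "NotReady" ;;
    *) echo "no Ready condition reported" ;;
  esac
}

# Example usage against a live cluster:
#   describe_ready "$(node_ready_status "$NODE")"
#   node_last_heartbeat "$NODE"
```

A status of Unknown would indicate that the control plane has stopped hearing from the kubelet, while True with a recent heartbeat would suggest that node status updates are still reaching the API server.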
Additional info:
% oc get nodes
NAME STATUS ROLES AGE VERSION
prubenda-aro5-jg26m-master-0 Ready control-plane,master 8h v1.28.13+2ca1a23
prubenda-aro5-jg26m-master-1 Ready control-plane,master 8h v1.28.13+2ca1a23
prubenda-aro5-jg26m-master-2 Ready control-plane,master 8h v1.28.13+2ca1a23
prubenda-aro5-jg26m-worker-northcentralus-84znv Ready worker 8h v1.28.13+2ca1a23
prubenda-aro5-jg26m-worker-northcentralus-tkgn6 Ready worker 8h v1.28.13+2ca1a23
prubenda-aro5-jg26m-worker-northcentralus-xszjq Ready worker 8h v1.28.13+2ca1a23
// test connection from master to worker
% oc debug node/prubenda-aro5-jg26m-master-0
Starting pod/prubenda-aro5-jg26m-master-0-debug-c8clg ...
To use host binaries, run `chroot /host`
chroot /hos
Pod IP: 10.0.0.9
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# ping 10.0.2.5 // try communicating with blocked worker IP
PING 10.0.2.5 (10.0.2.5) 56(84) bytes of data.
^C
--- 10.0.2.5 ping statistics ---
25 packets transmitted, 0 received, 100% packet loss, time 24563ms
sh-5.1# ^C
sh-5.1# ping 10.0.2.4 // communication to a second worker
PING 10.0.2.4 (10.0.2.4) 56(84) bytes of data.
64 bytes from 10.0.2.4: icmp_seq=1 ttl=64 time=2.12 ms
64 bytes from 10.0.2.4: icmp_seq=2 ttl=64 time=1.28 ms
64 bytes from 10.0.2.4: icmp_seq=3 ttl=64 time=1.15 ms
64 bytes from 10.0.2.4: icmp_seq=4 ttl=64 time=1.15 ms
^C
--- 10.0.2.4 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 1.152/1.425/2.119/0.403 ms
sh-5.1# exit
exit
sh-4.4# exit
exit
Removing debug pod ...
// test connection from worker node to worker node
% oc debug node/prubenda-aro5-jg26m-worker-northcentralus-84znv
Starting pod/prubenda-aro5-jg26m-worker-northcentralus-84znv-debug-4z9l5 ...
To use host binaries, run `chroot /host`
chroot /host
Pod IP: 10.0.2.6
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# ping 10.0.2.4 // ping to other worker node
PING 10.0.2.4 (10.0.2.4) 56(84) bytes of data.
64 bytes from 10.0.2.4: icmp_seq=1 ttl=64 time=2.13 ms
64 bytes from 10.0.2.4: icmp_seq=2 ttl=64 time=6.10 ms
64 bytes from 10.0.2.4: icmp_seq=3 ttl=64 time=0.720 ms
64 bytes from 10.0.2.4: icmp_seq=4 ttl=64 time=1.64 ms
^C
--- 10.0.2.4 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3020ms
rtt min/avg/max/mdev = 0.720/2.647/6.100/2.056 ms
sh-5.1# ^C
sh-5.1# 10.0.2.5
sh: 10.0.2.5: command not found
sh-5.1# ping 10.0.2.5
PING 10.0.2.5 (10.0.2.5) 56(84) bytes of data.
^C
--- 10.0.2.5 ping statistics ---
6 packets transmitted, 0 received, 100% packet loss, time 5111ms
sh-5.1# ^C
// Get nodes during chaos
% oc get nodes
NAME STATUS ROLES AGE VERSION
prubenda-aro5-jg26m-master-0 Ready control-plane,master 8h v1.28.13+2ca1a23
prubenda-aro5-jg26m-master-1 Ready control-plane,master 8h v1.28.13+2ca1a23
prubenda-aro5-jg26m-master-2 Ready control-plane,master 8h v1.28.13+2ca1a23
prubenda-aro5-jg26m-worker-northcentralus-84znv Ready worker 8h v1.28.13+2ca1a23
prubenda-aro5-jg26m-worker-northcentralus-tkgn6 Ready worker 8h v1.28.13+2ca1a23
prubenda-aro5-jg26m-worker-northcentralus-xszjq Ready worker 8h v1.28.13+2ca1a23
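Since every node in the listing above stays Ready, it may help to see whether the blocked worker's kubelet is still renewing its Lease in the kube-node-lease namespace, which the kubelet uses as a heartbeat (renewed roughly every 10 seconds with default settings). The sketch below samples the Lease's renewTime twice and reports whether it moved. It assumes a logged-in `oc` client; the function names and the INTERVAL variable are illustrative.

```shell
#!/bin/sh
# Check whether a node's kubelet Lease is still being renewed.
# Function names and INTERVAL are illustrative; NODE defaults to the
# blocked worker named in this report.
set -eu

NODE="${NODE:-prubenda-aro5-jg26m-worker-northcentralus-tkgn6}"
INTERVAL="${INTERVAL:-30}"   # seconds to wait between the two samples

# Print the renewTime of the node's Lease object.
lease_renew_time() {
  oc get lease "$1" -n kube-node-lease -o jsonpath='{.spec.renewTime}'
}

# Sample renewTime twice, $2 seconds apart, and report whether it changed.
lease_is_advancing() {
  first="$(lease_renew_time "$1")"
  sleep "$2"
  second="$(lease_renew_time "$1")"
  if [ "$first" != "$second" ]; then
    echo "advancing: $first -> $second"
  else
    echo "stalled at $first"
  fi
}

# Example usage against a live cluster:
#   lease_is_advancing "$NODE" "$INTERVAL"
```

An advancing renewTime during the chaos window would suggest that heartbeats from the node are still reaching the API server despite the deny rules; a stalled one alongside a Ready status would point elsewhere.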