Customer is running a cluster with the following config:
In a namespace with NetworkPolicy rules enabled and an allow-from-same-namespace policy in place, pods behave differently when calling service IPs hosted in that same namespace.
Deployment1 with two pods (A/B) exists in namespace <EXAMPLE>
Deployment2 with one pod hosting a service and route exists in the same namespace
Pod A unexpectedly loses the ability to call the service IP of Deployment2; Pod B never loses access to that service IP.
Pod A remains able to call out through the br-ex interface, hit the ROUTE address, and reach the Deployment2 pod via haproxy (this never breaks)
Pod A remains able to reach the local gateway on the node
Host node for Pod A is able to reach the service IP of deployment2 and remains able to do so, even while pod A is impacted.
Issue can be mitigated by applying a label or annotation to pod A, which immediately allows it to reach internal service IPs again within the namespace.
I suspect the issue is that the networkpolicy rules are not staying up to date for the pod object, and the pod needs to be 'refreshed' (by adding a label or making some other update) to force the network to 'remember' that the pod is allowed to call peers within the namespace.
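
The label/annotation workaround described above could be scripted roughly as follows. This is a minimal sketch, not a confirmed procedure: the namespace `example`, the pod name `pod-a`, and the annotation key `netpol-refresh` are hypothetical placeholders, and the script only prints the `oc annotate` command for review instead of running it. The timestamp value is there so that each run produces an actual change to the pod object.

```python
import shlex
import time


def refresh_command(namespace, pod, key="netpol-refresh"):
    """Build an `oc annotate` command that touches the pod's metadata.

    The annotation key is a hypothetical placeholder. The value is a
    timestamp so that every run results in a real update to the pod.
    """
    value = str(int(time.time()))
    return [
        "oc", "annotate", "pod", pod,
        "-n", namespace,
        f"{key}={value}",
        "--overwrite",  # allow re-running against an already annotated pod
    ]


if __name__ == "__main__":
    # Hypothetical namespace and pod name; substitute the affected pod.
    cmd = refresh_command("example", "pod-a")
    # Printed for review only; nothing is executed against the cluster.
    print(shlex.join(cmd))
```

To apply the change, the printed command can be run by hand in a shell that is logged in to the cluster.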
Additional relevant data:
- pods are affected throughout the cluster; not specific to any project/service/deployment/application
- affected pods run on different nodes each time (no single node is implicated)
- pods in the failed condition share a node with other pods that have no issue
- multiple namespaces see this problem
- all namespaces are using similar networkpolicy isolation and allow-from-same-namespace ruleset (which matches our documentation on syntax).
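
For reference, the kind of ruleset described above might look like the manifest generated below. The customer's actual policies are not included in this report, so this is an assumed shape modeled on the commonly documented allow-from-same-namespace policy; the policy name `allow-same-namespace` and the namespace `example` are placeholders.

```python
import json


def allow_same_namespace_policy(namespace):
    """Return a NetworkPolicy manifest admitting ingress from any pod
    in the same namespace (assumed shape; not taken from the report)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            # Empty podSelector: the policy applies to every pod in the namespace.
            "podSelector": {},
            # A bare podSelector under `from` matches pods in this namespace only.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


if __name__ == "__main__":
    # "example" is a placeholder namespace.
    print(json.dumps(allow_same_namespace_policy("example"), indent=2))
```

The JSON output can be compared against what `oc get networkpolicy -n <namespace> -o json` returns in an affected namespace, to confirm the live policy matches the expected ruleset.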