-
Bug
-
Resolution: Not a Bug
-
Normal
-
None
-
4.12.z
-
Moderate
-
No
-
False
-
-
Customer Escalated, Customer Facing
-
-
-
Likely a firewall configuration issue; the nodes hosting the egress IPs are not reachable.
-
-
-
Description of problem:
The CNCC controller removed all the egress IPs from the nodes. After the EgressIP objects were deleted and recreated, only the egress IPs for a single availability zone were restored.
Version-Release number of selected component (if applicable):
Core controller
How reproducible:
Not very reproducible, but we do have an existing non-PROD and PROD exhibiting this behaviour.
Steps to Reproduce:
1. N/A
Actual results:
All EgressIPs appeared to be removed from the nodes, and after deleting and re-creating the EgressIP objects only the IPs for one availability zone recovered.
Expected results:
Egress IPs remain permanently on the allocated nodes until the EgressIP object is removed or the node is unavailable.
Additional info:
An excessive number of EgressIP/CloudPrivateIPConfig events have been logged by the CNCC controller: approximately 96,000 log entries in the past 18 days.
Looking at the OCP audit logs, there do not appear to be any entries for the API calls reported in the CNCC messages, for example:
Put "https://api-int.uat-rosa.80g0.p1.openshiftapps.com:6443/apis/cloud.network.openshift.io/v1/cloudprivateipconfigs/10.134.17.151/status": context deadline exceeded, requeuing in cloud-private-ip-config workqueue
grep "cloudprivateipconfigs" audit-prod-rosa-x6fzw-2024.02.10T08.00_0800-2024.02.10T08.00_0800.log.txt | wc -l
0
Note: the date of the audit logs does not overlap with the current CNCC logs, but the issue already existed when the audit logs were taken.
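The grep check on the audit log only yields a total line count. As a minimal sketch, assuming the audit log is in the standard Kubernetes audit format (one JSON event per line, with `objectRef.resource`, `verb` and `responseStatus.code` fields), the same check can be broken down by verb and response code. The function name `count_audit_events` is hypothetical and is not part of any OpenShift tooling.

```python
import json
from collections import Counter


def count_audit_events(lines, resource="cloudprivateipconfigs"):
    """Count audit events for one resource, keyed by (verb, HTTP status code)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        ref = event.get("objectRef") or {}
        if ref.get("resource") != resource:
            continue
        status = (event.get("responseStatus") or {}).get("code")
        counts[(event.get("verb"), status)] += 1
    return counts


# Example usage (the path is a placeholder for the audit log file):
# with open("audit.log") as f:
#     for (verb, code), n in count_audit_events(f).items():
#         print(verb, code, n)
```

An empty result here would match the `0` from grep; a non-empty result would show whether the CNCC status updates reached the API server at all and how they were answered.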
I am opening this tonight, but we have also just thought of another reason why this might be occurring. There is a firewall involved, so we are wondering whether cross-AZ traffic for the ROSA node EC2 instances is routed via the managed firewall, as they had a catastrophic firewall failure which required recovering without a backup. Please do not put too much effort into this today, as we will check this with the customer tomorrow APAC time.