Bug
Resolution: Unresolved
Major
4.18
Quality / Stability / Reliability
False
Rejected
Please change the component if this is the wrong area. Thanks in advance.
Description of problem:
Unable to scale an Azure cluster to 120 nodes: many machines fail and their nodes never become ready.

NOTE: This functionality passed on 4.16, and passed on 4.17 with the workaround documented here: https://access.redhat.com/solutions/6982343
Is this workaround now a required, everyday step for scaling above 55 nodes?

Machine events show:

InvalidConfiguration: failed to reconcile machine "scaleci18-8929-xd67z-worker-centralus2-2k44r": network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="SpecifiedAllocatedOutboundPortsForOutboundRuleExceedsTotalNumberOfAvailablePorts" Message="Specified Allocated Outbound Ports 1024 for Outbound Rule /subscriptions/***/resourceGroups/scaleci18-8929-xd67z-rg/providers/Microsoft.Network/loadBalancers/scaleci18-8929-xd67z/outboundRules/OutboundNATAllProtocols exceeds total number of available ports per backend instance of 1008 based upon desired pool size. Reduce allocated ports or increase number of IP addresses for outbound rule." Details=[]
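The limit in the error above can be reproduced with simple arithmetic. This is a minimal sketch that assumes, from Azure Load Balancer documentation rather than from this report, that each frontend public IP provides 64,000 SNAT ports, split across the backend pool in multiples of 8. `max_snat_ports_per_instance` and `min_frontend_ips` are hypothetical helpers, not part of any tool.

```shell
#!/bin/sh
# ASSUMPTIONS (not from this report): each frontend public IP provides
# 64,000 SNAT ports, and per-instance allocations are multiples of 8.
SNAT_PORTS_PER_IP=64000

# Largest per-instance allocation that fits for a given number of
# frontend IPs and backend pool size, rounded down to a multiple of 8.
# Usage: max_snat_ports_per_instance FRONTEND_IPS POOL_SIZE
max_snat_ports_per_instance() {
  ips=$1
  pool_size=$2
  per_instance=$(( SNAT_PORTS_PER_IP * ips / pool_size ))
  echo $(( per_instance / 8 * 8 ))
}

# Minimum number of frontend IPs so every backend instance can receive
# PORTS ports (ceiling division; PORTS is assumed to be a multiple of 8).
# Usage: min_frontend_ips PORTS POOL_SIZE
min_frontend_ips() {
  ports=$1
  pool_size=$2
  echo $(( (ports * pool_size + SNAT_PORTS_PER_IP - 1) / SNAT_PORTS_PER_IP ))
}
```

Under these assumptions, `max_snat_ports_per_instance 1 63` prints 1008, the same limit quoted in the error, while `max_snat_ports_per_instance 1 62` prints 1032, so one frontend IP can no longer give 1024 ports to every instance once the pool grows past 62 members. For a hypothetical pool of 123 instances, `min_frontend_ips 1024 123` prints 2. How OpenShift sizes the backend pool and the outbound rule is not confirmed here.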
Version-Release number of selected component (if applicable):
4.18.0-0.nightly-2024-10-08-033830
How reproducible:
100%
Steps to Reproduce:
1. Create an Azure OVN cluster on 4.18.0-0.nightly-2024-10-08-033830.
2. Scale all 3 zone machinesets to 40 replicas each (120 worker nodes in total). This can be done in the console or by running the following command for each of the 3 machinesets:
   oc scale machinesets -n openshift-machine-api ${machineset} --replicas 40
3. Many machines fail and nodes don't become available.
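Step 2 of the reproduction steps can be scripted as below. This is a minimal sketch: `scale_machinesets` is a hypothetical helper that wraps the same `oc scale` command, and the machineset names passed to it are placeholders for the cluster's three zone machinesets (list them with `oc get machinesets -n openshift-machine-api`).

```shell
#!/bin/sh
# Hypothetical helper: scale each named machineset in openshift-machine-api
# to the same replica count, stopping at the first failed `oc scale`.
# Usage: scale_machinesets REPLICAS MACHINESET...
scale_machinesets() {
  replicas=$1
  shift
  for machineset in "$@"; do
    oc scale machinesets -n openshift-machine-api "$machineset" --replicas "$replicas" || return 1
  done
}
```

For example, `scale_machinesets 40 <machineset-1> <machineset-2> <machineset-3>` requests the 120 worker nodes described above.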
Actual results:
Many machines fail and nodes don't become available
Expected results:
All machines and nodes become running
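To check the expected result after scaling, the sketch below lists machines in the Failed phase and nodes whose STATUS is not Ready. The helper names are hypothetical, and the parsing assumes the default column order of `oc get machines` (NAME, PHASE, ...) and `oc get nodes` (NAME, STATUS, ...). Nodes that never registered do not appear in `oc get nodes` at all, so the node count should also be compared against the expected total.

```shell
#!/bin/sh
# Hypothetical helpers for checking scale-up results.

# Names of machines whose PHASE column (2nd column of the default
# `oc get machines` output) is "Failed".
list_failed_machines() {
  oc get machines -n openshift-machine-api --no-headers | awk '$2 == "Failed" { print $1 }'
}

# Names of nodes whose STATUS column (2nd column of the default
# `oc get nodes` output) does not start with "Ready"; this keeps
# "Ready,SchedulingDisabled" nodes out of the list.
list_not_ready_nodes() {
  oc get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }'
}
```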
Additional info:
Did not hit this issue on 4.17.0 nightly builds; this is the first time hitting it on 4.18. I am running on different regions and newer versions of 4.17 to compare results.
Must-gather output can be found here: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/must-gather/594/artifact/