Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.16
Component/s: Networking / ovn-kubernetes
Labels:
- OVN-Kubernetes
- hostPrefix

Activity Type:
Incidents & Support
Blocked:
False
Blocked Reason:

None
Story Points:
None
Original story points:
5
Severity:
Critical
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
CORENET Sprint 284
sprint_count:
1

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

I have a customer (IHAC) with a new bare metal cluster who, immediately after migration, experienced pod scheduling problems: not all pods were able to start.

After investigation, we found that the pod IP range assigned to each node limits that node to 510 pods. The customer now has large bare metal workers and over 6000 pods altogether, so after the migration, with the 510-pods-per-node limit, the bare metal workers could not host the entire pod workload.
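The 510-pods-per-node figure follows from the size of each node's pod subnet. The sketch below is a minimal illustration, assuming each node receives an IPv4 subnet of length hostPrefix and that only its network and broadcast addresses are unusable. OVN-Kubernetes may reserve a few additional addresses per node, so the real capacity can be slightly lower. The helper names are hypothetical and only illustrate the arithmetic behind the per-node limit and how many nodes 6000 pods would need on address space alone.

```python
import math


def usable_pod_ips(host_prefix: int) -> int:
    """Host addresses in a per-node IPv4 subnet of length host_prefix,
    excluding the network and broadcast addresses.

    Assumption: this reproduces the 510-pods-per-node figure in the report;
    the network plugin may reserve a few extra addresses per node.
    """
    return 2 ** (32 - host_prefix) - 2


def min_nodes_for(total_pods: int, host_prefix: int) -> int:
    """Fewest nodes whose per-node subnets together hold total_pods pod IPs.

    Considers IP address space only; CPU, memory and other per-node
    scheduling limits are ignored.
    """
    return math.ceil(total_pods / usable_pod_ips(host_prefix))


for prefix in (23, 21):
    print(f"/{prefix}: {usable_pod_ips(prefix)} pod IPs per node, "
          f"at least {min_nodes_for(6000, prefix)} nodes for 6000 pods")
```

With hostPrefix 23 this gives 510 pod IPs per node, so at least 12 nodes are needed for 6000 pods; with hostPrefix 21 it gives 2046 per node and at least 3 nodes.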

The customer therefore wants to raise this per-node limit so that the workers can accommodate around 6000 pods, which requires a larger address space per node. The core problem is that the hostPrefix cannot be changed after cluster installation; changing it is not supported as a Day-2 operation. The customer is currently on OpenShift Container Platform (OCP) 4.16 and plans to upgrade to 4.18.

There is a KCS [1] on increasing the pod network and changing the hostPrefix, and the customer requested and received Support Exception SUPPORTEX-29444 [2] to change the hostPrefix from 23 to 21. This change allows 2046 pods per node. The support exception has also been approved by PM and Engineering.
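Moving from hostPrefix 23 to 21 gives each node a larger slice of the cluster network, so fewer node subnets fit into the same cluster network CIDR. The sketch below compares the two prefixes, assuming the OpenShift default clusterNetwork of 10.128.0.0/14; the customer's actual CIDR is not stated in this report, and `node_subnet_capacity` is a hypothetical helper.

```python
import ipaddress


def node_subnet_capacity(cluster_cidr: str, host_prefix: int) -> int:
    """Number of per-node subnets of length host_prefix that fit in the
    cluster network CIDR, i.e. the most nodes that can get a pod subnet."""
    net = ipaddress.ip_network(cluster_cidr)
    if not net.prefixlen <= host_prefix <= net.max_prefixlen:
        raise ValueError("hostPrefix must lie between the CIDR prefix and /32")
    return 2 ** (host_prefix - net.prefixlen)


# Assumption: the default OpenShift clusterNetwork, not the customer's CIDR.
CLUSTER_CIDR = "10.128.0.0/14"

for prefix in (23, 21):
    nodes = node_subnet_capacity(CLUSTER_CIDR, prefix)
    pods_per_node = 2 ** (32 - prefix) - 2  # network and broadcast excluded
    print(f"hostPrefix /{prefix}: up to {nodes} nodes, "
          f"{pods_per_node} pod IPs each")
```

With /21, a /14 cluster network holds only 128 node subnets instead of 512, so a cluster expecting more nodes than that would also need a larger cluster network.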

[1] KCS: https://access.redhat.com/solutions/6456731

[2] Support Exception: https://issues.redhat.com/browse/SUPPORTEX-29444

The issue arose when the customer began testing the procedure described in the KCS on a lab cluster. During the node draining step, they lost connection to the cluster and access to the API server. While the cluster was unresponsive, high CPU usage alerts were raised in VMware; these were resolved by forcing restarts of the etcd nodes.

We have captured an SOS report from the drained node for review and need assistance from Engineering to determine the next steps and carry out this change successfully.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Execute the steps in the private KCS https://access.redhat.com/solutions/6456731 on a bare metal cluster running on vSphere.

Actual results:

1. During the node draining step, connection to the cluster and access to the API server are lost.

2. While the cluster is unresponsive, VMware raises high CPU usage alerts; these were resolved by forcing restarts of the etcd nodes.

Expected results:

The hostPrefix should be changed successfully without breaking the cluster.

Additional info:


Affected Platforms: Baremetal cluster on vSphere

Assignee: Sachin Ninganure

Reporter: Mridul Markandey

QA Contact: Anurag Saxena

Need Info From: None

Votes: 0

Watchers: 7

Created: 2026/02/11 3:02 AM

Updated: 2026/02/16 11:30 AM
