-
Bug
-
Resolution: Unresolved
-
Blocker
Description:
OSDFM is seeing OCM Agent pods stuck in the Pending state in the Stage ap-southeast-6 SC.
Symptoms:
- Pods: ocm-agent-operator-registry-* in openshift-ocm-agent-operator namespace
- NodeSelector on pod: node-role.kubernetes.io/infra
- Node(s) in the new ap-southeast-6 SC initially did not have the required label node-role.kubernetes.io/infra.
- As a result, pods remain in Pending / FailedScheduling.
History / Attempts:
Other Stage/INT regions are working because their worker nodes have the label:
node-role.kubernetes.io/infra=""
Last week, I tried adding:
ext-node-role.kubernetes.io/infra=true
to the OSDFM deploy.yml.
- This allowed the pods to schedule temporarily, so it is likely only a workaround.
- Note: The latest Terraform does not seem to allow node-role.kubernetes.io/infra=""; it may require a non-empty value. See https://redhat-internal.slack.com/archives/C035W96HKN3/p1763742427268339
Current cause:
- Pod nodeSelector strictly requires the key node-role.kubernetes.io/infra.
- ap-southeast-6 nodes were missing this key, hence pods could not schedule.
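The scheduling behaviour described above can be illustrated with a minimal sketch of how a Kubernetes nodeSelector is evaluated: every key in the selector must be present on the node with an equal value. The selector value (empty string) is an assumption based on the label used in the other regions, and the function name is hypothetical; the actual pod spec should be checked before relying on this.

```python
from typing import Mapping

INFRA_KEY = "node-role.kubernetes.io/infra"


def node_selector_matches(node_selector: Mapping[str, str],
                          node_labels: Mapping[str, str]) -> bool:
    """Return True when every selector key exists on the node with an equal value."""
    return all(
        key in node_labels and node_labels[key] == value
        for key, value in node_selector.items()
    )


# Assumption: the pod selects the empty-string value, as labelled in other regions.
selector = {INFRA_KEY: ""}

labelled_node = {INFRA_KEY: ""}        # like the working Stage/INT regions
unlabelled_node = {}                   # like ap-southeast-6 initially
true_valued_node = {INFRA_KEY: "true"} # non-empty value variant

print(node_selector_matches(selector, labelled_node))
print(node_selector_matches(selector, unlabelled_node))
print(node_selector_matches(selector, true_valued_node))
```

Under this model, a node labelled with a non-empty value such as true would not satisfy a selector that asks for the empty string, so the label value chosen for ap-southeast-6 may need to agree with the value in the pod's nodeSelector.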
Action requested:
Investigate why the label was missing and confirm that all nodes in ap-southeast-6 that should run OCM Agent pods have the correct label, as in the other regions:
node-role.kubernetes.io/infra="" or true
If the reason the ap-southeast-6 nodes were missing this key cannot be determined, OSDFM still needs to ensure SC creation in Stage and Prod ap-southeast-6 before the 11/30/2025 new-region enablement deadline. In that case, SRE can manually add this label so that at least one usable SC + MC exists in this region.
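As a possible aid for the confirmation step, the sketch below lists nodes that lack the label key, given a node list in the JSON shape produced by `kubectl get nodes -o json` (items[].metadata.name and items[].metadata.labels). The function name and the sample node names are hypothetical and only illustrate the check.

```python
from typing import Any, List, Mapping

INFRA_KEY = "node-role.kubernetes.io/infra"


def nodes_missing_label(node_list: Mapping[str, Any],
                        key: str = INFRA_KEY) -> List[str]:
    """Return the names of nodes whose labels do not contain `key`."""
    missing = []
    for item in node_list.get("items", []):
        metadata = item.get("metadata", {})
        labels = metadata.get("labels") or {}  # labels may be absent entirely
        if key not in labels:
            missing.append(metadata.get("name", "<unnamed>"))
    return missing


# Hypothetical sample data; real input would be parsed from the cluster's node list.
SAMPLE_NODE_LIST = {
    "items": [
        {"metadata": {"name": "node-a", "labels": {INFRA_KEY: ""}}},
        {"metadata": {"name": "node-b",
                      "labels": {"node-role.kubernetes.io/worker": ""}}},
        {"metadata": {"name": "node-c"}},
    ]
}

print(nodes_missing_label(SAMPLE_NODE_LIST))
```

The check looks only at the presence of the key, not its value, so it reports nodes that would fail any selector on node-role.kubernetes.io/infra regardless of whether the value is empty or true.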