-
Bug
-
Resolution: Done
-
Critical
-
4.14.z
-
Quality / Stability / Reliability
-
False
-
-
None
-
Critical
-
No
Description of problem:
On the HyperShift agent platform with Power worker nodes, the default ingresscontroller is created with a node selector that binds all router pods to a single node. This conflicts with the ingress topology in infrastructure HA mode, so one router replica stays Pending.
$ alias och
alias och='oc --kubeconfig=hostedcluster.kubeconfig'
$ och get no
NAME                            STATUS   ROLES    AGE    VERSION
67edb405d51bdcf0b789-worker-1   Ready    worker   164m   v1.27.8+4fab27b
67edb405d51bdcf0b789-worker-2   Ready    worker   162m   v1.27.8+4fab27b
67edb405d51bdcf0b789-worker-3   Ready    worker   160m   v1.27.8+4fab27b
$ och get ingresscontroller/default -n openshift-ingress-operator -ojsonpath='{.spec.nodePlacement}' | jq
{
  "nodeSelector": {
    "matchLabels": {
      "kubernetes.io/hostname": "67edb405d51bdcf0b789-worker-1"
    }
  },
  "tolerations": [
    {
      "effect": "NoSchedule",
      "key": "kubernetes.io/hostname",
      "operator": "Exists"
    }
  ]
}
$ och get pod -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-749d74b8d6-k2pp7   0/1     Pending   0          70m
router-default-79fd468f5-h9xxb    1/1     Running   0          74m
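The nodePlacement check above can be reduced to a single value. The following is a minimal sketch, not part of the original report: `pinned_hostname` is a hypothetical helper name, and it assumes `jq` is installed (the report already uses it). It reads the nodePlacement JSON on stdin and prints the hostname the selector pins to, or nothing if no hostname selector is set.

```shell
# Hypothetical helper (name is an assumption, not from the report).
# Reads a nodePlacement JSON document on stdin and prints the value of the
# kubernetes.io/hostname selector, or prints nothing when it is absent.
pinned_hostname() {
  jq -r '.nodeSelector.matchLabels["kubernetes.io/hostname"] // empty'
}

# Possible usage against the hosted cluster (aliases such as `och` are not
# expanded in scripts, so the full command is spelled out):
#   oc --kubeconfig=hostedcluster.kubeconfig get ingresscontroller/default \
#     -n openshift-ingress-operator -ojsonpath='{.spec.nodePlacement}' | pinned_hostname
```

Any non-empty output means the default ingresscontroller is pinned to that one node.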
Note:
If the ingresscontroller/default in openshift-ingress-operator is deleted manually, it is recreated without any node selector. The router pods can then be scheduled successfully, because the invalid node selector is gone.
$ och delete ingresscontroller default -n openshift-ingress-operator
ingresscontroller.operator.openshift.io "default" deleted
$ och get ingresscontroller -n openshift-ingress-operator
NAME      AGE
default   2s
$ och get ingresscontroller/default -n openshift-ingress-operator -ojsonpath='{.spec.nodePlacement}'
$ och get pod -n openshift-ingress
NAME                             READY   STATUS    RESTARTS   AGE
router-default-79fd468f5-6gl6r   1/1     Running   0          3m1s
router-default-79fd468f5-7s74g   1/1     Running   0          3m1s
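To confirm the result without reading the table by eye, the pod listing can be summarized. This is a minimal sketch, not part of the original report: `count_not_running` is a hypothetical helper name. It reads the `get pod` table on stdin and prints how many pods have a STATUS other than Running.

```shell
# Hypothetical helper (name is an assumption, not from the report).
# Reads `oc get pod` table output on stdin, skips the header row, and prints
# the number of pods whose STATUS column (third field) is not "Running".
count_not_running() {
  awk 'NR > 1 && $3 != "Running" { n++ } END { print n + 0 }'
}

# Possible usage against the hosted cluster:
#   oc --kubeconfig=hostedcluster.kubeconfig get pod -n openshift-ingress | count_not_running
```

For the listing taken before the ingresscontroller was deleted this prints 1 (the Pending router pod); for the listing taken after recreation it prints 0.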
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Create a cluster with an AWS control plane and Power worker nodes, or rehearse the job in the PR: https://github.com/openshift/release/pull/44096
Steps to Reproduce:
Rehearse the job in PR https://github.com/openshift/release/pull/44096 with the comment:
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-ovn-hypershift-mce-power-guest-critical-f7
Actual results:
The ingress operator does not become ready.
Expected results:
The ingress operator becomes ready.
Additional info:
Links to:
RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update