Bug
Resolution: Duplicate
4.14.z
Quality / Stability / Reliability
I am using OCP deployed on AWS:
$ oc version
Client Version: 4.14.3
Kustomize Version: v5.0.1
Server Version: 4.14.11
Kubernetes Version: v1.27.10+28ed2d7
My cluster has a single master node and multiple worker nodes:
$ oc get node
NAME                                        STATUS   ROLES                  AGE   VERSION
ip-10-0-17-251.us-west-2.compute.internal   Ready    worker                 10h   v1.27.10+28ed2d7
ip-10-0-19-110.us-west-2.compute.internal   Ready    control-plane,master   10h   v1.27.10+28ed2d7
ip-10-0-47-136.us-west-2.compute.internal   Ready    worker                 10h   v1.27.10+28ed2d7
ip-10-0-66-68.us-west-2.compute.internal    Ready    worker                 10h   v1.27.10+28ed2d7
ip-10-0-81-138.us-west-2.compute.internal   Ready    worker                 9h    v1.27.10+28ed2d7
Accordingly, in the Infrastructure object, the fields describing the cluster topology are controlPlaneTopology: SingleReplica and infrastructureTopology: HighlyAvailable:
$ oc get infrastructures.config.openshift.io cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  ...
  name: cluster
  ...
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    aws: {}
    type: AWS
status:
  apiServerInternalURI: https://api-int.mycluster10a.sandbox1452.opentlc.com:6443
  apiServerURL: https://api.mycluster10a.sandbox1452.opentlc.com:6443
  controlPlaneTopology: SingleReplica
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: mycluster10a-ddr8m
  infrastructureTopology: HighlyAvailable
  platform: AWS
  platformStatus:
    aws:
      region: us-west-2
    type: AWS
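For a quick check, the two topology fields can also be read directly with a jsonpath query (a convenience command; the values match the YAML above):
$ oc get infrastructure cluster -o jsonpath='{.status.controlPlaneTopology}{"\n"}{.status.infrastructureTopology}{"\n"}'
SingleReplica
HighlyAvailable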
The console operator schedules three console replicas:
$ oc get po -n openshift-console
NAME                         READY   STATUS    RESTARTS   AGE
console-58cc755947-ld47m     0/1     Pending   0          9h
console-58cc755947-s9skd     0/1     Pending   0          9h
console-69cc447fd4-b5nmt     1/1     Running   2          10h
downloads-5545fcd8f7-8mflq   1/1     Running   2          10h
downloads-5545fcd8f7-gg2k4   1/1     Running   2          10h
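The scheduler's reason for keeping a replica Pending can be read from its events; on this cluster they should point at the node selector and anti-affinity constraints described below:
$ oc describe po -n openshift-console console-58cc755947-ld47m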
Two of the three scheduled replicas remain Pending indefinitely. The console deployment's node selector places console pods on master nodes, and a required pod anti-affinity rule prevents more than one console pod per node:
$ oc get deploy -n openshift-console console -o yaml
...
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: component
            operator: In
            values:
            - ui
        topologyKey: kubernetes.io/hostname
  ...
  nodeSelector:
    node-role.kubernetes.io/master: ""
  ...
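Given the required node selector and anti-affinity, the number of console pods that can ever be scheduled is bounded by the number of nodes carrying the master role label, which here is one:
$ oc get nodes -l node-role.kubernetes.io/master= -o name
node/ip-10-0-19-110.us-west-2.compute.internal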
Because my cluster has only a single master node, it is impossible to schedule all three console pods, so the console operator can never finish reconciling:
$ oc get co
NAME                        VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication              4.14.11   True        False         False      9h
baremetal                   4.14.11   True        False         False      10h
cloud-controller-manager    4.14.11   True        False         False      10h
cloud-credential            4.14.11   True        False         False      10h
cluster-autoscaler          4.14.11   True        False         False      10h
config-operator             4.14.11   True        False         False      10h
console                     4.14.11   True        True          False      9h      SyncLoopRefreshProgressing: Working toward version 4.14.11, 1 replicas available
control-plane-machine-set   4.14.11   True        False         False      10h
csi-snapshot-controller     4.14.11   True        False         False      10h
dns                         4.14.11   True        False         False      10h
etcd                        4.14.11   True        False         False      10h
...
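The console operator's Progressing message can also be pulled directly from its ClusterOperator conditions (equivalent to the MESSAGE column above):
$ oc get co console -o jsonpath='{.status.conditions[?(@.type=="Progressing")].message}{"\n"}'
SyncLoopRefreshProgressing: Working toward version 4.14.11, 1 replicas available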
It looks like the console operator consults the Infrastructure.status.infrastructureTopology field, which is set to HighlyAvailable, and based on that creates three console replicas.
The console operator should perhaps also check the Infrastructure.status.controlPlaneTopology field. It is set to SingleReplica, indicating that three console replicas cannot all be scheduled on master nodes.
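As an illustration only (a hypothetical sketch of the suggested behavior, not the operator's actual code), the replica-count decision could take controlPlaneTopology into account like this:
# Hypothetical sketch: derive the console replica count from
# controlPlaneTopology instead of infrastructureTopology.
cpt=$(oc get infrastructure cluster -o jsonpath='{.status.controlPlaneTopology}')
if [ "$cpt" = "SingleReplica" ]; then
  replicas=1   # a single master node can host only one console pod
else
  replicas=3   # the count chosen for HighlyAvailable, per this report
fi
echo "desired console replicas: $replicas"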