OpenShift Bugs · OCPBUGS-31502

Console operator progressing forever on single-master cluster


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Undefined
    • Affects Version/s: 4.14.z
    • Component/s: Management Console
    • Category: Quality / Stability / Reliability

      I am using OCP deployed on AWS:

      $ oc version
      Client Version: 4.14.3
      Kustomize Version: v5.0.1
      Server Version: 4.14.11
      Kubernetes Version: v1.27.10+28ed2d7
      

      My cluster has a single master node and multiple worker nodes:

      $ oc get node
      NAME                                        STATUS   ROLES                  AGE   VERSION
      ip-10-0-17-251.us-west-2.compute.internal   Ready    worker                 10h   v1.27.10+28ed2d7
      ip-10-0-19-110.us-west-2.compute.internal   Ready    control-plane,master   10h   v1.27.10+28ed2d7
      ip-10-0-47-136.us-west-2.compute.internal   Ready    worker                 10h   v1.27.10+28ed2d7
      ip-10-0-66-68.us-west-2.compute.internal    Ready    worker                 10h   v1.27.10+28ed2d7
      ip-10-0-81-138.us-west-2.compute.internal   Ready    worker                 9h    v1.27.10+28ed2d7
      

      Accordingly, in the Infrastructure object, the fields describing the cluster topology are controlPlaneTopology: SingleReplica and infrastructureTopology: HighlyAvailable:

      $ oc get infrastructures.config.openshift.io cluster -o yaml
      apiVersion: config.openshift.io/v1
      kind: Infrastructure
      metadata:
        ...
        name: cluster
        ...
      spec:
        cloudConfig:
          key: config
          name: cloud-provider-config
        platformSpec:
          aws: {}
          type: AWS
      status:
        apiServerInternalURI: https://api-int.mycluster10a.sandbox1452.opentlc.com:6443
        apiServerURL: https://api.mycluster10a.sandbox1452.opentlc.com:6443
        controlPlaneTopology: SingleReplica
        cpuPartitioning: None
        etcdDiscoveryDomain: ""
        infrastructureName: mycluster10a-ddr8m
        infrastructureTopology: HighlyAvailable
        platform: AWS
        platformStatus:
          aws:
            region: us-west-2
          type: AWS
      

      The console operator schedules three console replicas:

      $ oc get po -n openshift-console
      NAME                         READY   STATUS    RESTARTS   AGE
      console-58cc755947-ld47m     0/1     Pending   0          9h
      console-58cc755947-s9skd     0/1     Pending   0          9h
      console-69cc447fd4-b5nmt     1/1     Running   2          10h
      downloads-5545fcd8f7-8mflq   1/1     Running   2          10h
      downloads-5545fcd8f7-gg2k4   1/1     Running   2          10h
      

      Two of the three scheduled replicas remain Pending forever. The console deployment's node selector places the console pods on master nodes, and a required pod anti-affinity prevents placing more than one console pod on the same node:

      $ oc get deploy -n openshift-console console -o yaml
      ...
          spec:
            affinity:
              podAntiAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                    - key: component
                      operator: In
                      values:
                      - ui
                  topologyKey: kubernetes.io/hostname
      ...
            nodeSelector:
              node-role.kubernetes.io/master: ""
      ...
      

      Since my cluster has a single master node, it's not possible to schedule all three console pods. As a result, the console operator never finishes reconciling:

      $ oc get co
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.14.11   True        False         False      9h
      baremetal                                  4.14.11   True        False         False      10h
      cloud-controller-manager                   4.14.11   True        False         False      10h
      cloud-credential                           4.14.11   True        False         False      10h
      cluster-autoscaler                         4.14.11   True        False         False      10h
      config-operator                            4.14.11   True        False         False      10h
      console                                    4.14.11   True        True          False      9h      SyncLoopRefreshProgressing: Working toward version 4.14.11, 1 replicas available
      control-plane-machine-set                  4.14.11   True        False         False      10h
      csi-snapshot-controller                    4.14.11   True        False         False      10h
      dns                                        4.14.11   True        False         False      10h
      etcd                                       4.14.11   True        False         False      10h
      ...
      
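      The scheduling constraint above can be sketched in a few lines. This is a hypothetical helper for illustration only (not operator code): the master-only node selector restricts console pods to master nodes, and the required anti-affinity on kubernetes.io/hostname allows at most one console pod per node, so the number of placeable pods is capped by the master-node count.

```go
package main

import "fmt"

// schedulableConsolePods is a hypothetical helper illustrating the constraint:
// with a master-only node selector and a required pod anti-affinity on
// kubernetes.io/hostname, at most one console pod fits per master node.
func schedulableConsolePods(desiredReplicas, masterNodes int) int {
	if masterNodes < desiredReplicas {
		return masterNodes
	}
	return desiredReplicas
}

func main() {
	// Three desired replicas but a single master: only one pod can be
	// placed, leaving the other two Pending, as observed above.
	fmt.Println(schedulableConsolePods(3, 1)) // 1
}
```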

      It looks like the console operator consults the Infrastructure.status.infrastructureTopology field, which is set to HighlyAvailable, and based on that creates three console replicas.

      The console operator should perhaps also check the Infrastructure.status.controlPlaneTopology field. It is set to SingleReplica, indicating that it's not possible to schedule three console replicas on master nodes.
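      A minimal sketch of the suggested behavior, assuming the replica count were derived from controlPlaneTopology. The topology values are the ones defined by the config.openshift.io/v1 Infrastructure API; the helper name and the HA default of three replicas (as observed above) are assumptions, not the operator's actual code:

```go
package main

import "fmt"

// consoleReplicas is a hypothetical sketch of the suggested fix: since console
// pods can only land on master nodes, derive the replica count from
// controlPlaneTopology rather than infrastructureTopology.
func consoleReplicas(controlPlaneTopology string) int32 {
	if controlPlaneTopology == "SingleReplica" {
		return 1 // a single master can host only one console pod
	}
	return 3 // HA control plane: keep the default observed above
}

func main() {
	fmt.Println(consoleReplicas("SingleReplica"))   // 1
	fmt.Println(consoleReplicas("HighlyAvailable")) // 3
}
```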

              Assignee: Jakub Hadvig
              Reporter: Ales Nosek
              QA Contact: YaDan Pei