OpenShift Bugs
OCPBUGS-74980

Router pods are scheduled on the same HCP node when infrastructureAvailabilityPolicy is set to SingleReplica


      Environment

      • OpenShift Hosted Control Plane (HCP)
      • Platform: OpenShift 4.x (observed on HCP clusters)
      • Ingress Operator: default
      • Cluster topology: SingleReplica infrastructure

       

      Description

      When creating a Hosted Control Plane (HCP) cluster with infrastructureAvailabilityPolicy set to SingleReplica, the ingress router pods are repeatedly scheduled onto the same HCP node(s).

      The Ingress Operator detects this as a violation of expected placement / availability guarantees and continuously evicts the router pods. This results in an ongoing cycle of pod eviction and rescheduling, preventing router pods from stabilizing and causing ingress availability issues.

      The behavior appears to be caused by a mismatch between:

      • SingleReplica availability policies, and
      • the Ingress Operator's expectations around router pod placement and anti-affinity.
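      For context, the "one router per node" expectation the operator enforces through eviction could instead be expressed at scheduling time. The following is an illustrative sketch only (not the manifest the Ingress Operator actually generates) of a hostname-keyed topology spread constraint that would make the scheduler itself refuse to co-locate two router replicas:

      ```yaml
      # Illustrative sketch, NOT the operator-generated manifest.
      # A hostname-keyed spread constraint with DoNotSchedule would make the
      # scheduler reject a second router replica on an already-used node,
      # instead of relying on post-hoc eviction by the operator.
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
      ```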

      NOTE: All logs, configuration snippets, and observations shared in this issue are collected from a test cluster.

      No customer data, production data, or sensitive information is included.

      Steps to Reproduce

      • Configure the management/hub cluster with the latest ACM, MCE, and OpenShift Virtualization operators.

       

      ❯ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.10   True        False         64m     Cluster version is 4.20.10
      
      ❯ oc get csv -n multicluster-engine
      NAME                          DISPLAY                              VERSION   REPLACES                      PHASE
      multicluster-engine.v2.10.1   multicluster engine for Kubernetes   2.10.1    multicluster-engine.v2.10.0   Succeeded
      
      ❯ oc get csv -n open-cluster-management
      NAME                                  DISPLAY                                      VERSION   REPLACES                              PHASE
      advanced-cluster-management.v2.15.1   Advanced Cluster Management for Kubernetes   2.15.1    advanced-cluster-management.v2.15.0   Succeeded
      
      ❯ oc get csv -n openshift-cnv
      NAME                                        DISPLAY                    VERSION   REPLACES   PHASE
      kubevirt-hyperconverged-operator.v4.18.28   OpenShift Virtualization   4.18.28              Succeeded 

       

      • Deploy a Hosted Cluster with infrastructureAvailabilityPolicy set to SingleReplica.

       

      ❯ oc get HostedCluster -n clusters
      NAME         VERSION   KUBECONFIG                    PROGRESS    AVAILABLE   PROGRESSING   MESSAGE
      aygarg-hcp   4.20.10   aygarg-hcp-admin-kubeconfig   Completed   True        False         The hosted control plane is available
      
      ❯ oc get HostedCluster aygarg-hcp -n clusters -oyaml
      apiVersion: hypershift.openshift.io/v1beta1
      kind: HostedCluster
      ...
      spec:
        autoscaling:
          scaling: ScaleUpAndScaleDown
        capabilities: {}
        clusterID: 8a090b1e-e86e-4dbc-aaf8-60dd29e0f02a
        controllerAvailabilityPolicy: SingleReplica
        dns:
          baseDomain: apps.aygarg.emea.aws.cee.support
        etcd:
          managed:
            storage:
              persistentVolume:
                size: 8Gi
                storageClassName: gp3-csi
              type: PersistentVolume
          managementType: Managed
        fips: false
        infraID: aygarg-hcp
        infrastructureAvailabilityPolicy: SingleReplica    <<<<<<<<<<
        issuerURL: https://kubernetes.default.svc
        networking:
          clusterNetwork:
          - cidr: 10.132.0.0/14
          networkType: OVNKubernetes
          serviceNetwork:
          - cidr: 172.31.0.0/16
        olmCatalogPlacement: management
        platform:
          kubevirt:
            baseDomainPassthrough: true
            generateID: bl2d25t8qw
          type: KubeVirt
        pullSecret:
          name: pullsecret-cluster-aygarg-hcp
        release:
          image: quay.io/openshift-release-dev/ocp-release:4.20.10-multi
        secretEncryption:
          aescbc:
            activeKey:
              name: aygarg-hcp-etcd-encryption-key
          type: aescbc
        services:
        - service: APIServer
          servicePublishingStrategy:
            type: LoadBalancer
        - service: Ignition
          servicePublishingStrategy:
            type: Route
        - service: Konnectivity
          servicePublishingStrategy:
            type: Route
        - service: OAuthServer
          servicePublishingStrategy:
            type: Route
        sshKey:
          name: sshkey-cluster-aygarg-hcp
      
      ❯ oc get nodepool -n clusters
      NAME                   CLUSTER      DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
      hypershift-node-pool   aygarg-hcp   2               2               False         False        4.20.12   False             False
      
      ❯ oc get nodepool hypershift-node-pool -n clusters -oyaml
      apiVersion: hypershift.openshift.io/v1beta1
      kind: NodePool
      ...
      spec:
        arch: amd64
        clusterName: aygarg-hcp
        management:
          autoRepair: false
          replace:
            rollingUpdate:
              maxSurge: 1
              maxUnavailable: 0
            strategy: RollingUpdate
          upgradeType: Replace
        platform:
          kubevirt:
            attachDefaultNetwork: true
            compute:
              cores: 2
              memory: 8Gi
              qosClass: Burstable
            networkInterfaceMultiqueue: Enable
            rootVolume:
              persistent:
                size: 32Gi
              type: Persistent
          type: KubeVirt
        release:
          image: quay.io/openshift-release-dev/ocp-release:4.20.12-multi
        replicas: 2

       

       

      • Increase the replica count of the default ingresscontroller to 2. The router pods keep getting scheduled onto the same HCP node and, as expected, the Ingress Operator continuously evicts them.

       

      > oc get co
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      console                                    4.20.10   True        False         False      33m     
      csi-snapshot-controller                    4.20.10   True        False         False      46m     
      dns                                        4.20.10   True        False         False      33m     
      image-registry                             4.20.10   True        False         False      34m     
      ingress                                    4.20.10   False       True          False      0s      The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
      insights                                   4.20.10   True        False         False      35m     
      kube-apiserver                             4.20.10   True        False         False      46m     
      kube-controller-manager                    4.20.10   True        False         False      46m     
      kube-scheduler                             4.20.10   True        False         False      46m     
      kube-storage-version-migrator              4.20.10   True        False         False      34m     
      monitoring                                 4.20.10   True        False         False      31m     
      network                                    4.20.10   True        False         False      37m     
      node-tuning                                4.20.10   True        False         False      39m     
      openshift-apiserver                        4.20.10   True        False         False      46m     
      openshift-controller-manager               4.20.10   True        False         False      46m     
      openshift-samples                          4.20.10   True        False         False      33m     
      operator-lifecycle-manager                 4.20.10   True        False         False      46m     
      operator-lifecycle-manager-catalog         4.20.10   True        False         False      46m     
      operator-lifecycle-manager-packageserver   4.20.10   True        False         False      46m     
      service-ca                                 4.20.10   True        False         False      35m     
      storage                                    4.20.10   True        False         False      46m
      
      
      > oc get nodes
      NAME                               STATUS   ROLES    AGE   VERSION
      hypershift-node-pool-lwq5v-fv9vt   Ready    worker   39m   v1.33.6
      hypershift-node-pool-lwq5v-xtpss   Ready    worker   40m   v1.33.6
      
      
      > oc get ingresscontroller default -oyaml -n openshift-ingress-operator
      apiVersion: operator.openshift.io/v1
      kind: IngressController
      ...
      spec:
        clientTLS:
          clientCA:
            name: ""
          clientCertificatePolicy: ""
        closedClientConnectionPolicy: Continue
        defaultCertificate:
          name: default-ingress-cert
        domain: apps.aygarg-hcp.apps.aygarg.emea.aws.cee.support
        endpointPublishingStrategy:
          type: NodePortService
        httpCompression: {}
        httpEmptyRequestsPolicy: Respond
        httpErrorCodePages:
          name: ""
        idleConnectionTerminationPolicy: Immediate
        replicas: 2
        tuningOptions:
          reloadInterval: 0s
        unsupportedConfigOverrides: null
      ...
      status:
      ...
        domain: apps.aygarg-hcp.apps.aygarg.emea.aws.cee.support
        endpointPublishingStrategy:
          nodePort:
            protocol: TCP
          type: NodePortService
        observedGeneration: 2
        selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
      
      
      > oc get pod -o wide -n openshift-ingress
      NAME                             READY   STATUS        RESTARTS   AGE     IP            NODE                               NOMINATED NODE   READINESS GATES
      router-default-8645f4f9c-8jm7r   1/1     Running       0          16s     10.132.0.69   hypershift-node-pool-lwq5v-xtpss   <none>           <none>
      router-default-8645f4f9c-8skwv   1/1     Running       0          79s     10.132.0.68   hypershift-node-pool-lwq5v-xtpss   <none>           <none>
      router-default-8645f4f9c-phshp   1/1     Terminating   0          2m16s   10.132.0.67   hypershift-node-pool-lwq5v-xtpss   <none>           <none>
      
      
      ❯ oc get pod -n clusters-aygarg-hcp | grep -i ingress
      ingress-operator-cd9b7fd67-m5hlt                       2/2     Running     0             48m
      
      
      ❯ oc -n clusters-aygarg-hcp logs ingress-operator-cd9b7fd67-m5hlt | grep -i "MalscheduledPod"
      Defaulted container "ingress-operator" out of: ingress-operator, konnectivity-proxy-https, availability-prober (init)
      I0203 07:34:02.976034       1 event.go:377] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-ingress-operator", Name:"ingress-operator", UID:"", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MalscheduledPod' pod/router-default-8645f4f9c-62jzv pod/router-default-8645f4f9c-n2kzl should be one per node, but all were placed on node/hypershift-node-pool-lwq5v-xtpss; evicting pod/router-default-8645f4f9c-n2kzl
      I0203 07:34:02.994476       1 event.go:377] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-ingress-operator", Name:"ingress-operator", UID:"", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MalscheduledPod' pod/router-default-8645f4f9c-62jzv pod/router-default-8645f4f9c-n2kzl should be one per node, but all were placed on node/hypershift-node-pool-lwq5v-xtpss; evicting pod/router-default-8645f4f9c-n2kzl
      
      
      ❯ oc get pod -n clusters-aygarg-hcp | grep -i scheduler
      kube-scheduler-7dd4db6645-6v2jh                        1/1     Running     0             49m
      
      ❯ oc -n clusters-aygarg-hcp logs kube-scheduler-7dd4db6645-6v2jh | grep -i router | grep -i "hypershift-node-pool-lwq5v-fv9vt"
      (no matching lines: the scheduler never bound a router pod to node fv9vt)
      
      
      ❯ oc -n clusters-aygarg-hcp logs kube-scheduler-7dd4db6645-6v2jh | grep -i router | grep -i "hypershift-node-pool-lwq5v-xtpss"
      Defaulted container "kube-scheduler" out of: kube-scheduler, availability-prober (init)
      I0203 07:28:26.169136       1 schedule_one.go:314] "Successfully bound pod to node" pod="openshift-ingress/router-default-8645f4f9c-n2kzl" node="hypershift-node-pool-lwq5v-xtpss" evaluatedNodes=2 feasibleNodes=2
      I0203 07:32:39.936527       1 schedule_one.go:314] "Successfully bound pod to node" pod="openshift-ingress/router-default-8645f4f9c-62jzv" node="hypershift-node-pool-lwq5v-xtpss" evaluatedNodes=2 feasibleNodes=2
      I0203 07:34:03.017518       1 schedule_one.go:314] "Successfully bound pod to node" pod="openshift-ingress/router-default-8645f4f9c-wvfzr" node="hypershift-node-pool-lwq5v-xtpss" evaluatedNodes=2 feasibleNodes=2 

       

       

      • This is happening because the router deployment is missing a pod anti-affinity (podAntiAffinity) rule, so nothing prevents the scheduler from placing both router replicas on the same node.

       

      > oc get deployment router-default -n openshift-ingress -oyaml
      apiVersion: apps/v1
      kind: Deployment
      ...
      spec:
        minReadySeconds: 30
        progressDeadlineSeconds: 600
        replicas: 2
        revisionHistoryLimit: 10
        selector:
          matchLabels:
            ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
        strategy:
          rollingUpdate:
            maxSurge: 25%
            maxUnavailable: 25%
          type: RollingUpdate
        template:
          metadata:
            annotations:
              openshift.io/required-scc: restricted
              target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
            creationTimestamp: null
            labels:
              ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
              ingresscontroller.operator.openshift.io/hash: 5bf989cbb6
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: node.openshift.io/remote-worker
                      operator: NotIn
                      values:
                      - ""
            containers:
      .......
            tolerations:
            - effect: NoExecute
              key: kubernetes.io/e2e-evict-taint-key
              operator: Equal
              value: evictTaintVal
            topologySpreadConstraints:
            - labelSelector:
                matchExpressions:
                - key: ingresscontroller.operator.openshift.io/hash
                  operator: In
                  values:
                  - 5bf989cbb6
              maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway 
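      The deployment above carries only nodeAffinity and a zone-keyed topologySpreadConstraint with whenUnsatisfiable: ScheduleAnyway, so on a two-node cluster both replicas can legally land on one node. A hedged sketch of the kind of podAntiAffinity stanza that would prevent co-location (illustrative only; the actual fix would need to be made in the Ingress Operator's deployment generation, and the field values here are assumptions):

      ```yaml
      # Illustrative only: pod anti-affinity keyed on the router's selector label.
      # With requiredDuringSchedulingIgnoredDuringExecution and a hostname
      # topologyKey, the scheduler will not place two router replicas on the
      # same node, eliminating the eviction/reschedule loop.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
            topologyKey: kubernetes.io/hostname
      ```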

       

       

       

              Krzysztof Majcher (kmajcher@redhat.com)
              Ayush Garg (rhn-support-aygarg)
              Yu Li