OpenShift Bugs
OCPBUGS-74980

Router pods are scheduled on the same HCP node when infrastructureAvailabilityPolicy is set to SingleReplica


      Environment

      • OpenShift Hosted Control Plane (HCP)
      • Platform: OpenShift 4.x (observed on HCP clusters)
      • Ingress Operator: default
      • Cluster topology: SingleReplica infrastructure

       

      Description

      When creating a Hosted Control Plane (HCP) cluster with infrastructureAvailabilityPolicy set to SingleReplica, the ingress router pods are repeatedly scheduled onto the same HCP node(s).

      The Ingress Operator detects this as a violation of expected placement / availability guarantees and continuously evicts the router pods. This results in an ongoing cycle of pod eviction and rescheduling, preventing router pods from stabilizing and causing ingress availability issues.

      The behavior appears to be caused by a mismatch between:

      • SingleReplica availability policies, and
      • the Ingress Operator's expectations around router pod placement and anti-affinity.
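      For context, the "one router per node" expectation the operator enforces through eviction could instead be expressed at scheduling time. The following is an illustrative sketch only (not the manifest the Ingress Operator actually generates) of a hostname-keyed topology spread constraint that would make the scheduler itself refuse to co-locate two router replicas:

      ```yaml
      # Illustrative sketch, NOT the operator-generated manifest.
      # A hostname-keyed spread constraint with DoNotSchedule would make the
      # scheduler reject a second router replica on an already-used node,
      # instead of relying on post-hoc eviction by the operator.
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
      ```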

      NOTE: All logs, configuration snippets, and observations shared in this issue are collected from a test cluster.

      No customer data, production data, or sensitive information is included.

      Steps to Reproduce

      • Configure the management/hub cluster with the latest ACM, MCE, and OpenShift Virtualization operators.

       

      ❯ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.10   True        False         64m     Cluster version is 4.20.10
      
      ❯ oc get csv -n multicluster-engine
      NAME                          DISPLAY                              VERSION   REPLACES                      PHASE
      multicluster-engine.v2.10.1   multicluster engine for Kubernetes   2.10.1    multicluster-engine.v2.10.0   Succeeded
      
      ❯ oc get csv -n open-cluster-management
      NAME                                  DISPLAY                                      VERSION   REPLACES                              PHASE
      advanced-cluster-management.v2.15.1   Advanced Cluster Management for Kubernetes   2.15.1    advanced-cluster-management.v2.15.0   Succeeded
      
      ❯ oc get csv -n openshift-cnv
      NAME                                        DISPLAY                    VERSION   REPLACES   PHASE
      kubevirt-hyperconverged-operator.v4.18.28   OpenShift Virtualization   4.18.28              Succeeded 

       

      • Deploy a Hosted Cluster with infrastructureAvailabilityPolicy set to SingleReplica.

       

      ❯ oc get HostedCluster -n clusters
      NAME         VERSION   KUBECONFIG                    PROGRESS    AVAILABLE   PROGRESSING   MESSAGE
      aygarg-hcp   4.20.10   aygarg-hcp-admin-kubeconfig   Completed   True        False         The hosted control plane is available
      
      ❯ oc get HostedCluster aygarg-hcp -n clusters -oyaml
      apiVersion: hypershift.openshift.io/v1beta1
      kind: HostedCluster
      ...
      spec:
        autoscaling:
          scaling: ScaleUpAndScaleDown
        capabilities: {}
        clusterID: 8a090b1e-e86e-4dbc-aaf8-60dd29e0f02a
        controllerAvailabilityPolicy: SingleReplica
        dns:
          baseDomain: apps.aygarg.emea.aws.cee.support
        etcd:
          managed:
            storage:
              persistentVolume:
                size: 8Gi
                storageClassName: gp3-csi
              type: PersistentVolume
          managementType: Managed
        fips: false
        infraID: aygarg-hcp
        infrastructureAvailabilityPolicy: SingleReplica    <<<<<<<<<<
        issuerURL: https://kubernetes.default.svc
        networking:
          clusterNetwork:
          - cidr: 10.132.0.0/14
          networkType: OVNKubernetes
          serviceNetwork:
          - cidr: 172.31.0.0/16
        olmCatalogPlacement: management
        platform:
          kubevirt:
            baseDomainPassthrough: true
            generateID: bl2d25t8qw
          type: KubeVirt
        pullSecret:
          name: pullsecret-cluster-aygarg-hcp
        release:
          image: quay.io/openshift-release-dev/ocp-release:4.20.10-multi
        secretEncryption:
          aescbc:
            activeKey:
              name: aygarg-hcp-etcd-encryption-key
          type: aescbc
        services:
        - service: APIServer
          servicePublishingStrategy:
            type: LoadBalancer
        - service: Ignition
          servicePublishingStrategy:
            type: Route
        - service: Konnectivity
          servicePublishingStrategy:
            type: Route
        - service: OAuthServer
          servicePublishingStrategy:
            type: Route
        sshKey:
          name: sshkey-cluster-aygarg-hcp
      
      ❯ oc get nodepool -n clusters
      NAME                   CLUSTER      DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
      hypershift-node-pool   aygarg-hcp   2               2               False         False        4.20.12   False             False
      
      ❯ oc get nodepool hypershift-node-pool -n clusters -oyaml
      apiVersion: hypershift.openshift.io/v1beta1
      kind: NodePool
      ...
      spec:
        arch: amd64
        clusterName: aygarg-hcp
        management:
          autoRepair: false
          replace:
            rollingUpdate:
              maxSurge: 1
              maxUnavailable: 0
            strategy: RollingUpdate
          upgradeType: Replace
        platform:
          kubevirt:
            attachDefaultNetwork: true
            compute:
              cores: 2
              memory: 8Gi
              qosClass: Burstable
            networkInterfaceMultiqueue: Enable
            rootVolume:
              persistent:
                size: 32Gi
              type: Persistent
          type: KubeVirt
        release:
          image: quay.io/openshift-release-dev/ocp-release:4.20.12-multi
        replicas: 2

       

       

      • Increase the replica count of the default ingresscontroller to 2. The router pods keep getting scheduled onto the same HCP node and, as expected, the Ingress Operator continuously evicts them.

       

      > oc get co
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      console                                    4.20.10   True        False         False      33m     
      csi-snapshot-controller                    4.20.10   True        False         False      46m     
      dns                                        4.20.10   True        False         False      33m     
      image-registry                             4.20.10   True        False         False      34m     
      ingress                                    4.20.10   False       True          False      0s      The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
      insights                                   4.20.10   True        False         False      35m     
      kube-apiserver                             4.20.10   True        False         False      46m     
      kube-controller-manager                    4.20.10   True        False         False      46m     
      kube-scheduler                             4.20.10   True        False         False      46m     
      kube-storage-version-migrator              4.20.10   True        False         False      34m     
      monitoring                                 4.20.10   True        False         False      31m     
      network                                    4.20.10   True        False         False      37m     
      node-tuning                                4.20.10   True        False         False      39m     
      openshift-apiserver                        4.20.10   True        False         False      46m     
      openshift-controller-manager               4.20.10   True        False         False      46m     
      openshift-samples                          4.20.10   True        False         False      33m     
      operator-lifecycle-manager                 4.20.10   True        False         False      46m     
      operator-lifecycle-manager-catalog         4.20.10   True        False         False      46m     
      operator-lifecycle-manager-packageserver   4.20.10   True        False         False      46m     
      service-ca                                 4.20.10   True        False         False      35m     
      storage                                    4.20.10   True        False         False      46m
      
      
      > oc get nodes
      NAME                               STATUS   ROLES    AGE   VERSION
      hypershift-node-pool-lwq5v-fv9vt   Ready    worker   39m   v1.33.6
      hypershift-node-pool-lwq5v-xtpss   Ready    worker   40m   v1.33.6
      
      
      > oc get ingresscontroller default -oyaml -n openshift-ingress-operator
      apiVersion: operator.openshift.io/v1
      kind: IngressController
      ...
      spec:
        clientTLS:
          clientCA:
            name: ""
          clientCertificatePolicy: ""
        closedClientConnectionPolicy: Continue
        defaultCertificate:
          name: default-ingress-cert
        domain: apps.aygarg-hcp.apps.aygarg.emea.aws.cee.support
        endpointPublishingStrategy:
          type: NodePortService
        httpCompression: {}
        httpEmptyRequestsPolicy: Respond
        httpErrorCodePages:
          name: ""
        idleConnectionTerminationPolicy: Immediate
        replicas: 2
        tuningOptions:
          reloadInterval: 0s
        unsupportedConfigOverrides: null
      ...
      status:
      ...
        domain: apps.aygarg-hcp.apps.aygarg.emea.aws.cee.support
        endpointPublishingStrategy:
          nodePort:
            protocol: TCP
          type: NodePortService
        observedGeneration: 2
        selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
      
      
      > oc get pod -o wide -n openshift-ingress
      NAME                             READY   STATUS        RESTARTS   AGE     IP            NODE                               NOMINATED NODE   READINESS GATES
      router-default-8645f4f9c-8jm7r   1/1     Running       0          16s     10.132.0.69   hypershift-node-pool-lwq5v-xtpss   <none>           <none>
      router-default-8645f4f9c-8skwv   1/1     Running       0          79s     10.132.0.68   hypershift-node-pool-lwq5v-xtpss   <none>           <none>
      router-default-8645f4f9c-phshp   1/1     Terminating   0          2m16s   10.132.0.67   hypershift-node-pool-lwq5v-xtpss   <none>           <none>
      
      
      ❯ oc get pod -n clusters-aygarg-hcp | grep -i ingress
      ingress-operator-cd9b7fd67-m5hlt                       2/2     Running     0             48m
      
      
      ❯ oc -n clusters-aygarg-hcp logs ingress-operator-cd9b7fd67-m5hlt | grep -i "MalscheduledPod"
      Defaulted container "ingress-operator" out of: ingress-operator, konnectivity-proxy-https, availability-prober (init)
      I0203 07:34:02.976034       1 event.go:377] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-ingress-operator", Name:"ingress-operator", UID:"", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MalscheduledPod' pod/router-default-8645f4f9c-62jzv pod/router-default-8645f4f9c-n2kzl should be one per node, but all were placed on node/hypershift-node-pool-lwq5v-xtpss; evicting pod/router-default-8645f4f9c-n2kzl
      I0203 07:34:02.994476       1 event.go:377] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-ingress-operator", Name:"ingress-operator", UID:"", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MalscheduledPod' pod/router-default-8645f4f9c-62jzv pod/router-default-8645f4f9c-n2kzl should be one per node, but all were placed on node/hypershift-node-pool-lwq5v-xtpss; evicting pod/router-default-8645f4f9c-n2kzl
      
      
      ❯ oc get pod -n clusters-aygarg-hcp | grep -i scheduler
      kube-scheduler-7dd4db6645-6v2jh                        1/1     Running     0             49m
      
      ❯ oc -n clusters-aygarg-hcp logs kube-scheduler-7dd4db6645-6v2jh | grep -i router | grep -i "hypershift-node-pool-lwq5v-fv9vt"
      (no matching lines: the scheduler never bound a router pod to node fv9vt)
      
      
      ❯ oc -n clusters-aygarg-hcp logs kube-scheduler-7dd4db6645-6v2jh | grep -i router | grep -i "hypershift-node-pool-lwq5v-xtpss"
      Defaulted container "kube-scheduler" out of: kube-scheduler, availability-prober (init)
      I0203 07:28:26.169136       1 schedule_one.go:314] "Successfully bound pod to node" pod="openshift-ingress/router-default-8645f4f9c-n2kzl" node="hypershift-node-pool-lwq5v-xtpss" evaluatedNodes=2 feasibleNodes=2
      I0203 07:32:39.936527       1 schedule_one.go:314] "Successfully bound pod to node" pod="openshift-ingress/router-default-8645f4f9c-62jzv" node="hypershift-node-pool-lwq5v-xtpss" evaluatedNodes=2 feasibleNodes=2
      I0203 07:34:03.017518       1 schedule_one.go:314] "Successfully bound pod to node" pod="openshift-ingress/router-default-8645f4f9c-wvfzr" node="hypershift-node-pool-lwq5v-xtpss" evaluatedNodes=2 feasibleNodes=2 

       

       

      • This is happening because the router deployment is missing a pod anti-affinity (podAntiAffinity) rule, so nothing prevents the scheduler from placing both router replicas on the same node.

       

      > oc get deployment router-default -n openshift-ingress -oyaml
      apiVersion: apps/v1
      kind: Deployment
      ...
      spec:
        minReadySeconds: 30
        progressDeadlineSeconds: 600
        replicas: 2
        revisionHistoryLimit: 10
        selector:
          matchLabels:
            ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
        strategy:
          rollingUpdate:
            maxSurge: 25%
            maxUnavailable: 25%
          type: RollingUpdate
        template:
          metadata:
            annotations:
              openshift.io/required-scc: restricted
              target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
            creationTimestamp: null
            labels:
              ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
              ingresscontroller.operator.openshift.io/hash: 5bf989cbb6
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: node.openshift.io/remote-worker
                      operator: NotIn
                      values:
                      - ""
            containers:
      .......
            tolerations:
            - effect: NoExecute
              key: kubernetes.io/e2e-evict-taint-key
              operator: Equal
              value: evictTaintVal
            topologySpreadConstraints:
            - labelSelector:
                matchExpressions:
                - key: ingresscontroller.operator.openshift.io/hash
                  operator: In
                  values:
                  - 5bf989cbb6
              maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway 
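      The deployment above carries only nodeAffinity and a zone-keyed topologySpreadConstraint with whenUnsatisfiable: ScheduleAnyway, so on a two-node cluster both replicas can legally land on one node. A hedged sketch of the kind of podAntiAffinity stanza that would prevent co-location (illustrative only; the actual fix would need to be made in the Ingress Operator's deployment generation, and the field values here are assumptions):

      ```yaml
      # Illustrative only: pod anti-affinity keyed on the router's selector label.
      # With requiredDuringSchedulingIgnoredDuringExecution and a hostname
      # topologyKey, the scheduler will not place two router replicas on the
      # same node, eliminating the eviction/reschedule loop.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
            topologyKey: kubernetes.io/hostname
      ```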

       

       

       

              Krzysztof Majcher (kmajcher@redhat.com)
              Ayush Garg (rhn-support-aygarg)
              Yu Li