Red Hat Advanced Cluster Management / ACM-2549

Submariner-addon pod crashlooping (OOMKilled) during provisioning of 3000+ managed SNOs


    • Submariner Sprint 2024-22, Submariner Sprint 2024-23, Submariner Sprint 2024-24, Submariner Sprint 2024-25, Submariner Sprint 2024-26, Submariner Sprint 2024-27
    • Moderate
    • No

      Description of problem:

      While deploying 3000+ SNOs with ACM and ZTP, the submariner-addon pod was found crashlooping at the conclusion of the test. Looking back at when the crashlooping began, it appears to have started when the first cluster finished provisioning and became managed. (Perhaps something is wrong with what the pod is examining: to reach more than 3000 managed SNOs, the SNOs are provisioned in "steps" of 500 clusters.)
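      One rough way to test that hypothesis is to record the number of ManagedCluster objects next to the addon's restart count after each 500-cluster step, so the first restarts can be lined up with the cluster count. This is a minimal sketch, assuming `oc` is logged in to the hub; the `snapshot` function is a hypothetical helper, not part of ACM, and it reads the restart count of the first matching pod only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one timestamped line with the number of
# ManagedCluster objects on the hub and the submariner-addon restart count.
snapshot() {
  local clusters restarts
  # Count ManagedCluster objects (one line per cluster without the header).
  clusters=$(oc get managedclusters --no-headers 2>/dev/null | wc -l | tr -d ' ')
  # Restart count of the first pod matching the label used in this report.
  restarts=$(oc get pods -n open-cluster-management -l app=submariner-addon \
    -o jsonpath='{.items[0].status.containerStatuses[0].restartCount}' 2>/dev/null)
  printf '%s\tclusters=%s\trestarts=%s\n' "$(date -u +%FT%TZ)" "$clusters" "$restarts"
}
```

      Calling `snapshot` once per provisioning step and appending the output to a file gives a simple timeline; note that the restart count resets if the pod itself is replaced.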

       

      Version-Release number of selected component (if applicable):

      2.7.0-DOWNSTREAM-2023-01-03-01-19-39
      OCP 4.11.19 Hub and SNOs

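      Actual results:

      The submariner-addon pod is in CrashLoopBackOff (168 restarts); the container's last state is Terminated with reason OOMKilled (exit code 137) against a 270Mi memory limit.

      Expected results:

      The submariner-addon pod stays Running while the hub manages 3000+ SNOs.
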
      Additional info:

      # oc get po -n open-cluster-management -l app=submariner-addon
      NAME                                READY   STATUS             RESTARTS         AGE
      submariner-addon-7cdfb67b4c-c9hsw   0/1     CrashLoopBackOff   168 (4m5s ago)   22h
      # oc describe po -n open-cluster-management -l app=submariner-addon
      Name:         submariner-addon-7cdfb67b4c-c9hsw
      Namespace:    open-cluster-management
      Priority:     0
      Node:         e27-h03-000-r650/fc00:1002::6
      Start Time:   Tue, 03 Jan 2023 21:31:51 +0000
      Labels:       app=submariner-addon
                    pod-template-hash=7cdfb67b4c
      Annotations:  alm-examples:
                      [{"apiVersion": "operator.open-cluster-management.io/v1", "kind": "MultiClusterHub", "metadata": {"name": "multiclusterhub", "namespace": ...
                    capabilities: Seamless Upgrades
                    categories: Integration & Delivery
                    certified: true
                    createdAt: 2023-01-03T15:47:42Z
                    description: Advanced provisioning and management of OpenShift and Kubernetes clusters
                    k8s.ovn.org/pod-networks:
                      {"default":{"ip_addresses":["fd01:0:0:2::47/64"],"mac_address":"0a:58:4b:8a:e9:86","gateway_ips":["fd01:0:0:2::1"],"ip_address":"fd01:0:0:...
                    k8s.v1.cni.cncf.io/network-status:
                      [{
                          "name": "ovn-kubernetes",
                          "interface": "eth0",
                          "ips": [
                              "fd01:0:0:2::47"
                          ],
                          "mac": "0a:58:4b:8a:e9:86",
                          "default": true,
                          "dns": {}
                      }]
                    k8s.v1.cni.cncf.io/networks-status:
                      [{
                          "name": "ovn-kubernetes",
                          "interface": "eth0",
                          "ips": [
                              "fd01:0:0:2::47"
                          ],
                          "mac": "0a:58:4b:8a:e9:86",
                          "default": true,
                          "dns": {}
                      }]
                    olm.operatorGroup: default
                    olm.operatorNamespace: open-cluster-management
                    olm.skipRange: >=2.6.0 <2.7.0
                    olm.targetNamespaces: open-cluster-management
                    openshift.io/scc: restricted-v2
                    operatorframework.io/initialization-resource:
                      {"apiVersion":"operator.open-cluster-management.io/v1", "kind":"MultiClusterHub","metadata":{"name":"multiclusterhub","namespace":"open-cl...
                    operatorframework.io/properties:
                      {"properties":[{"type":"olm.gvk","value":{"group":"submarineraddon.open-cluster-management.io","kind":"SubmarinerConfig","version":"v1alph...
                    operatorframework.io/suggested-namespace: open-cluster-management
                    operators.openshift.io/infrastructure-features: ["disconnected", "proxy-aware", "fips"]
                    operators.openshift.io/valid-subscription: ["OpenShift Platform Plus", "Red Hat Advanced Cluster Management for Kubernetes"]
                    operators.operatorframework.io/internal-objects:
                      ["observatoria.core.observatorium.io", "observabilityaddons.observability.open-cluster-management.io"]
                    seccomp.security.alpha.kubernetes.io/pod: runtime/default
                    support: Red Hat
      Status:       Running
      IP:           fd01:0:0:2::47
      IPs:
        IP:           fd01:0:0:2::47
      Controlled By:  ReplicaSet/submariner-addon-7cdfb67b4c
      Containers:
        submariner-addon:
          Container ID:  cri-o://dee8846c89430625616784b9912504c61b77ba11da49a4578ab181eb9dcf70d7
          Image:         registry.redhat.io/rhacm2/submariner-addon-rhel8@sha256:79826d86770432e3e548f6400ba2251b5787258028fda68d4c308c0d27ae9a44
          Image ID:      registry.redhat.io/rhacm2/submariner-addon-rhel8@sha256:79826d86770432e3e548f6400ba2251b5787258028fda68d4c308c0d27ae9a44
          Port:          <none>
          Host Port:     <none>
          Args:
            /submariner
            controller
          State:          Waiting
            Reason:       CrashLoopBackOff
          Last State:     Terminated
            Reason:       OOMKilled
            Exit Code:    137
            Started:      Wed, 04 Jan 2023 20:11:48 +0000
            Finished:     Wed, 04 Jan 2023 20:14:25 +0000
          Ready:          False
          Restart Count:  168
          Limits:
            memory:  270Mi
          Requests:
            cpu:      100m
            memory:   128Mi
          Liveness:   http-get https://:8443/healthz delay=2s timeout=1s period=10s #success=1 #failure=3
          Readiness:  http-get https://:8443/healthz delay=2s timeout=1s period=10s #success=1 #failure=3
          Environment:
            POD_NAME:                 submariner-addon-7cdfb67b4c-c9hsw (v1:metadata.name)
            OPERATOR_CONDITION_NAME:  advanced-cluster-management.v2.7.0
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rf5b8 (ro)
      Conditions:
        Type              Status
        Initialized       True
        Ready             False
        ContainersReady   False
        PodScheduled      True
      Volumes:
        kube-api-access-rf5b8:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   Burstable
      Node-Selectors:              <none>
      Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type     Reason     Age                     From     Message
        ----     ------     ----                    ----     -------
        Warning  Unhealthy  167m                    kubelet  Liveness probe failed: Get "https://[fd01:0:0:2::47]:8443/healthz": EOF
        Warning  Unhealthy  105m (x8 over 15h)      kubelet  Liveness probe failed: Get "https://[fd01:0:0:2::47]:8443/healthz": dial tcp [fd01:0:0:2::47]:8443: connect: connection refused
        Warning  BackOff    2m43s (x4110 over 21h)  kubelet  Back-off restarting failed container
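      The `Last State: Terminated`, `Reason: OOMKilled` and `Exit Code: 137` lines above show the container being killed by the OOM killer under its 270Mi limit (137 is 128 + 9, i.e. SIGKILL). To check for this repeatedly without reading the full `oc describe` output, the same fields can be pulled with a JSONPath query. This is a minimal sketch, assuming `oc` is logged in to the hub; `addon_last_termination` and `addon_was_oomkilled` are hypothetical helper names, and only the first container of each pod is inspected.

```shell
#!/usr/bin/env bash
# Hypothetical triage helpers (names are illustrative, not part of ACM).
# Namespace and label selector are the ones used in the commands above.
ADDON_NS=open-cluster-management
ADDON_SELECTOR=app=submariner-addon

# Print "<pod>\t<restartCount>\t<last termination reason>" for each addon pod.
addon_last_termination() {
  oc get pods -n "$ADDON_NS" -l "$ADDON_SELECTOR" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'
}

# Exit 0 if any addon pod was last terminated with reason OOMKilled, 1 otherwise.
addon_was_oomkilled() {
  addon_last_termination | awk -F'\t' '$3 == "OOMKilled" { found = 1 } END { exit !found }'
}
```

      For example, `addon_was_oomkilled && echo "submariner-addon was OOMKilled"` can be run after each provisioning step; it prints nothing while the last termination reason is anything other than OOMKilled.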

              rhn-support-jchhatba Janki Chhatbar
              akrzos@redhat.com Alex Krzos
              Maxim Babushkin