-
Bug
-
Resolution: Done
-
Major
-
ACM 2.10.0, ACM 2.9.0, ACM 2.8.0, ACM 2.7.0
-
False
-
None
-
False
-
Submariner Sprint 2024-22, Submariner Sprint 2024-23, Submariner Sprint 2024-24, Submariner Sprint 2024-25, Submariner Sprint 2024-26, Submariner Sprint 2024-27
-
Moderate
-
No
Description of problem:
While deploying 3000+ SNOs with ACM and ZTP, the submariner-addon pod was found crashlooping at the conclusion of the test. On review, the crashlooping appears to have started when the first cluster finished provisioning and became managed. (Perhaps something is wrong with what that pod examines: to reach more than 3000 managed SNOs, the SNOs are provisioned in "steps" of 500 clusters.)
Version-Release number of selected component (if applicable):
2.7.0-DOWNSTREAM-2023-01-03-01-19-39
OCP 4.11.19 Hub and SNOs
How reproducible:
Steps to Reproduce:
- ...
Actual results:
Expected results:
Additional info:
# oc get po -n open-cluster-management -l app=submariner-addon
NAME                                READY   STATUS             RESTARTS         AGE
submariner-addon-7cdfb67b4c-c9hsw   0/1     CrashLoopBackOff   168 (4m5s ago)   22h

# oc describe po -n open-cluster-management -l app=submariner-addon
Name:         submariner-addon-7cdfb67b4c-c9hsw
Namespace:    open-cluster-management
Priority:     0
Node:         e27-h03-000-r650/fc00:1002::6
Start Time:   Tue, 03 Jan 2023 21:31:51 +0000
Labels:       app=submariner-addon
              pod-template-hash=7cdfb67b4c
Annotations:  alm-examples: [{"apiVersion": "operator.open-cluster-management.io/v1", "kind": "MultiClusterHub", "metadata": {"name": "multiclusterhub", "namespace": ...
              capabilities: Seamless Upgrades
              categories: Integration & Delivery
              certified: true
              createdAt: 2023-01-03T15:47:42Z
              description: Advanced provisioning and management of OpenShift and Kubernetes clusters
              k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["fd01:0:0:2::47/64"],"mac_address":"0a:58:4b:8a:e9:86","gateway_ips":["fd01:0:0:2::1"],"ip_address":"fd01:0:0:...
              k8s.v1.cni.cncf.io/network-status: [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "fd01:0:0:2::47" ], "mac": "0a:58:4b:8a:e9:86", "default": true, "dns": {} }]
              k8s.v1.cni.cncf.io/networks-status: [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "fd01:0:0:2::47" ], "mac": "0a:58:4b:8a:e9:86", "default": true, "dns": {} }]
              olm.operatorGroup: default
              olm.operatorNamespace: open-cluster-management
              olm.skipRange: >=2.6.0 <2.7.0
              olm.targetNamespaces: open-cluster-management
              openshift.io/scc: restricted-v2
              operatorframework.io/initialization-resource: {"apiVersion":"operator.open-cluster-management.io/v1", "kind":"MultiClusterHub","metadata":{"name":"multiclusterhub","namespace":"open-cl...
              operatorframework.io/properties: {"properties":[{"type":"olm.gvk","value":{"group":"submarineraddon.open-cluster-management.io","kind":"SubmarinerConfig","version":"v1alph...
              operatorframework.io/suggested-namespace: open-cluster-management
              operators.openshift.io/infrastructure-features: ["disconnected", "proxy-aware", "fips"]
              operators.openshift.io/valid-subscription: ["OpenShift Platform Plus", "Red Hat Advanced Cluster Management for Kubernetes"]
              operators.operatorframework.io/internal-objects: ["observatoria.core.observatorium.io", "observabilityaddons.observability.open-cluster-management.io"]
              seccomp.security.alpha.kubernetes.io/pod: runtime/default
              support: Red Hat
Status:       Running
IP:           fd01:0:0:2::47
IPs:
  IP:           fd01:0:0:2::47
Controlled By:  ReplicaSet/submariner-addon-7cdfb67b4c
Containers:
  submariner-addon:
    Container ID:  cri-o://dee8846c89430625616784b9912504c61b77ba11da49a4578ab181eb9dcf70d7
    Image:         registry.redhat.io/rhacm2/submariner-addon-rhel8@sha256:79826d86770432e3e548f6400ba2251b5787258028fda68d4c308c0d27ae9a44
    Image ID:      registry.redhat.io/rhacm2/submariner-addon-rhel8@sha256:79826d86770432e3e548f6400ba2251b5787258028fda68d4c308c0d27ae9a44
    Port:          <none>
    Host Port:     <none>
    Args:
      /submariner
      controller
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 04 Jan 2023 20:11:48 +0000
      Finished:     Wed, 04 Jan 2023 20:14:25 +0000
    Ready:          False
    Restart Count:  168
    Limits:
      memory:  270Mi
    Requests:
      cpu:     100m
      memory:  128Mi
    Liveness:   http-get https://:8443/healthz delay=2s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get https://:8443/healthz delay=2s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:                 submariner-addon-7cdfb67b4c-c9hsw (v1:metadata.name)
      OPERATOR_CONDITION_NAME:  advanced-cluster-management.v2.7.0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rf5b8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-rf5b8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  167m                    kubelet  Liveness probe failed: Get "https://[fd01:0:0:2::47]:8443/healthz": EOF
  Warning  Unhealthy  105m (x8 over 15h)      kubelet  Liveness probe failed: Get "https://[fd01:0:0:2::47]:8443/healthz": dial tcp [fd01:0:0:2::47]:8443: connect: connection refused
  Warning  BackOff    2m43s (x4110 over 21h)  kubelet  Back-off restarting failed container
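The describe output above reports the container's last state as Terminated with reason OOMKilled (exit code 137) against a 270Mi memory limit. As a rough aid for checking the same condition without reading the full describe output, the following is a minimal sketch, not part of the original report. It assumes the pod list has been captured as JSON (for example with `oc get po ... -o json`); the function name `oom_killed_containers` is hypothetical, while the fields it reads (`status.containerStatuses[].lastState.terminated.reason` and `restartCount`) are standard Kubernetes pod status fields.

```python
def oom_killed_containers(pod_list: dict) -> list[tuple[str, str, int]]:
    """Return (pod name, container name, restart count) for every container
    whose most recent termination reason was OOMKilled.

    `pod_list` is the parsed JSON of a Kubernetes PodList, e.g. the output of
    `oc get po -n <namespace> -l <selector> -o json`.
    """
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "")
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # lastState.terminated is absent if the container never terminated.
            terminated = cs.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((pod_name, cs.get("name", ""), cs.get("restartCount", 0)))
    return hits
```

To use it, the JSON could be piped in and parsed with `json.load(sys.stdin)` before calling the function; an empty result would suggest the restarts have some cause other than the memory limit.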
- is related to: ACM-2850 Upgrading from ACM 2.6 to 2.7 failed (Closed)