Bug
Resolution: Done
Normal
ACM 2.8.0
False
False
Important
No
Description of problem:
After deploying and managing 3600+ SNOs, the multicluster-integrations-syncresource container in the multicluster-integrations pod crashloops because the search-api pod is OOMing. The container should tolerate the search-api pod, which is a separate pod, being OOM-killed without crashlooping itself.
# oc get po -n open-cluster-management multicluster-integrations-6c5747667c-rz47r
NAME                                         READY   STATUS    RESTARTS       AGE
multicluster-integrations-6c5747667c-rz47r   3/3     Running   45 (31m ago)   18h
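The RESTARTS column above is summed over all containers in the pod, so on its own it does not show which container is restarting. A minimal sketch for listing restart counts per container follows. The function name `restarts_by_container` and its argument order are illustrative and not part of this report, and the sketch assumes `oc` is logged in to the hub cluster.

```shell
# Sketch: print "<container>\t<restartCount>" for each container in a pod.
# The function name and argument order are illustrative.
restarts_by_container() {
  ns="$1"; pod="$2"
  oc get po -n "$ns" "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
}

# Example call for the pod in this report (requires cluster access):
# restarts_by_container open-cluster-management multicluster-integrations-6c5747667c-rz47r
```

For the pod in this report, the describe output further down gives the same breakdown: only multicluster-integrations-syncresource has a non-zero restart count.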
Version-Release number of selected component (if applicable):
OCP Hub 4.12.14
Deployed SNOs - 4.13.0-rc.6
ACM - 2.8.0-DOWNSTREAM-2023-04-30-18-44-29
How reproducible:
Steps to Reproduce:
- ...
Actual results:
Expected results:
Additional info:
Pod description:
# oc describe po -n open-cluster-management multicluster-integrations-6c5747667c-rz47r
Name: multicluster-integrations-6c5747667c-rz47r
Namespace: open-cluster-management
Priority: 0
Service Account: multicluster-applications
Node: e27-h03-000-r650/fc00:1004::6
Start Time: Wed, 03 May 2023 20:35:57 +0000
Labels: name=multicluster-integrations
ocm-antiaffinity-selector=multicluster-integrations
pod-template-hash=6c5747667c
Annotations: k8s.ovn.org/pod-networks:
{"default":{"ip_addresses":["fd01:0:0:3::2b/64"],"mac_address":"0a:58:01:19:04:0e","gateway_ips":["fd01:0:0:3::1"],"ip_address":"fd01:0:0:...
k8s.v1.cni.cncf.io/network-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"fd01:0:0:3::2b"
],
"mac": "0a:58:01:19:04:0e",
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"fd01:0:0:3::2b"
],
"mac": "0a:58:01:19:04:0e",
"default": true,
"dns": {}
}]
openshift.io/scc: restricted-v2
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Running
IP: fd01:0:0:3::2b
IPs:
IP: fd01:0:0:3::2b
Controlled By: ReplicaSet/multicluster-integrations-6c5747667c
Containers:
argocd-pull-integration-controller-manager:
Container ID: cri-o://6ea482f2be109e6ac1e91413749afeb4433a7b40e288f6227b129efb88060a36
Image: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:c34c3540c77f04f806bd97dee808346979169038eb3dfb585933f7380cd4e0c0
Image ID: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:c34c3540c77f04f806bd97dee808346979169038eb3dfb585933f7380cd4e0c0
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/propagation
--leader-election-lease-duration=137s
--leader-election-renew-deadline=107s
--leader-election-retry-period=26s
State: Running
Started: Wed, 03 May 2023 20:36:02 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 512Mi
Requests:
cpu: 25m
memory: 64Mi
Liveness: exec [ls] delay=15s timeout=1s period=15s #success=1 #failure=3
Readiness: exec [ls] delay=15s timeout=1s period=15s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xk77k (ro)
multicluster-integrations-syncresource:
Container ID: cri-o://c32bc580b028caa7e54a1a39e1f17d4a17f56441589bf1188ecb3b6e468afcda
Image: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:c34c3540c77f04f806bd97dee808346979169038eb3dfb585933f7380cd4e0c0
Image ID: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:c34c3540c77f04f806bd97dee808346979169038eb3dfb585933f7380cd4e0c0
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/gitopssyncresc
--appset-resource-dir=/etc/gitops-resources
--sync-interval=10
--leader-election-lease-duration=137s
--leader-election-renew-deadline=107s
--leader-election-retry-period=26s
State: Running
Started: Thu, 04 May 2023 14:43:00 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 04 May 2023 14:18:18 +0000
Finished: Thu, 04 May 2023 14:42:59 +0000
Ready: True
Restart Count: 45
Limits:
cpu: 100m
memory: 512Mi
Requests:
cpu: 25m
memory: 64Mi
Liveness: exec [ls] delay=15s timeout=1s period=15s #success=1 #failure=3
Readiness: exec [ls] delay=15s timeout=1s period=15s #success=1 #failure=3
Environment:
WATCH_NAMESPACE:
POD_NAME: multicluster-integrations-6c5747667c-rz47r (v1:metadata.name)
POD_NAMESPACE: open-cluster-management (v1:metadata.namespace)
DEPLOYMENT_LABEL: multicluster-integrations-syncresource
OPERATOR_NAME: multicluster-integrations
Mounts:
/etc/gitops-resources from multicluster-integrations-syncresource (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xk77k (ro)
multicluster-integrations-aggregation:
Container ID: cri-o://f3e8a783300ee9938c117df377b50daaccc530926fd4986b65f9ac015e0621e9
Image: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:c34c3540c77f04f806bd97dee808346979169038eb3dfb585933f7380cd4e0c0
Image ID: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:c34c3540c77f04f806bd97dee808346979169038eb3dfb585933f7380cd4e0c0
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/multiclusterstatusaggregation
--appset-resource-dir=/etc/gitops-resources
--sync-interval=10
--leader-election-lease-duration=137s
--leader-election-renew-deadline=107s
--leader-election-retry-period=26s
State: Running
Started: Wed, 03 May 2023 20:36:03 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 1Gi
Requests:
cpu: 25m
memory: 64Mi
Liveness: exec [ls] delay=15s timeout=1s period=15s #success=1 #failure=3
Readiness: exec [ls] delay=15s timeout=1s period=15s #success=1 #failure=3
Environment:
WATCH_NAMESPACE:
POD_NAME: multicluster-integrations-6c5747667c-rz47r (v1:metadata.name)
POD_NAMESPACE: open-cluster-management (v1:metadata.namespace)
DEPLOYMENT_LABEL: multicluster-integrations-aggregation
OPERATOR_NAME: multicluster-integrations
Mounts:
/etc/gitops-resources from multicluster-integrations-syncresource (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xk77k (ro)
Conditions:
Type              Status
Initialized       True
Ready             True
ContainersReady   True
PodScheduled      True
Volumes:
multicluster-integrations-syncresource:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-xk77k:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node-role.kubernetes.io/infra:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type     Reason     Age                  From     Message
----     ------     ----                 ----     -------
Warning  Unhealthy  126m (x2 over 14h)   kubelet  Readiness probe failed:
Warning  Unhealthy  109m (x3 over 14h)   kubelet  Liveness probe failed:
Normal   Pulled     31m (x46 over 18h)   kubelet  Container image "e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:c34c3540c77f04f806bd97dee808346979169038eb3dfb585933f7380cd4e0c0" already present on machine
Normal   Created    31m (x46 over 18h)   kubelet  Created container multicluster-integrations-syncresource
Normal   Started    31m (x46 over 18h)   kubelet  Started container multicluster-integrations-syncresource
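
The describe output records that the previous multicluster-integrations-syncresource instance terminated with exit code 2, but not what it logged before exiting. A minimal sketch for collecting those last log lines follows. The function name `prev_logs` and the default of 100 lines are illustrative choices, not something taken from this report. Exit code 2 is what the Go runtime uses for an unrecovered panic, so if the binary is a Go program (likely, but an assumption here) the previous logs may end with a stack trace.

```shell
# Sketch: show the tail of the logs from the previous (terminated) instance
# of one container in a pod. Function name and defaults are illustrative.
prev_logs() {
  ns="$1"; pod="$2"; container="$3"; lines="${4:-100}"
  oc logs -n "$ns" "$pod" -c "$container" --previous --tail="$lines"
}

# Example call for the pod in this report (requires cluster access):
# prev_logs open-cluster-management multicluster-integrations-6c5747667c-rz47r multicluster-integrations-syncresource
```

`--previous` only returns the most recent terminated instance, so the logs have to be captured before the next restart replaces them.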