Bug
Resolution: Unresolved
Description of problem:
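The argocd-pull-integration-controller-manager container in the multicluster-integrations pod is repeatedly OOMKilled (memory limit 128Mi) and is stuck in CrashLoopBackOff while the hub manages 1246 clusters. The clusters are a mix of SNO, compact, and standard types.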
Version-Release number of selected component (if applicable):
ACM Hub - 4.12.10
ACM - 2.8.0-DOWNSTREAM-2023-04-04-01-46-55
Managed cluster OCP 4.12.10
How reproducible:
Steps to Reproduce:
- ...
Actual results:
Expected results:
Additional info:
# oc get po -n open-cluster-management multicluster-integrations-75d6547fdf-q7mcv
NAME                                         READY   STATUS             RESTARTS        AGE
multicluster-integrations-75d6547fdf-q7mcv   2/3     CrashLoopBackOff   180 (36s ago)   16h
# oc describe po -n open-cluster-management multicluster-integrations-75d6547fdf-q7mcv
Name: multicluster-integrations-75d6547fdf-q7mcv
Namespace: open-cluster-management
Priority: 0
Service Account: multicluster-applications
Node: e27-h03-000-r650/fc00:1004::6
Start Time: Tue, 04 Apr 2023 22:34:14 +0000
Labels: name=multicluster-integrations
ocm-antiaffinity-selector=multicluster-integrations
pod-template-hash=75d6547fdf
Annotations: k8s.ovn.org/pod-networks:
{"default":{"ip_addresses":["fd01:0:0:3::23/64"],"mac_address":"0a:58:10:b9:61:b7","gateway_ips":["fd01:0:0:3::1"],"ip_address":"fd01:0:0:...
k8s.v1.cni.cncf.io/network-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"fd01:0:0:3::23"
],
"mac": "0a:58:10:b9:61:b7",
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"fd01:0:0:3::23"
],
"mac": "0a:58:10:b9:61:b7",
"default": true,
"dns": {}
}]
openshift.io/scc: restricted-v2
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Running
IP: fd01:0:0:3::23
IPs:
IP: fd01:0:0:3::23
Controlled By: ReplicaSet/multicluster-integrations-75d6547fdf
Containers:
argocd-pull-integration-controller-manager:
Container ID: cri-o://4813c2b6afcdc4ca547effd30504158853edd162d1b36b4504d83d0eb1b95452
Image: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:9227443a4a57c432f48301019af17691c9070778210a3f22425bd4d8f85bcc29
Image ID: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:59c8d4fc46b89117e69a255ea987380c111d551e3086be179ea80fb783a12101
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/propagation
--leader-election-lease-duration=137
--renew-deadline=107
--retry-period=26
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 05 Apr 2023 14:39:38 +0000
Finished: Wed, 05 Apr 2023 14:39:52 +0000
Ready: False
Restart Count: 171
Limits:
cpu: 500m
memory: 128Mi
Requests:
cpu: 10m
memory: 64Mi
Liveness: exec [ls] delay=15s timeout=1s period=15s #success=1 #failure=3
Readiness: exec [ls] delay=15s timeout=1s period=15s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vp7gz (ro)
multicluster-integrations-syncresource:
Container ID: cri-o://e28ab4d1ed0c48674527c534393812bb0ca23015db4cf3be9d8fced963201ff5
Image: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:9227443a4a57c432f48301019af17691c9070778210a3f22425bd4d8f85bcc29
Image ID: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:59c8d4fc46b89117e69a255ea987380c111d551e3086be179ea80fb783a12101
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/gitopssyncresc
--appset-resource-dir=/etc/gitops-resources
--sync-interval=10
--leader-election-lease-duration=137
--renew-deadline=107
--retry-period=26
State: Running
Started: Wed, 05 Apr 2023 04:59:55 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Wed, 05 Apr 2023 03:59:15 +0000
Finished: Wed, 05 Apr 2023 04:59:54 +0000
Ready: True
Restart Count: 9
Limits:
cpu: 100m
memory: 512Mi
Requests:
cpu: 25m
memory: 64Mi
Liveness: exec [ls] delay=15s timeout=1s period=15s #success=1 #failure=3
Readiness: exec [ls] delay=15s timeout=1s period=15s #success=1 #failure=3
Environment:
WATCH_NAMESPACE:
POD_NAME: multicluster-integrations-75d6547fdf-q7mcv (v1:metadata.name)
POD_NAMESPACE: open-cluster-management (v1:metadata.namespace)
DEPLOYMENT_LABEL: multicluster-integrations-syncresource
OPERATOR_NAME: multicluster-integrations
Mounts:
/etc/gitops-resources from multicluster-integrations-syncresource (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vp7gz (ro)
multicluster-integrations-aggregation:
Container ID: cri-o://45b7dc5c4abf8ede454d4c27221ef00836c6eade20ef037e465a887f7b5123f4
Image: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:9227443a4a57c432f48301019af17691c9070778210a3f22425bd4d8f85bcc29
Image ID: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:59c8d4fc46b89117e69a255ea987380c111d551e3086be179ea80fb783a12101
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/multiclusterstatusaggregation
--appset-resource-dir=/etc/gitops-resources
--sync-interval=10
--leader-election-lease-duration=137
--renew-deadline=107
--retry-period=26
State: Running
Started: Tue, 04 Apr 2023 22:34:20 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 512Mi
Requests:
cpu: 25m
memory: 64Mi
Liveness: exec [ls] delay=15s timeout=1s period=15s #success=1 #failure=3
Readiness: exec [ls] delay=15s timeout=1s period=15s #success=1 #failure=3
Environment:
WATCH_NAMESPACE:
POD_NAME: multicluster-integrations-75d6547fdf-q7mcv (v1:metadata.name)
POD_NAMESPACE: open-cluster-management (v1:metadata.namespace)
DEPLOYMENT_LABEL: multicluster-integrations-aggregation
OPERATOR_NAME: multicluster-integrations
Mounts:
/etc/gitops-resources from multicluster-integrations-syncresource (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vp7gz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
multicluster-integrations-syncresource:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-vp7gz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node-role.kubernetes.io/infra:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 86m (x155 over 16h) kubelet Container image "e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/acm-d/multicloud-integrations-rhel8@sha256:9227443a4a57c432f48301019af17691c9070778210a3f22425bd4d8f85bcc29" already present on machine
Warning BackOff 81s (x4028 over 14h) kubelet Back-off restarting failed container
# oc logs -n open-cluster-management multicluster-integrations-75d6547fdf-q7mcv -c argocd-pull-integration-controller-manager --timestamps -p
2023-04-05T14:39:39.769643277Z I0405 14:39:39.769471 1 request.go:690] Waited for 1.040518232s due to client-side throttling, not priority and fairness, request: GET:https://[fd02::1]:443/apis/monitoring.coreos.com/v1?timeout=32s
2023-04-05T14:39:43.444673741Z 1.6807055834445755e+09 INFO controller-runtime.metrics Metrics server is starting to listen {"addr": "0.0.0.0:8386"}
2023-04-05T14:39:43.467355681Z 1.6807055834672885e+09 INFO setup found CRD applications.argoproj.io
2023-04-05T14:39:43.467438162Z 1.6807055834674232e+09 INFO setup starting manager
2023-04-05T14:39:43.467702957Z 1.6807055834676733e+09 INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8386"}
2023-04-05T14:39:43.467821086Z 1.680705583467794e+09 INFO Starting EventSource {"controller": "application", "controllerGroup": "argoproj.io", "controllerKind": "Application", "source": "kind source: *v1alpha1.Application"}
2023-04-05T14:39:43.467826679Z 1.6807055834678226e+09 INFO Starting Controller {"controller": "application", "controllerGroup": "argoproj.io", "controllerKind": "Application"}
2023-04-05T14:39:43.467860255Z 1.6807055834678254e+09 INFO Starting EventSource {"controller": "manifestwork", "controllerGroup": "work.open-cluster-management.io", "controllerKind": "ManifestWork", "source": "kind source: *v1.ManifestWork"}
2023-04-05T14:39:43.467865046Z 1.6807055834678595e+09 INFO Starting Controller {"controller": "manifestwork", "controllerGroup": "work.open-cluster-management.io", "controllerKind": "ManifestWork"}
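
Triage note: the describe output above shows that only one of the three containers is hitting its memory limit. As a rough sketch (not part of the product, and written in Python only for convenience), the same check could be scripted against the pod's JSON form. The function name `oom_killed_containers` and the `SAMPLE_POD` dictionary are hypothetical; the sample is trimmed to the fields used and its values are copied from the describe output above. The field paths (`status.containerStatuses[].lastState.terminated.reason`, `spec.containers[].resources.limits.memory`) are the standard Kubernetes Pod fields.

```python
import json


def oom_killed_containers(pod: dict) -> list:
    """Return one record per container whose last termination reason was OOMKilled."""
    # Map container name -> configured memory limit (None if no limit is set).
    limits = {
        c["name"]: c.get("resources", {}).get("limits", {}).get("memory")
        for c in pod.get("spec", {}).get("containers", [])
    }
    found = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # lastState is empty for containers that have never been restarted.
        terminated = cs.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            found.append({
                "container": cs["name"],
                "restarts": cs.get("restartCount", 0),
                "memory_limit": limits.get(cs["name"]),
            })
    return found


# Hypothetical sample, reduced to the fields read above; values taken from this report.
SAMPLE_POD = {
    "spec": {"containers": [
        {"name": "argocd-pull-integration-controller-manager",
         "resources": {"limits": {"cpu": "500m", "memory": "128Mi"}}},
        {"name": "multicluster-integrations-syncresource",
         "resources": {"limits": {"cpu": "100m", "memory": "512Mi"}}},
        {"name": "multicluster-integrations-aggregation",
         "resources": {"limits": {"cpu": "100m", "memory": "512Mi"}}},
    ]},
    "status": {"containerStatuses": [
        {"name": "argocd-pull-integration-controller-manager", "restartCount": 171,
         "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
        {"name": "multicluster-integrations-syncresource", "restartCount": 9,
         "lastState": {"terminated": {"reason": "Error", "exitCode": 2}}},
        {"name": "multicluster-integrations-aggregation", "restartCount": 0,
         "lastState": {}},
    ]},
}

if __name__ == "__main__":
    print(json.dumps(oom_killed_containers(SAMPLE_POD), indent=2))
```

Against a live hub, the input would instead come from `oc get po -n open-cluster-management multicluster-integrations-75d6547fdf-q7mcv -o json`, parsed with `json.load`. The sketch only reports which container is being OOMKilled and what limit it runs under; it does not say what limit would be sufficient at this cluster count.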