- Bug
- Resolution: Won't Do
- Normal
- None
- None
- False
- None
- False
- NEW
- NEW
- High
- SDA - Sprint 219
- High (80-100%) - [We are confident it will become an issue]
When the rosa CLI is used to install the cluster-logging-operator as an add-on to the cluster, the resulting ClusterLogging instance does not have CPU and memory resource limits set.
$ rosa install addon --cluster bmeng-sts-m1 cluster-logging-operator -i
? Are you sure you want to install add-on 'cluster-logging-operator' on cluster 'bmeng-sts-m1'? Yes
? Use AWS CloudWatch: Yes
? Collect Applications logs: Yes
? Collect Infrastructure logs: Yes
? Collect Audit logs (optional): No
? CloudWatch region (optional):
I: Add-on 'cluster-logging-operator' is now installing. To check the status run 'rosa list addons -c bmeng-sts-m1'
$ oc get clusterlogging -n openshift-logging instance -o json | jq .spec
{
  "collection": {
    "logs": {
      "fluentd": {
        "resources": {
          "requests": {
            "memory": "736Mi"
          }
        }
      },
      "type": "fluentd"
    }
  },
  "forwarder": {
    "fluentd": {}
  },
  "managementState": "Managed"
}
This results in some of the customer's master nodes being overwhelmed by the high memory usage of the logging components.
For example:
$ oc adm top po -n openshift-logging
NAME                                        CPU(cores)   MEMORY(bytes)
cluster-logging-operator-57b6d6747f-l5tr9   0m           53Mi
collector-9tf85                             30m          929Mi
collector-kfbkh                             195m         51424Mi
collector-lg57f                             119m         49155Mi
collector-lxr8w                             37m          886Mi
collector-twxhw                             54m          957Mi
collector-vjqn8                             595m         32316Mi
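A possible interim workaround (not verified on a managed cluster; the CPU/memory values below are illustrative only, not proposed defaults) is to set limits directly on the existing ClusterLogging instance, using the same spec.collection.logs.fluentd.resources path shown above:

# Illustrative only: patch the collector resources on the ClusterLogging CR.
# The managed add-on may reconcile and revert manual edits, so this is at best a stop-gap.
$ oc -n openshift-logging patch clusterlogging instance --type merge -p \
  '{"spec":{"collection":{"logs":{"fluentd":{"resources":{"limits":{"cpu":"500m","memory":"736Mi"},"requests":{"cpu":"100m","memory":"736Mi"}}}}}}}'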
Notes 2022-05-16:
- This issue has been moved from the SDA project to the LOG project so that the logging team can work on it, since the fix needs to come from them according to the discussion in this Jira.
Acceptance criteria:
- CLO sets default resource limits for the collector pods, along with an option to adjust these limits as a parameter of the logging add-on for managed OpenShift (see the illustrative spec below).
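For illustration only (the concrete default values would need to be chosen by the logging team), the expected ClusterLogging spec after the fix would carry both requests and limits for the collector, for example:

{
  "collection": {
    "logs": {
      "type": "fluentd",
      "fluentd": {
        "resources": {
          "requests": { "cpu": "100m", "memory": "736Mi" },
          "limits": { "cpu": "500m", "memory": "736Mi" }
        }
      }
    }
  }
}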
- is caused by: LOG-2635 CloudWatch forwarding rejecting large log events, fills tmpfs (Closed)