-
Bug
-
Resolution: Done-Errata
-
Normal
-
4.13, 4.12, 4.11, 4.14, 4.15
-
None
-
Moderate
-
No
-
False
-
-
N/A
-
Release Note Not Required
Description of problem:
The cluster-version operator is very chatty, and this can cause problems in clusters where logs are shipped off to external storage. We worked on this in rbhz#2034493, which taught 4.10 and later to move to level 2 logging, mostly to drop the client-side throttling messages. And we have been pushing OTA-923 to make logging tunable, to avoid the need to make "will we want to hear about this?" decisions in one place for all clusters at all times. But there is interest in reducing the amount of logging in older releases in ways that do not require a tunable knob, and this bug tracks another step in that direction: the Running sync / Done syncing messages.
h2 Version-Release number of selected component (if applicable):
All 4.y releases log these lines at high volume, but 4.10 and earlier are end-of-life, and 4.11 and 4.12 are in maintenance mode.
How reproducible:
Every time.
Steps to Reproduce:
1. Install a cluster.
2. Wait at least 30m since install or the most recent update completes, because we want the CVO to be chatty during those exciting times, and this bug is about steady-state log volume.
3. Collect CVO logs for the past 30m: oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --since=40m --tail=-1 >cvo.log.
Actual results:
$ oc adm upgrade Cluster version is 4.13.21 ... $ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --since=40m --tail=-1 > cvo.log $ grep -o 'apply.*in state.*' cvo.log | uniq -c 10 apply: 4.13.21 on generation 77 in state Reconciling at attempt 0 $ wc cvo.log 20043 242930 3071956 cvo.log $ sed -n 's/^.* \([^ ]*[.]go:[0-9]*\).*/\1/p' cvo.log | sort | uniq -c | sort -n | tail -n5 194 sync_worker.go:490 314 sync_worker.go:978 807 task_graph.go:477 7971 sync_worker.go:1007 7973 sync_worker.go:987 $ grep 'sync_worker.go:987' cvo.log | tail -n2 I1116 22:10:08.739999 1 sync_worker.go:987] Running sync for serviceaccount "openshift-cloud-credential-operator/cloud-credential-operator" (271 of 842) I1116 22:10:08.785081 1 sync_worker.go:987] Running sync for flowschema "openshift-apiserver" (457 of 842) $ grep 'sync_worker.go:1007' cvo.log | tail -n2 I1116 22:10:08.739967 1 sync_worker.go:1007] Done syncing for configmap "openshift-cloud-credential-operator/cco-trusted-ca" (270 of 842) I1116 22:10:08.785043 1 sync_worker.go:1007] Done syncing for flowschema "openshift-apiserver-sar" (456 of 842)
So that's 3071956 bytes / 30 minutes * 60 minutes / 1 hour ~= 6 MB / hour, the bulk of which is Running sync and Done syncing logs.
Expected results:
$ grep -v 'sync_worker.go:\(987\|1007\)]' cvo.log | wc 4099 51602 861709
So something closer to 861709 bytes / 30 minutes * 60 minutes / 1 hour ~= 2 MB / hour would be acceptable.
Additional info:
The CVO has a randomized sleep to cool off between sync cycles, and per-sync-cycle log volume will depend on (among other things) what that CVO container happened to choose for that sleep.
- is depended on by
-
RFE-2518 Hope the clusterversion operator can change the log level
- Accepted
- relates to
-
OTA-923 Reduce cluster version operator logging
- In Progress
- links to
-
RHSA-2023:7198 OpenShift Container Platform 4.15 security update