Loading...

XML

Word

Printable

Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: 4.15.0
Affects Version/s: 4.13, 4.12, 4.11, 4.14, 4.15
Component/s: Cluster Version Operator
Labels:
None

Severity:
Moderate
Regression:
No
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Release Note Text:
N/A
Release Note Type:
Release Note Not Required
Target Version:

4.15.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

The cluster-version operator is very chatty, and this can cause problems in clusters where logs are shipped off to external storage. We worked on this in rbhz#2034493, which taught 4.10 and later to move to level 2 logging, mostly to drop the client-side throttling messages. And we have been pushing OTA-923 to make logging tunable, to avoid the need to make "will we want to hear about this?" decisions in one place for all clusters at all times. But there is interest in reducing the amount of logging in older releases in ways that do not require a tunable knob, and this bug tracks another step in that direction: the Running sync / Done syncing messages.

h2 Version-Release number of selected component (if applicable):

All 4.y releases log these lines at high volume, but 4.10 and earlier are end-of-life, and 4.11 and 4.12 are in maintenance mode.

How reproducible:

Every time.

Steps to Reproduce:

1. Install a cluster.
2. Wait at least 30m since install or the most recent update completes, because we want the CVO to be chatty during those exciting times, and this bug is about steady-state log volume.
3. Collect CVO logs for the past 30m: oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --since=40m --tail=-1 >cvo.log.

Actual results:

$ oc adm upgrade
Cluster version is 4.13.21
...
$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --since=40m --tail=-1 > cvo.log
$ grep -o 'apply.*in state.*' cvo.log | uniq -c
     10 apply: 4.13.21 on generation 77 in state Reconciling at attempt 0
$ wc cvo.log 
  20043  242930 3071956 cvo.log
$ sed -n 's/^.* \([^ ]*[.]go:[0-9]*\).*/\1/p' cvo.log | sort | uniq -c | sort -n | tail -n5
    194 sync_worker.go:490
    314 sync_worker.go:978
    807 task_graph.go:477
   7971 sync_worker.go:1007
   7973 sync_worker.go:987
$ grep 'sync_worker.go:987' cvo.log | tail -n2
I1116 22:10:08.739999       1 sync_worker.go:987] Running sync for serviceaccount "openshift-cloud-credential-operator/cloud-credential-operator" (271 of 842)
I1116 22:10:08.785081       1 sync_worker.go:987] Running sync for flowschema "openshift-apiserver" (457 of 842)
$ grep 'sync_worker.go:1007' cvo.log | tail -n2
I1116 22:10:08.739967       1 sync_worker.go:1007] Done syncing for configmap "openshift-cloud-credential-operator/cco-trusted-ca" (270 of 842)
I1116 22:10:08.785043       1 sync_worker.go:1007] Done syncing for flowschema "openshift-apiserver-sar" (456 of 842)

So that's 3071956 bytes / 30 minutes * 60 minutes / 1 hour ~= 6 MB / hour, the bulk of which is Running sync and Done syncing logs.

Expected results:

$ grep -v 'sync_worker.go:\(987\|1007\)]' cvo.log | wc
   4099   51602  861709

So something closer to 861709 bytes / 30 minutes * 60 minutes / 1 hour ~= 2 MB / hour would be acceptable.

Additional info:

The CVO has a randomized sleep to cool off between sync cycles, and per-sync-cycle log volume will depend on (among other things) what that CVO container happened to choose for that sleep.

is depended on by

RFE-2518 Hope the clusterversion operator can change the log level

Approved

relates to

OTA-923 Reduce cluster version operator logging

In Progress

links to

openshift/cluster-version-operator#997: OCPBUGS-23377: to reduce log noise, only log sync_worker messages on retry or payload change

RHSA-2023:7198 OpenShift Container Platform 4.15 security update

Assignee:: Lalatendu Mohanty

Reporter:: W. Trevor King

QA Contact:: Jian Li

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Created:: 2023/11/16 10:19 PM

Updated:: 2024/07/26 7:25 PM

Resolved:: 2024/02/27 9:04 PM

Details

Description

Description of problem:

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates