Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-23377

Cluster-version operator "Running sync"/"Done syncing" steady-state log volume

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done-Errata
    • Icon: Normal Normal
    • 4.15.0
    • 4.13, 4.12, 4.11, 4.14, 4.15
    • None
    • Moderate
    • No
    • False
    • Hide

      None

      Show
      None
    • N/A
    • Release Note Not Required

      Description of problem:

      The cluster-version operator is very chatty, and this can cause problems in clusters where logs are shipped off to external storage. We worked on this in rbhz#2034493, which taught 4.10 and later to move to level 2 logging, mostly to drop the client-side throttling messages. And we have been pushing OTA-923 to make logging tunable, to avoid the need to make "will we want to hear about this?" decisions in one place for all clusters at all times. But there is interest in reducing the amount of logging in older releases in ways that do not require a tunable knob, and this bug tracks another step in that direction: the Running sync / Done syncing messages.

      h2 Version-Release number of selected component (if applicable):

      All 4.y releases log these lines at high volume, but 4.10 and earlier are end-of-life, and 4.11 and 4.12 are in maintenance mode.

      How reproducible:

      Every time.

      Steps to Reproduce:

      1. Install a cluster.
      2. Wait at least 30m since install or the most recent update completes, because we want the CVO to be chatty during those exciting times, and this bug is about steady-state log volume.
      3. Collect CVO logs for the past 30m: oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --since=40m --tail=-1 >cvo.log.

      Actual results:

      $ oc adm upgrade
      Cluster version is 4.13.21
      ...
      $ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --since=40m --tail=-1 > cvo.log
      $ grep -o 'apply.*in state.*' cvo.log | uniq -c
           10 apply: 4.13.21 on generation 77 in state Reconciling at attempt 0
      $ wc cvo.log 
        20043  242930 3071956 cvo.log
      $ sed -n 's/^.* \([^ ]*[.]go:[0-9]*\).*/\1/p' cvo.log | sort | uniq -c | sort -n | tail -n5
          194 sync_worker.go:490
          314 sync_worker.go:978
          807 task_graph.go:477
         7971 sync_worker.go:1007
         7973 sync_worker.go:987
      $ grep 'sync_worker.go:987' cvo.log | tail -n2
      I1116 22:10:08.739999       1 sync_worker.go:987] Running sync for serviceaccount "openshift-cloud-credential-operator/cloud-credential-operator" (271 of 842)
      I1116 22:10:08.785081       1 sync_worker.go:987] Running sync for flowschema "openshift-apiserver" (457 of 842)
      $ grep 'sync_worker.go:1007' cvo.log | tail -n2
      I1116 22:10:08.739967       1 sync_worker.go:1007] Done syncing for configmap "openshift-cloud-credential-operator/cco-trusted-ca" (270 of 842)
      I1116 22:10:08.785043       1 sync_worker.go:1007] Done syncing for flowschema "openshift-apiserver-sar" (456 of 842)
      

      So that's 3071956 bytes / 30 minutes * 60 minutes / 1 hour ~= 6 MB / hour, the bulk of which is Running sync and Done syncing logs.

      Expected results:

      $ grep -v 'sync_worker.go:\(987\|1007\)]' cvo.log | wc
         4099   51602  861709
      

      So something closer to 861709 bytes / 30 minutes * 60 minutes / 1 hour ~= 2 MB / hour would be acceptable.

      Additional info:

      The CVO has a randomized sleep to cool off between sync cycles, and per-sync-cycle log volume will depend on (among other things) what that CVO container happened to choose for that sleep.

            lmohanty@redhat.com Lalatendu Mohanty
            trking W. Trevor King
            Jian Li Jian Li
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: