  OpenShift Over the Air / OTA-860

CVO: Optimize Upgradeable check API usage and reduce throttling on its status update

      While working on OCPBUGS-5505, where we made the Upgradeable check throttling period deterministic, we also considered reducing the period to 1 minute or even lower. However, we were not sure whether doing so was safe. There is some evidence that the CVO uses the cluster API server more intensively than necessary, bypassing informers (and hence their local cache):

      Re: Lala's "why not squeeze harder?" https://github.com/openshift/cluster-version-operator/pull/882#discussion_r1069679453 , it is probably worth peeking at audit logs for one of the CI runs, to see if our ClusterOperator call rate is sustainable. I'd have expected our ClusterOperator access to flow through an informer, so higher nominal-access would be absorbed by our local cache and not make it out to the API server. But https://redhat-internal.slack.com/archives/C01CQA76KMX/p1672954955591829?thread_ts=1672946726.268369&cid=C01CQA76KMX suggests that at least some call sites are using direct calls, and not the informers, and we may not want to go too hard if these Upgradeable checks are actually direct calls.

      W. Trevor King

      $ zgrep -h '"username":"system:serviceaccount:openshift-cluster-version:default"' kube-apiserver/*.log.gz | jq -r '.requestURI' | sort | uniq -c | sort -n | tail
           32 /apis/apiextensions.k8s.io/v1/customresourcedefinitions/performanceprofiles.performance.openshift.io
           33 /apis/config.openshift.io/v1/infrastructures/cluster
           34 /apis/admissionregistration.k8s.io/v1/validatingwebhookconfigurations/controlplanemachineset.machine.openshift.io
           34 /apis/admissionregistration.k8s.io/v1/validatingwebhookconfigurations/performance-addon-operator
           34 /apis/batch/v1/namespaces/openshift-operator-lifecycle-manager/cronjobs/collect-profiles
           36 /apis/image.openshift.io/v1/namespaces/openshift/imagestreams/driver-toolkit
          135 /apis/config.openshift.io/v1/proxies/cluster
          326 /api/v1/namespaces/openshift-cluster-version/configmaps/version
          489 /apis/coordination.k8s.io/v1/namespaces/openshift-cluster-version/leases/version
          530 /apis/config.openshift.io/v1/clusteroperators
      

      possibly we have a ClusterOperator consumer that needs to get wired up to our existing informer...
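      One way to check whether the clusteroperators requests above come from direct calls rather than from an informer is to break the same audit data down by verb. An informer normally issues an initial LIST and then keeps long-lived WATCH requests open. A large number of list or get requests relative to watch requests would therefore point at direct calls. The sketch below assumes the audit logs are JSON lines in the Kubernetes audit.k8s.io/v1 Event format (fields user.username, stage, verb, requestURI), as the jq query above already implies. The function name is hypothetical. It counts only ResponseComplete events so that one request is not counted once per audit stage.

```shell
# Hypothetical helper: count ClusterOperator requests made by the CVO service account,
# grouped by verb. Reads Kubernetes audit events (one JSON object per line) on stdin.
# Only ResponseComplete events are counted, so watches still open when the logs were
# collected will not show up.
count_cvo_clusteroperator_requests_by_verb() {
  jq -r --arg sa "system:serviceaccount:openshift-cluster-version:default" '
    select(.user.username == $sa
           and .stage == "ResponseComplete"
           and ((.requestURI // "") | startswith("/apis/config.openshift.io/v1/clusteroperators")))
    | .verb' |
    sort | uniq -c | sort -n
}

# Example (assumed log layout, same files as the zgrep query):
#   zcat kube-apiserver/*.log.gz | count_cvo_clusteroperator_requests_by_verb
```

If list and get dominate the output, that would support wiring the remaining ClusterOperator consumers to the existing informer before shortening the throttling period.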

      This story aims to optimize the API usage of the CVO's Upgradeable check and status synchronization, and to find the optimal (lower) throttling period for the status sync.
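      When comparing candidate throttling periods, it may help to look at the CVO's ClusterOperator request rate over time instead of a single total. The following is a minimal sketch under the same audit-log assumptions. It also assumes requestReceivedTimestamp is an RFC 3339 string such as "2023-01-13T10:15:30.123456Z", and its function name is hypothetical. It buckets those requests per minute.

```shell
# Hypothetical helper: per-minute count of ClusterOperator requests made by the CVO
# service account. Reads Kubernetes audit events (one JSON object per line) on stdin.
# The timestamp is truncated to its first 16 characters, e.g. "2023-01-13T10:15",
# so requests are grouped by minute; ISO timestamps sort correctly as plain strings.
cvo_clusteroperator_requests_per_minute() {
  jq -r --arg sa "system:serviceaccount:openshift-cluster-version:default" '
    select(.user.username == $sa
           and .stage == "ResponseComplete"
           and ((.requestURI // "") | startswith("/apis/config.openshift.io/v1/clusteroperators")))
    | .requestReceivedTimestamp[0:16]' |
    sort | uniq -c
}

# Example (assumed log layout):
#   zcat kube-apiserver/*.log.gz | cvo_clusteroperator_requests_per_minute
```

Running this against audit logs from CI runs before and after a change to the period would show whether the shorter period measurably raises the per-minute load on the API server.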

              Assignee: Unassigned
              Reporter: Petr Muller (afri@afri.cz)
              Votes: 0
              Watchers: 4
