  1. OpenShift Bugs
  2. OCPBUGS-2592

CVO hot-loops on Deployment manifests

Details

    • Bug
    • Resolution: Done
    • Normal
    • None
    • 4.12
    • None
    • Moderate
    • 3
    • OTA 226
    • 1
    • False

    Description

      Description of problem:

      While looking into OCPBUGS-1458, I've noticed this behavior:

      $ for i in (seq 3)
            date
            oc logs -n openshift-cluster-version cluster-version-operator-7b95857ff9-dr6vd | grep -oP "Updating \K.+ due to diff" | cut -d' ' -f1 | sort | uniq -c
            sleep 300
        end
      Wed 19 Oct 14:19:35 CEST 2022
           27 CRD
            9 CronJob
          270 Deployment
           27 OperatorGroup
           18 ValidatingWebhookConfiguration
      Wed 19 Oct 14:24:37 CEST 2022
           30 CRD
           10 CronJob
          300 Deployment
           30 OperatorGroup
           20 ValidatingWebhookConfiguration
      Wed 19 Oct 14:29:39 CEST 2022
           36 CRD
           12 CronJob
          359 Deployment
           36 OperatorGroup
           24 ValidatingWebhookConfiguration
      

       

      It does not seem to be limited to a single manifest:

      $ oc logs -n openshift-cluster-version cluster-version-operator-7b95857ff9-dr6vd | grep -oP "Updating Deployment \K.+ due to diff" | cut -d' ' -f1 | sort | uniq -c
           22 openshift-apiserver-operator/openshift-apiserver-operator
           22 openshift-authentication-operator/authentication-operator
           22 openshift-cloud-controller-manager-operator/cluster-cloud-controller-manager-operator
           22 openshift-cloud-credential-operator/cloud-credential-operator
           22 openshift-cluster-machine-approver/machine-approver
           22 openshift-cluster-samples-operator/cluster-samples-operator
           22 openshift-cluster-storage-operator/cluster-storage-operator
           22 openshift-cluster-storage-operator/csi-snapshot-controller-operator
           22 openshift-cluster-version/cluster-version-operator
           22 openshift-config-operator/openshift-config-operator
           22 openshift-console-operator/console-operator
           22 openshift-controller-manager-operator/openshift-controller-manager-operator
           22 openshift-dns-operator/dns-operator
           22 openshift-etcd-operator/etcd-operator
           22 openshift-image-registry/cluster-image-registry-operator
           22 openshift-ingress-operator/ingress-operator
           22 openshift-insights/insights-operator
           22 openshift-kube-apiserver-operator/kube-apiserver-operator
           22 openshift-kube-controller-manager-operator/kube-controller-manager-operator
           22 openshift-kube-scheduler-operator/openshift-kube-scheduler-operator
           22 openshift-kube-storage-version-migrator-operator/kube-storage-version-migrator-operator
           22 openshift-machine-api/cluster-autoscaler-operator
           22 openshift-machine-api/control-plane-machine-set-operator
           22 openshift-machine-api/machine-api-operator
           22 openshift-marketplace/marketplace-operator
           22 openshift-monitoring/cluster-monitoring-operator
           22 openshift-operator-lifecycle-manager/catalog-operator
           22 openshift-operator-lifecycle-manager/olm-operator
           22 openshift-operator-lifecycle-manager/package-server-manager
           22 openshift-service-ca-operator/service-ca-operator
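
As a rough cross-check, the listing above contains 30 distinct Deployments, each updated 22 times (30 × 22 = 660 updates), and the earlier Deployment totals of 270, 300 and 359 are each a multiple of 30 or one short of one. One reading, which is an inference from these numbers rather than something the log states, is that every reconciliation pass updates all 30 Deployments. The sketch below does that arithmetic; the helper name `passes` is hypothetical.

```shell
# Split a total "Updating Deployment" count into full passes over all
# manifests plus a remainder (for example, a pass still in progress when
# the log was sampled). The default of 30 is the number of distinct
# Deployments in the listing above.
passes() {
  total=$1
  manifests=${2:-30}
  echo "$(( total / manifests )) full passes, $(( total % manifests )) extra"
}

# The three Deployment totals sampled five minutes apart.
for total in 270 300 359; do
  passes "$total"
done
```

For the three samples this prints 9 full passes with 0 extra, 10 full passes with 0 extra, and 11 full passes with 29 extra.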
      

      Inspecting the diff (attached) logged by CVO shows that the differences are probably related to some changes in defaulting.

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2022-10-18-192348

      How reproducible:

      Appears to reproduce every time on 4.12.

      Steps to Reproduce:

      1. Filter the CVO log (for example, the output of oc logs for the cluster-version-operator pod) through:
      grep -oP "Updating Deployment \K.+ due to diff" | cut -d' ' -f1 | sort | uniq -c
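
The filter in step 1 depends on `grep -P`, which some grep builds (for example BSD grep on macOS) do not provide. A minimal sketch of an equivalent filter using `sed` follows. The function name `count_deployment_updates` is hypothetical, and the log line shape is assumed from the grep pattern above rather than taken from the CVO source.

```shell
# Count "Updating Deployment <namespace>/<name> ... due to diff" lines per
# Deployment. Reads a CVO log on stdin and prints "<count> <namespace>/<name>".
# sed -n with the p flag prints only lines where the substitution matched,
# so unrelated log lines and other resource kinds are dropped.
count_deployment_updates() {
  sed -n 's/.*Updating Deployment \([^ ]*\) .*due to diff.*/\1/p' | sort | uniq -c
}
```

It is meant to sit at the end of the same kind of pipeline, for example `oc logs -n openshift-cluster-version <pod> | count_deployment_updates`, where `<pod>` stands for the CVO pod name.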
      

      Actual results:

      Many hits across various (possibly all) Deployment manifests
      

      Expected results:

      No hits
      

      Additional info:

      Seems to be a regression in 4.12. The symptoms can be checked in the artifacts of any CI job; I found no hits in any 4.11 job. Examples:
      
      $  curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-e2e-gcp-ovn/1582453200294776832/artifacts/e2e-gcp-ovn/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-75d8fb9fbc-88z6f_cluster-version-operator.log | grep -oP "Updating Deployment \K.+ due to diff" | cut -d' ' -f1 | sort | uniq -c
           17 openshift-apiserver-operator/openshift-apiserver-operator
           17 openshift-authentication-operator/authentication-operator
           17 openshift-cloud-controller-manager-operator/cluster-cloud-controller-manager-operator
           17 openshift-cloud-credential-operator/cloud-credential-operator
           17 openshift-cluster-machine-approver/machine-approver
           17 openshift-cluster-samples-operator/cluster-samples-operator
           17 openshift-cluster-storage-operator/cluster-storage-operator
           17 openshift-cluster-storage-operator/csi-snapshot-controller-operator
           17 openshift-cluster-version/cluster-version-operator
           17 openshift-config-operator/openshift-config-operator
           17 openshift-console-operator/console-operator
           17 openshift-controller-manager-operator/openshift-controller-manager-operator
           17 openshift-dns-operator/dns-operator
           17 openshift-etcd-operator/etcd-operator
           17 openshift-image-registry/cluster-image-registry-operator
           17 openshift-ingress-operator/ingress-operator
           17 openshift-insights/insights-operator
           17 openshift-kube-apiserver-operator/kube-apiserver-operator
           17 openshift-kube-controller-manager-operator/kube-controller-manager-operator
           17 openshift-kube-scheduler-operator/openshift-kube-scheduler-operator
           17 openshift-kube-storage-version-migrator-operator/kube-storage-version-migrator-operator
           17 openshift-machine-api/cluster-autoscaler-operator
           17 openshift-machine-api/control-plane-machine-set-operator
           17 openshift-machine-api/machine-api-operator
           17 openshift-marketplace/marketplace-operator
           17 openshift-monitoring/cluster-monitoring-operator
           17 openshift-operator-lifecycle-manager/catalog-operator
           17 openshift-operator-lifecycle-manager/olm-operator
           17 openshift-operator-lifecycle-manager/package-server-manager
           17 openshift-service-ca-operator/service-ca-operator
      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-ovn/1582635083397861376/artifacts/e2e-gcp-ovn/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-dfdbcb69d-64gpg_cluster-version-operator.log | grep -oP "Updating Deployment \K.+ due to diff" | cut -d' ' -f1 | sort | uniq -c
      <no hits>
      

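To compare several jobs at once, the CVO logs can be downloaded first and counted per file. The following is a minimal sketch under that assumption; `hits_per_log` is a hypothetical helper, and the file names in the usage note are placeholders.

```shell
# For each CVO log file given as an argument, print the number of
# "Updating Deployment ... due to diff" lines, a tab, and the file name.
# grep -c prints 0 and exits non-zero when nothing matches, so "|| true"
# keeps the function usable in scripts that run under "set -e".
hits_per_log() {
  for log in "$@"; do
    hits=$(grep -c 'Updating Deployment .* due to diff' "$log" || true)
    printf '%s\t%s\n' "$hits" "$log"
  done
}
```

After saving each log with `curl -s <url> -o <file>`, a call such as `hits_per_log cvo-4.12.log cvo-4.11.log` prints one line per file. A log without the symptom shows a count of 0.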
          People

            afri@afri.cz Petr Muller
            afri@afri.cz Petr Muller
            Yang Yang