Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-42610

unnecessary daemonset / deployment rollouts on vsphere

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Major Major
    • None
    • 4.18.0
    • Storage / Operators
    • None
    • None
    • Proposed
    • False
    • Hide

      None

      Show
      None

      Description of problem:

      Sippy complains about pathological events in ns/openshift-cluster-csi-drivers in vsphere-ovn-serial jobs. See this job as one example.

      Jan noticed that the DaemonSet generation is 10-12, while in 4.17 is 2. Why is our operator updating the DaemonSet so often?

      I wrote a quick "one-liner" to generate json diffs from the vmware-vsphere-csi-driver-operator logs:

      prev=''; grep 'DaemonSet "openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-node" changes' openshift-cluster-csi-drivers_vmware-vsphere-csi-driver-operator-5b79c58f6f-hpr6g_vmware-vsphere-csi-driver-operator.log | sed 's/^.*changes: //' | while read -r line; do diff <(echo $prev | jq .) <(echo $line | jq .); prev=$line; echo "####"; done 

      It really seems to be only operator.openshift.io/spec-hash and operator.openshift.io/dep-* fields changing in the json diffs:

      ####
      4,5c4,5
      <       "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "MZ-w-Q==",
      <       "operator.openshift.io/spec-hash": "fb274874404ad6706171c6774a369876ca54e037fcccc200c0ebf3019a600c36"
      ---
      >       "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "AFeN-A==",
      >       "operator.openshift.io/spec-hash": "27a1bab0c00ace8ac21d95a5fe9a089282e7b2b3ec042045951bd5e26ae01a09"
      13c13
      <           "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "MZ-w-Q=="
      ---
      >           "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "AFeN-A=="
      ####
      4,5c4,5
      <       "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "AFeN-A==",
      <       "operator.openshift.io/spec-hash": "27a1bab0c00ace8ac21d95a5fe9a089282e7b2b3ec042045951bd5e26ae01a09"
      ---
      >       "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "MZ-w-Q==",
      >       "operator.openshift.io/spec-hash": "fb274874404ad6706171c6774a369876ca54e037fcccc200c0ebf3019a600c36"
      13c13
      <           "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "AFeN-A=="
      ---
      >           "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "MZ-w-Q=="
      #### 

      The deployment is also changing in the same way. We need to find what is causing the spec-hash and dep-* fields to change and avoid the unnecessary churn that causes new daemonset / deployment rollouts.

       

      Version-Release number of selected component (if applicable):

      4.18.0

      How reproducible:

      ~20% failure rate in 4.18 vsphere-ovn-serial jobs

      Steps to Reproduce:

          

      Actual results:

      operator rolls out unnecessary daemonset / deployment changes

      Expected results:

      don't roll out changes unless there is a spec change

      Additional info:

          

            rbednar@redhat.com Roman Bednar
            jdobson@redhat.com Jonathan Dobson
            Penghao Wang Penghao Wang
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated: