OpenShift Bugs / OCPBUGS-74701

CSI driver operators re-deploy DaemonSet too often

    • Bug
    • Resolution: Unresolved
    • Major
    • 4.21.0
    • Storage / Operators
    • False
    • Rejected
    • In Progress
    • Release Note Not Required

      This is a clone of issue OCPBUGS-74225. The following is the description of the original issue:

      When a CSI driver DaemonSet finishes updating all node Pods, the operator saves the annotation `storage.openshift.io/stable-generation` with the DaemonSet generation that has just reached a stable state.

      However, this update also changes the annotation `operator.openshift.io/spec-hash`, even though the operator did not update the DaemonSet spec. In other words, the operator calculates a wrong hash.

      For example:

      -    operator.openshift.io/spec-hash: c1e9b6d31ee552c96b6735fcb42af4bb3fdfd7ff83038f7705fbd2dfed668731
      -    storage.openshift.io/stable-generation: "8"
      +    operator.openshift.io/spec-hash: f94257eece0170ebb52b1553a7798c909085709393ffb7156b77d1416c9ed6c7
      +    storage.openshift.io/stable-generation: "12"
      

      This leads to another DaemonSet update that actually fixes the hash:

      -    operator.openshift.io/spec-hash: f94257eece0170ebb52b1553a7798c909085709393ffb7156b77d1416c9ed6c7
      +    operator.openshift.io/spec-hash: c1e9b6d31ee552c96b6735fcb42af4bb3fdfd7ff83038f7705fbd2dfed668731
      

      The first update should only set the stable generation and leave the spec hash as it was. The extra update creates unnecessary noise in the API server.
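
      The expectation above can be illustrated with a small sketch. It assumes, without confirmation from this report, that the spec-hash annotation is meant to be a digest of the DaemonSet `.spec` only. The `spec_digest` helper and both sample objects are hypothetical and do not reproduce the operator's actual hashing. A digest computed this way stays the same when only an annotation changes:

```shell
# Hypothetical helper: digest of the .spec of a JSON object read from stdin.
# jq -cS prints compact JSON with sorted keys, so the digest does not depend
# on key order or whitespace. This is NOT the operator's real hash function.
spec_digest() {
  jq -cS '.spec' | sha256sum | cut -d ' ' -f 1
}

# Two made-up snapshots that differ only in the stable-generation annotation.
before='{"metadata":{"annotations":{"storage.openshift.io/stable-generation":"8"}},"spec":{"template":{"metadata":{"labels":{"app":"example"}}}}}'
after='{"metadata":{"annotations":{"storage.openshift.io/stable-generation":"12"}},"spec":{"template":{"metadata":{"labels":{"app":"example"}}}}}'

# The digests match because .spec is identical in both snapshots.
if [ "$(printf '%s' "$before" | spec_digest)" = "$(printf '%s' "$after" | spec_digest)" ]; then
  echo "spec digest unchanged"
fi
```

      The sketch needs `jq` and `sha256sum` on the PATH.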

       

      Steps to reproduce:

      1. Install a cluster. I am using vSphere.
      2. Watch the DaemonSet changes, printing the generation and the stable-generation and spec-hash annotations:
      oc -n openshift-cluster-csi-drivers get daemonset -w -o json | jq -r --unbuffered '"generation: " + (.metadata.generation | tostring) + " stable-generation: " + .metadata.annotations."storage.openshift.io/stable-generation" + " spec-hash: " + .metadata.annotations."operator.openshift.io/spec-hash"'
      3. Update the driver DaemonSet spec. I am changing it manually, but you can imagine a cluster upgrade with new driver images:
      oc -n openshift-cluster-csi-drivers patch daemonset/vmware-vsphere-csi-driver-node -p '{"spec":{"template":{"metadata":{"annotations":{"foo":"bar"}}}}}'
      4. This bumps the DaemonSet generation once (from 279 to 280) in the script output.
      5. The operator fixes the DaemonSet, overwriting my changes. The generation changes from 280 to 281.
      6. The DaemonSet updates all its replicas with the latest changes (you can see several DaemonSet updates) and eventually the operator saves `stable-generation: 281`. Notice that the spec-hash annotation changes too. That's wrong!
      7. The operator fixes the spec-hash annotation via another update.
      # initial state:
      
      generation: 279 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      
      # I updated the DaemonSet spec + rolling update:
      generation: 280 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      generation: 280 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      generation: 280 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      
      # The operator fixed the DaemonSet spec + rolling update:
      generation: 281 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      generation: 281 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      generation: 281 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      generation: 281 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      generation: 281 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      generation: 281 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      generation: 281 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      generation: 281 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      generation: 281 stable-generation: 279 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
      
      # Update complete, the operator stores stable-generation with a wrong hash:
      generation: 281 stable-generation: 281 spec-hash: e0b35673d1c265d919888278e83096099c78528e80343c99f593e9719203fa57
      
      # The operator fixes the hash:
      generation: 281 stable-generation: 281 spec-hash: e52fc6d13b84b3549feeb50da7f7026ad00619aeb403f2dc3c4f804d0bb018e7
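
      Spotting the wrong hash by eye is tedious in a long watch log, so here is a minimal sketch of a filter that flags it. `find_hash_flaps` is a hypothetical helper, not part of `oc` or the operator. It assumes the line format printed by the watch command above and that only one DaemonSet is being watched, because the lines carry no DaemonSet name. It reports every spec-hash change that happens while the generation stays the same:

```shell
# Hypothetical helper: reads lines of the form
#   generation: N stable-generation: M spec-hash: H
# from stdin and prints one line for each spec-hash change that was not
# accompanied by a generation change. Other lines (comments, blanks) are ignored.
find_hash_flaps() {
  awk '
    $1 == "generation:" && $5 == "spec-hash:" {
      gen = $2
      hash = $6
      # Appending "" forces a string comparison of the hashes.
      if (seen && gen == prev_gen && (hash "") != (prev_hash "")) {
        printf "generation %s: spec-hash changed %s -> %s\n", gen, prev_hash, hash
      }
      prev_gen = gen
      prev_hash = hash
      seen = 1
    }
  '
}
```

      To use it, pipe the output of the watch command into `find_hash_flaps`. Fed with the output listed above, it reports two changes at generation 281: the switch to the wrong hash and the switch back.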
      

       

              rhn-engineering-jsafrane Jan Safranek
              rhn-support-jsafrane Jan Safranek
              Wei Duan