- Bug
- Resolution: Unresolved
- Major
- None
- 4.18.0
- None
- None
- Approved
- False
Description of problem:
Sippy complains about pathological events in ns/openshift-cluster-csi-drivers in vsphere-ovn-serial jobs. See this job as one example.
Jan noticed that the DaemonSet generation is 10-12, while in 4.17 it is 2. Why is our operator updating the DaemonSet so often?
I wrote a quick "one-liner" to generate json diffs from the vmware-vsphere-csi-driver-operator logs:
prev=''; grep 'DaemonSet "openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-node" changes' openshift-cluster-csi-drivers_vmware-vsphere-csi-driver-operator-5b79c58f6f-hpr6g_vmware-vsphere-csi-driver-operator.log | sed 's/^.*changes: //' | while read -r line; do diff <(echo "$prev" | jq .) <(echo "$line" | jq .); prev=$line; echo "####"; done
It really seems to be only operator.openshift.io/spec-hash and operator.openshift.io/dep-* fields changing in the json diffs:
####
4,5c4,5
< "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "MZ-w-Q==",
< "operator.openshift.io/spec-hash": "fb274874404ad6706171c6774a369876ca54e037fcccc200c0ebf3019a600c36"
---
> "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "AFeN-A==",
> "operator.openshift.io/spec-hash": "27a1bab0c00ace8ac21d95a5fe9a089282e7b2b3ec042045951bd5e26ae01a09"
13c13
< "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "MZ-w-Q=="
---
> "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "AFeN-A=="
####
4,5c4,5
< "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "AFeN-A==",
< "operator.openshift.io/spec-hash": "27a1bab0c00ace8ac21d95a5fe9a089282e7b2b3ec042045951bd5e26ae01a09"
---
> "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "MZ-w-Q==",
> "operator.openshift.io/spec-hash": "fb274874404ad6706171c6774a369876ca54e037fcccc200c0ebf3019a600c36"
13c13
< "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "AFeN-A=="
---
> "operator.openshift.io/dep-1b5c921175cca7ab09ea7d1d58e35428291b8": "MZ-w-Q=="
####
The deployment is also changing in the same way. We need to find what is causing the spec-hash and dep-* fields to change and avoid the unnecessary churn that causes new daemonset / deployment rollouts.
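The same comparison can be sketched in Python for cases where a summary of changed keys is easier to read than raw diffs. This is a minimal sketch, not part of the operator or its tooling: it assumes, as the one-liner does, that each matching log line ends with a single JSON document after "changes: ". The names MARKER, flatten, and changed_paths are hypothetical helpers introduced only for illustration.

```python
import json

# Marker text taken from the grep pattern in the one-liner; assumed to be
# followed by one JSON document that runs to the end of the log line.
MARKER = 'DaemonSet "openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-node" changes: '

_MISSING = object()  # distinguishes "key absent" from a JSON null value


def flatten(obj, prefix=""):
    """Flatten nested JSON into {path: scalar} so snapshots compare key by key."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        # " > " is used as separator because annotation keys contain "." and "/".
        path = f"{prefix} > {key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def changed_paths(log_lines, marker=MARKER):
    """Yield the set of paths that differ between consecutive logged snapshots."""
    prev = None
    for line in log_lines:
        _, found, payload = line.partition(marker)
        if not found:
            continue  # not a "changes" line for this DaemonSet
        current = flatten(json.loads(payload))
        if prev is not None:
            keys = prev.keys() | current.keys()
            yield {
                k for k in keys
                if prev.get(k, _MISSING) != current.get(k, _MISSING)
            }
        prev = current


if __name__ == "__main__":
    import sys

    # Usage: python changed_paths.py <operator log file>
    with open(sys.argv[1]) as log:
        for index, paths in enumerate(changed_paths(log), start=1):
            print(index, sorted(paths))
```

If every printed set contains only operator.openshift.io/spec-hash and operator.openshift.io/dep-* paths, that matches the observation above that nothing else in the DaemonSet is changing between updates.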
Version-Release number of selected component (if applicable):
4.18.0
How reproducible:
~20% failure rate in 4.18 vsphere-ovn-serial jobs
Steps to Reproduce:
Actual results:
The operator rolls out unnecessary daemonset / deployment changes.
Expected results:
The operator should not roll out changes unless there is a spec change.
Additional info:
- blocks
- OCPBUGS-45996 unnecessary daemonset / deployment rollouts on vsphere
- Verified
- is cloned by
- OCPBUGS-45996 unnecessary daemonset / deployment rollouts on vsphere
- Verified
- is duplicated by
- OCPBUGS-43381 Possible regression with storage deleting vmware-vsphere-csi-driver-node-xxxx pods multiple times
- Closed
- links to
- RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update