- Type: Bug
- Resolution: Unresolved
- Priority: Major
- None
- 4.18.0
- Severity: Important
- None
- 3
- Sprint: OCPEDGE Sprint 261, OCPEDGE Sprint 263
- 2
- Proposed
- False
-
[sig-architecture] platform pods in ns/openshift-cluster-storage-operator should not exit an excessive amount of times
The snapshot controller on SNO restarts frequently while the kube-apiserver operator is progressing; the errors occur because the controller cannot fetch VolumeSnapshot resources from the kube API during startup.
After some investigation, I think the best approach here is to modify the interval the snapshot controller waits before continuing its operation. We can't really use health check or startup probes on this deployment, since the restart mechanism is part of the operand and it isn't Kubernetes that restarts the pod. It may be best to use --retry-crd-interval-max for SNO deployments of the operand, to account for the API server being unreachable during rollouts. The operator applies the operand with these args, and the deployment manifest is run through a template processor that we should be able to hook into to update this behavior (template replace logic); see the sketch below.
Note: this error seems to be present in the 4.17 branches as well.
- is triggered by: OCPBUGS-43059 SNO Connection Error During Upgrades (Closed)