Type: Bug
Resolution: Done
Priority: Major
Affects Version: 4.16.z
Cloud Platform: AWS
Component: ODF / Ceph / NooBaa – CSI Snapshot Controller
Support Case Link: [04165738](https://gss--c.vf.force.com/apex/Support#/cases/04165738)
Description: The csi-snapshot-controller ClusterOperator is in a degraded state. Its pods report repeated timeouts while attempting to list volumesnapshotcontents. Errors observed in the pod logs:
```
Failed to list v1 volumesnapshotcontents with error=Get "https://172.30.0.1:443/apis/snapshot.storage.k8s.io/v1/volumesnapshotcontents": context deadline exceeded
Exiting due to failure to ensure CRDs exist during startup: context deadline exceeded
```
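To confirm the failure mode, one can time the same unpaginated LIST request the controller issues at startup. A minimal sketch using the Python `kubernetes` client; the 60-second request timeout is an assumption standing in for the controller's context deadline:

```python
# Hypothetical check: issue the same unpaginated LIST the controller makes
# and see whether it completes before a controller-like deadline.
import time

from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside the cluster
api = client.CustomObjectsApi()

start = time.monotonic()
try:
    # _request_timeout=60 is an assumed stand-in for the controller's deadline.
    resp = api.list_cluster_custom_object(
        group="snapshot.storage.k8s.io",
        version="v1",
        plural="volumesnapshotcontents",
        _request_timeout=60,
    )
    print(f"listed {len(resp['items'])} objects in {time.monotonic() - start:.1f}s")
except Exception as exc:
    # A timeout here mirrors the "context deadline exceeded" error above.
    print(f"LIST failed after {time.monotonic() - start:.1f}s: {exc}")
```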
Observed Behavior:
- csi-snapshot-controller fails during initialization due to API server timeouts.
- No obvious issues found in the API server or other cluster components.
- Approximately 70,000 VolumeSnapshots exist in the cluster, which might be contributing to the load and latency experienced by the controller.
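A paginated count can verify the ~70,000 figure while keeping each request small, since chunked LISTs avoid streaming the whole collection in one response. A sketch under assumptions (page size of 500, cluster-admin credentials in the local kubeconfig):

```python
# Count VolumeSnapshots across all namespaces in pages of 500.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

total, cont = 0, None
while True:
    resp = api.list_cluster_custom_object(
        group="snapshot.storage.k8s.io",
        version="v1",
        plural="volumesnapshots",
        limit=500,        # assumed page size; keeps each response small
        _continue=cont,   # continue token from the previous page, if any
    )
    total += len(resp["items"])
    cont = resp["metadata"].get("continue")
    if not cont:  # last page: no continue token
        break

print(f"VolumeSnapshots in cluster: {total}")
```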
Expected Behavior: The CSI Snapshot Controller should handle large numbers of snapshot objects gracefully without degrading or timing out during startup.
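For illustration only (not the controller's actual code), "graceful" handling at startup could mean paginating the initial LIST and retrying with backoff instead of exiting on the first deadline. The helper name, retry counts, and delays below are assumptions:

```python
# Sketch: retry a paginated LIST with exponential backoff rather than
# exiting on the first "context deadline exceeded".
import time

from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

def list_first_page_with_backoff(max_attempts=5, base_delay=2.0):
    """Fetch the first page of VolumeSnapshotContents, backing off on failure."""
    for attempt in range(max_attempts):
        try:
            return api.list_cluster_custom_object(
                group="snapshot.storage.k8s.io",
                version="v1",
                plural="volumesnapshotcontents",
                limit=500,          # pagination keeps each request cheap
                _request_timeout=60,
            )
        except Exception as exc:
            delay = base_delay * 2 ** attempt
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
    raise RuntimeError("API server did not respond within the retry budget")
```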
Conclusion / Hypothesis: The issue appears to be caused by the high volume of snapshot resources: the controller's startup LIST calls to the Kubernetes API server exceed their context deadline before the full collection can be returned.
Impact: Snapshot operations may be disrupted, and the CSI Snapshot Controller remains degraded, which could affect backup and restore functionality.