-
Bug
-
Resolution: Done-Errata
-
Major
-
4.17.0
-
Important
-
No
-
Hypershift Sprint 255
-
1
-
Proposed
-
False
-
-
Release Note Not Required
-
In Progress
Description of problem:
Azure HostedClusters are failing in OCP 4.17 due to issues with the cluster-storage-operator.
- lastTransitionTime: "2024-05-29T19:58:39Z"
  message: 'Unable to apply 4.17.0-0.nightly-multi-2024-05-29-121923: the cluster operator storage is not available'
  observedGeneration: 2
  reason: ClusterOperatorNotAvailable
  status: "True"
  type: ClusterVersionProgressing
I0529 20:05:21.547544 1 status_controller.go:218] clusteroperator/storage diff {"status":{"conditions":[{"lastTransitionTime":"2024-05-29T20:02:00Z","message":"AzureDiskCSIDriverOperatorCRDegraded: AzureDiskDriverGuestStaticResourcesControllerDegraded: \"node_service.yaml\" (string): namespaces \"clusters-test-case4\" not found\nAzureDiskCSIDriverOperatorCRDegraded: AzureDiskDriverGuestStaticResourcesControllerDegraded: ","reason":"AzureDiskCSIDriverOperatorCR_AzureDiskDriverGuestStaticResourcesController_SyncError","status":"True","type":"Degraded"},{"lastTransitionTime":"2024-05-29T20:04:15Z","message":"AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods","reason":"AzureDiskCSIDriverOperatorCR_AzureDiskDriverNodeServiceController_Deploying","status":"True","type":"Progressing"},{"lastTransitionTime":"2024-05-29T19:59:00Z","message":"AzureDiskCSIDriverOperatorCRAvailable: AzureDiskDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service","reason":"AzureDiskCSIDriverOperatorCR_AzureDiskDriverNodeServiceController_Deploying","status":"False","type":"Available"},{"lastTransitionTime":"2024-05-29T19:59:00Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"},{"lastTransitionTime":"2024-05-29T19:59:00Z","reason":"NoData","status":"Unknown","type":"EvaluationConditionsDetected"}]}}
I0529 20:05:21.566215 1 event.go:364] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-storage-operator", Name:"azure-cloud-controller-manager", UID:"205a4307-67e4-481e-9fee-975b2c5c40fb", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/storage changed: Progressing message changed from "AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods\nAzureFileCSIDriverOperatorCRProgressing: AzureFileDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods" to "AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
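For triage, the conditions in the status diff logged above can be filtered down to the ones that deviate from a healthy operator. The following is only a minimal sketch: the JSON is a trimmed copy of that diff (messages and timestamps omitted), and the `HEALTHY` table and the `unhealthy_conditions` helper are illustrative assumptions, not part of cluster-storage-operator.

```python
import json

# Trimmed copy of the conditions from the status diff (messages/timestamps omitted).
STATUS_DIFF = """
{"status": {"conditions": [
  {"type": "Degraded", "status": "True",
   "reason": "AzureDiskCSIDriverOperatorCR_AzureDiskDriverGuestStaticResourcesController_SyncError"},
  {"type": "Progressing", "status": "True",
   "reason": "AzureDiskCSIDriverOperatorCR_AzureDiskDriverNodeServiceController_Deploying"},
  {"type": "Available", "status": "False",
   "reason": "AzureDiskCSIDriverOperatorCR_AzureDiskDriverNodeServiceController_Deploying"},
  {"type": "Upgradeable", "status": "True", "reason": "AsExpected"},
  {"type": "EvaluationConditionsDetected", "status": "Unknown", "reason": "NoData"}
]}}
"""

# Assumption: the status each condition type carries on a settled, healthy operator.
# Condition types not listed here are ignored by the check.
HEALTHY = {
    "Available": "True",
    "Degraded": "False",
    "Progressing": "False",
    "Upgradeable": "True",
}


def unhealthy_conditions(diff_json):
    """Return (type, status, reason) for every condition that is not in its healthy state."""
    conditions = json.loads(diff_json)["status"]["conditions"]
    return [
        (c["type"], c["status"], c["reason"])
        for c in conditions
        if c["type"] in HEALTHY and c["status"] != HEALTHY[c["type"]]
    ]


for cond_type, status, reason in unhealthy_conditions(STATUS_DIFF):
    print(f"{cond_type}={status}: {reason}")
```

In this report the Degraded condition is the one that names a concrete failure (the `clusters-test-case4` namespace not being found when applying `node_service.yaml`), so it is probably the most useful starting point.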
On the HostedCluster itself, the CSI pods fail to come up with these errors:
% k describe pod/azure-disk-csi-driver-node-5hb24 -n openshift-cluster-csi-drivers | grep fail
Liveness: http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
Liveness: http-get http://:rhealthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
Warning FailedMount 2m (x28 over 42m) kubelet MountVolume.SetUp failed for volume "metrics-serving-cert" : secret "azure-disk-csi-driver-node-metrics-serving-cert" not found
The storage ClusterOperator (CO) also reports an error:
storage 4.17.0-0.nightly-multi-2024-05-29-121923 False True True 49m AzureDiskCSIDriverOperatorCRAvailable: AzureDiskDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service
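The CO row above is easier to read once it is split into named fields. This is a rough sketch that assumes the row follows the usual `oc get clusteroperators` column order (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, MESSAGE); `parse_co_row` and `is_healthy` are hypothetical helper names used only for illustration.

```python
# Assumption: column order of `oc get clusteroperators` output.
COLUMNS = ["NAME", "VERSION", "AVAILABLE", "PROGRESSING", "DEGRADED", "SINCE", "MESSAGE"]

# The row reported in this bug, copied verbatim.
ROW = (
    "storage 4.17.0-0.nightly-multi-2024-05-29-121923 False True True 49m "
    "AzureDiskCSIDriverOperatorCRAvailable: AzureDiskDriverNodeServiceControllerAvailable: "
    "Waiting for the DaemonSet to deploy the CSI Node Service"
)


def parse_co_row(row):
    """Split a CO row into named fields, keeping the free-text message intact."""
    # Limit the number of splits so whitespace inside MESSAGE is preserved.
    return dict(zip(COLUMNS, row.split(None, len(COLUMNS) - 1)))


def is_healthy(co):
    """A settled operator is Available and neither Progressing nor Degraded."""
    return (co["AVAILABLE"], co["PROGRESSING"], co["DEGRADED"]) == ("True", "False", "False")


co = parse_co_row(ROW)
print(co["NAME"], co["VERSION"], "healthy:", is_healthy(co))
print("message:", co["MESSAGE"])
```

Read this way, the row says the storage operator is not Available, is still Progressing, and is Degraded, 49 minutes after its last transition.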
Version-Release number of selected component (if applicable):
4.17
How reproducible:
Every time
Steps to Reproduce:
1. Create an Azure HostedCluster (HC) with a 4.17 nightly
Actual results:
The Azure HC never finishes rolling out, although nodes do join the NodePool
Expected results:
The Azure HC should finish rolling out
Additional info:
- links to
-
RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update