-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.14.z
-
None
-
No
-
False
-
Description of problem:
They hit a problem when deleting the cluster deployment. The cluster deployment status gets stuck in the Deprovisioning state and does not move forward. When they check the uninstall pod, it is in CrashLoopBackOff with the logs below:
~~~
time="2024-06-14T06:19:12Z" level=debug msg="Couldn't find install logs provider environment variable. Skipping."
time="2024-06-14T06:19:12Z" level=debug msg="no additional log fields found"
time="2024-06-14T06:19:12Z" level=info msg="running file observer" files="[/.azure/osServicePrincipal.json]"
I0614 06:19:12.068091 1 observer_polling.go:159] Starting file observer
time="2024-06-14T06:19:12Z" level=info msg="Using loaded object" name=tst-we-int04a-azure-creds namespace=tst-we-int04a type="*v1.Secret"
time="2024-06-14T06:19:12Z" level=fatal msg="Failed to write file" error="open /.azure/osServicePrincipal.json: permission denied" path=/.azure/osServicePrincipal.json
~~~
- Both the install and uninstall jobs run with the same SCC, and they are able to create the same file under /.azure when running the pod in debug mode.
- The pod kept restarting and the issue then resolved itself. They did not change anything. The event details show that after “DeadlineExceeded” a new pod was created, the same thing happened 2-3 times, and finally the job completed, which took care of the uninstall (deprovisioning).
- They used:
~~~
#oc delete clusterdeployment -n CLUSTER_NAME CLUSTER_NAME
#oc wait --for=delete -n CLUSTER_NAME clusterdeployment CLUSTER_NAME
~~~
These are the events:
~~~
168m Normal SuccessfulDelete job/tst-we-int04a-uninstall Deleted pod: tst-we-int04a-uninstall-kfqpm
168m Warning DeadlineExceeded job/tst-we-int04a-uninstall Job was active longer than specified deadline
168m Normal SuccessfulCreate job/tst-we-int04a-uninstall Created pod: tst-we-int04a-uninstall-p8knc
108m Normal SuccessfulDelete job/tst-we-int04a-uninstall Deleted pod: tst-we-int04a-uninstall-p8knc
108m Warning DeadlineExceeded job/tst-we-int04a-uninstall Job was active longer than specified deadline
108m Normal SuccessfulCreate job/tst-we-int04a-uninstall Created pod: tst-we-int04a-uninstall-nsh2g
50m Normal Completed job/tst-we-int04a-uninstall Job completed
~~~
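To make the retry pattern in these events easier to read, here is a minimal sketch that counts the DeadlineExceeded events and prints the gap between them. It assumes the first column is the event age in minutes, as in the listing above; the variable names `events` and `summary` are illustrative only. Event ages are rounded, so the gap is approximate and does not by itself confirm the job's configured deadline.

```shell
#!/bin/sh
# Events copied verbatim from the listing above (age, type, reason, object, message).
events='168m Normal SuccessfulDelete job/tst-we-int04a-uninstall Deleted pod: tst-we-int04a-uninstall-kfqpm
168m Warning DeadlineExceeded job/tst-we-int04a-uninstall Job was active longer than specified deadline
168m Normal SuccessfulCreate job/tst-we-int04a-uninstall Created pod: tst-we-int04a-uninstall-p8knc
108m Normal SuccessfulDelete job/tst-we-int04a-uninstall Deleted pod: tst-we-int04a-uninstall-p8knc
108m Warning DeadlineExceeded job/tst-we-int04a-uninstall Job was active longer than specified deadline
108m Normal SuccessfulCreate job/tst-we-int04a-uninstall Created pod: tst-we-int04a-uninstall-nsh2g
50m Normal Completed job/tst-we-int04a-uninstall Job completed'

# For every DeadlineExceeded event, print the gap to the previous one,
# then print the total number of such events.
summary=$(printf '%s\n' "$events" | awk '
  $3 == "DeadlineExceeded" {
    age = $1
    sub(/m$/, "", age)                 # strip the trailing "m" (assumes minutes)
    if (n > 0) printf "gap between deadline hits: %d min\n", prev - age
    prev = age
    n++
  }
  END { printf "deadline hits: %d\n", n }
')
printf '%s\n' "$summary"
```

For the events above this reports two deadline hits roughly 60 minutes apart, with the job completing on the third pod.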
I don't see any errors in the hive-controller pod.
OCP version is 4.14, ACM version is 2.8.