Bug
Resolution: Done
Normal
Quality / Stability / Reliability
False
Customer Facing
This was spotted for a ClusterDeployment (CD) that was blocked by a PVC in an invalid state. efried.openshift helped me debug the case:
https://redhat-internal.slack.com/archives/CHY2E1BL4/p1677695139921719?thread_ts=1677198429.622629&cid=CHY2E1BL4
The PVC got deleted after contacting AWS support.
After that, the CD (and the namespace it lived in) was gone, as expected:
oc --context hive get cd -A | grep -c ci-ocp-4-11-amd64-aws-us-east-1-vchv6
0
oc --context hive get cd -n ci-ocp-4-11-amd64-aws-us-east-1-vchv6
No resources found in ci-ocp-4-11-amd64-aws-us-east-1-vchv6 namespace.
oc --context hive get ns ci-ocp-4-11-amd64-aws-us-east-1-vchv6
Error from server (NotFound): namespaces "ci-ocp-4-11-amd64-aws-us-east-1-vchv6" not found
Forwarding the metrics port:
oc --context hive -n hive port-forward hive-controllers-759f94989b-qffdx 2112:2112 --as system:admin
Forwarding from 127.0.0.1:2112 -> 2112
Forwarding from [::1]:2112 -> 2112
The data is still there:
curl -s localhost:2112/metrics | grep ci-ocp-4-11-amd64-aws-us-east-1-vchv6
hive_cluster_deployment_deprovision_underway_seconds{cluster_deployment="ci-ocp-4-11-amd64-aws-us-east-1-vchv6",cluster_type="unspecified",namespace="ci-ocp-4-11-amd64-aws-us-east-1-vchv6"} 4.335304746743089e+06
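For a sense of scale, the reported value can be converted from seconds to days. The helper below is only an illustrative sketch; the function name `seconds_to_days` is hypothetical and not part of Hive or `oc`.

```shell
# Hypothetical helper (not part of Hive): convert a metric value given in
# seconds, possibly in exponent notation, to days with one decimal place.
seconds_to_days() {
  awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 86400 }'
}

# Usage with the value reported above:
#   seconds_to_days 4.335304746743089e+06
```

With the value above this comes out at roughly 50 days, so the series was still being exported for a CD whose deprovision had been "underway" for about seven weeks.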
After restarting the hive controller, no data about the CD above is returned.
Along the way, I found other CDs in the metrics console UI that do not exist on the cluster, but they were all gone after the restart. I did not write them down, though.
I am not sure whether a "blocked" CD is the key to reproducing this.
If it is, I still have other blocked cases (WIP) that can be used to verify the fix if needed.
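To capture such stale entries before the next restart, something like the following could be used. It is a rough sketch, not a verified procedure: the function name `stale_cds` and the two input files are hypothetical, and it assumes the label order shown in the metric line above (`cluster_deployment` before `namespace`).

```shell
# Hypothetical helper: print "namespace/name" for every CD that still has a
# hive_cluster_deployment_deprovision_underway_seconds series but is not
# listed among the existing CDs.
#   $1: file with the raw metrics text (e.g. saved curl output)
#   $2: file with one existing "namespace/name" per line
stale_cds() {
  metrics_file=$1
  existing_file=$2
  grep '^hive_cluster_deployment_deprovision_underway_seconds{' "$metrics_file" \
    | sed -n 's/.*cluster_deployment="\([^"]*\)".*namespace="\([^"]*\)".*/\2\/\1/p' \
    | sort -u \
    | grep -vxFf "$existing_file" || true   # no stale entries is not an error
}

# Possible way to produce the inputs (assumes the port-forward is running):
#   curl -s localhost:2112/metrics > metrics.txt
#   oc --context hive get cd -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name \
#     | awk '{print $1 "/" $2}' > existing.txt
#   stale_cds metrics.txt existing.txt
```

Any line printed is a CD that the controller still reports metrics for even though it no longer exists on the cluster.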