OpenShift Hive / HIVE-2191

hive_cluster_deployment_deprovision_underway_seconds returns data on a deleted cluster deployment

    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • Quality / Stability / Reliability
    • Customer Facing

      This was spotted on a ClusterDeployment (CD) that was blocked by a PVC in an invalid state; efried.openshift helped me debug the case.
      https://redhat-internal.slack.com/archives/CHY2E1BL4/p1677695139921719?thread_ts=1677198429.622629&cid=CHY2E1BL4

      The PVC was deleted after we contacted AWS support.
      After that, the CD (and the namespace it lived in) was gone, as expected.

      oc --context hive get cd -A | grep -c ci-ocp-4-11-amd64-aws-us-east-1-vchv6
      0
      
      oc --context hive get cd -n ci-ocp-4-11-amd64-aws-us-east-1-vchv6
      No resources found in ci-ocp-4-11-amd64-aws-us-east-1-vchv6 namespace.
      
      oc --context hive get ns ci-ocp-4-11-amd64-aws-us-east-1-vchv6
      Error from server (NotFound): namespaces "ci-ocp-4-11-amd64-aws-us-east-1-vchv6" not found
      

      Forwarding the metrics port:

      oc --context hive -n hive port-forward hive-controllers-759f94989b-qffdx 2112:2112 --as system:admin
      Forwarding from 127.0.0.1:2112 -> 2112
      Forwarding from [::1]:2112 -> 2112
      

      The metric for the deleted CD is still exported:

      curl -s localhost:2112/metrics | grep ci-ocp-4-11-amd64-aws-us-east-1-vchv6
      hive_cluster_deployment_deprovision_underway_seconds{cluster_deployment="ci-ocp-4-11-amd64-aws-us-east-1-vchv6",cluster_type="unspecified",namespace="ci-ocp-4-11-amd64-aws-us-east-1-vchv6"} 4.335304746743089e+06
      
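      The gauge value is in seconds. A quick conversion (seconds_to_days is a hypothetical helper, assuming awk is available) shows that the last value recorded for the deleted CD corresponds to roughly 50 days of deprovisioning:

```sh
# seconds_to_days VALUE: convert a value in seconds to days (86400 s per day),
# rounded to one decimal place.
seconds_to_days() {
  awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 86400 }'
}

# Value copied from the metrics output above; prints 50.2
seconds_to_days 4.335304746743089e+06
```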

      After restarting the hive controller, the metrics endpoint no longer returns any data for the CD above.

      Along the way, I found other CDs in the metrics console UI that no longer exist on the cluster; they were all gone after the restart as well. I did not write them down, though.
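      A minimal sketch for recording such stale series next time, assuming a POSIX shell with awk, sort, comm and mktemp: it lists every namespace/name pair that still has a hive_cluster_deployment_deprovision_underway_seconds sample but is missing from a list of existing CDs. The functions metric_cds and stale_cds are hypothetical helpers, and the label parsing assumes label values contain no commas or braces (true for Kubernetes object names).

```sh
#!/bin/sh
set -eu

METRIC=hive_cluster_deployment_deprovision_underway_seconds

# metric_cds FILE: print "namespace/name" for every sample of $METRIC in a
# Prometheus text-format dump (e.g. saved output of curl .../metrics).
metric_cds() {
  awk -v m="$METRIC" '
    index($0, m "{") == 1 {
      cd = ""; ns = ""
      # The label set is the text between the opening "{" and the first "}".
      labels = substr($0, length(m) + 2)
      labels = substr(labels, 1, index(labels, "}") - 1)
      n = split(labels, kv, ",")
      for (i = 1; i <= n; i++) {
        eq = index(kv[i], "=")
        key = substr(kv[i], 1, eq - 1)
        # Drop the surrounding double quotes from the label value.
        val = substr(kv[i], eq + 2, length(kv[i]) - eq - 2)
        if (key == "cluster_deployment") cd = val
        if (key == "namespace") ns = val
      }
      if (cd != "" && ns != "") print ns "/" cd
    }' "$1" | LC_ALL=C sort -u
}

# stale_cds METRICS_FILE CDS_FILE: print series whose CD is absent from
# CDS_FILE (one "namespace/name" per line).
stale_cds() {
  tmp=$(mktemp)
  metric_cds "$1" > "$tmp"
  # comm -23 keeps only lines that appear in the first input but not the second.
  LC_ALL=C sort -u "$2" | LC_ALL=C comm -23 "$tmp" -
  rm -f "$tmp"
}
```

      Example usage, with the port-forward above still running:

      curl -s localhost:2112/metrics > metrics.txt
      oc --context hive get cd -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}' > cds.txt
      stale_cds metrics.txt cds.txt

      Because the two snapshots are taken separately, a CD deleted between the two commands can show up as a false positive, so any hit is worth re-checking with oc get.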

      I am not sure whether the "blocked" CD is the key to reproducing this.
      If it is, I still have other blocked cases (WIP) that can be used to verify the fix if needed.

              leah_leshchinsky Leah Leshchinsky (Inactive)
              hongkliu Hongkai Liu
              Feilian Xie (Inactive)
              Votes: 0
              Watchers: 5