OpenShift Hive / HIVE-2479

Filter out InsufficientCredentials for InstallJobDelayHigh alert


    • Type: Story
    • Resolution: Unresolved

      We frequently see InstallJobDelayHigh fire because customers attempt to install with insufficient credentials, which is not actionable by hive/SRE/etc. One way this manifests is DNSZones never becoming ready; in such cases, the DNSZone gets a status condition of type "InsufficientCredentials".
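A minimal sketch of how code with access to the DNSZone could recognize this case, assuming a simplified, hypothetical `Condition` struct (the real Hive/Kubernetes condition type carries more fields) and assuming the interesting case is the condition being present with status "True":

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a Kubernetes-style
// status condition; the real DNSZone condition type has more fields.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// insufficientCredentialsType is the condition type named in this issue.
const insufficientCredentialsType = "InsufficientCredentials"

// hasInsufficientCredentials reports whether the given DNSZone conditions
// include an InsufficientCredentials condition whose status is "True".
// Requiring "True" is an assumption, not confirmed Hive behavior.
func hasInsufficientCredentials(conds []Condition) bool {
	for _, c := range conds {
		if c.Type == insufficientCredentialsType && c.Status == "True" {
			return true
		}
	}
	return false
}

func main() {
	conds := []Condition{
		{Type: "Ready", Status: "False"}, // illustrative condition type
		{Type: "InsufficientCredentials", Status: "True"},
	}
	fmt.Println(hasInsufficientCredentials(conds))
}
```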

      It would be nice if we could label the metric this alert is built on (hive_cluster_deployment_install_job_delay_seconds) so that we can filter these cases out in the alert definition and not waste effort tracking them down.
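A rough sketch of the idea, simulated in plain Go rather than with a real Prometheus client: each observation carries a hypothetical `reason` label (the issue does not name one), and the filter mimics what a PromQL label matcher such as `reason!="InsufficientCredentials"` would do in the alert expression. The label names, the "Unknown" default value, and the 600-second threshold are all illustrative assumptions.

```go
package main

import "fmt"

// Sample is a hypothetical, simplified observation of
// hive_cluster_deployment_install_job_delay_seconds: a value plus labels.
type Sample struct {
	Labels map[string]string
	Value  float64 // install job delay in seconds
}

// reasonLabel is a hypothetical label name; the issue does not specify one.
const reasonLabel = "reason"

// delayReason picks the hypothetical label value attached when the delay is
// observed: "InsufficientCredentials" if the DNSZone reported that
// condition, otherwise "Unknown".
func delayReason(insufficientCreds bool) string {
	if insufficientCreds {
		return "InsufficientCredentials"
	}
	return "Unknown"
}

// alertable keeps only samples that exceed the threshold and whose reason
// is actionable, mirroring a reason!="InsufficientCredentials" matcher.
func alertable(samples []Sample, thresholdSeconds float64) []Sample {
	var out []Sample
	for _, s := range samples {
		if s.Labels[reasonLabel] == "InsufficientCredentials" {
			continue // not actionable by hive/SRE; never alert
		}
		if s.Value > thresholdSeconds {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	samples := []Sample{
		{Labels: map[string]string{"cluster_deployment": "cd-a", reasonLabel: delayReason(true)}, Value: 3600},
		{Labels: map[string]string{"cluster_deployment": "cd-b", reasonLabel: delayReason(false)}, Value: 3600},
		{Labels: map[string]string{"cluster_deployment": "cd-c", reasonLabel: delayReason(false)}, Value: 30},
	}
	for _, s := range alertable(samples, 600) {
		fmt.Println(s.Labels["cluster_deployment"]) // prints cd-b
	}
}
```

The hard part, as noted below, is having the DNSZone context available at the point where the metric is observed; the filtering itself is cheap once the label exists.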

      This is likely to involve some nontrivial refactoring, since the code that observes that metric today doesn't have much context to work with.

      Also, I've noticed lately that DNSZone readiness problems that I think should result in DNSNotReadyTimeout... don't. Possibly a bug. Possibly unrelated to this card... but possibly related: if the timeout never fires, I think we would never actually start the provision, and therefore never observe this metric. Worth investigating before we go too far down this rabbit hole.

              Suhani Mehta (sumehta)
              Eric Fried (efried.openshift)
              Votes: 0
              Watchers: 2