OpenShift Bugs / OCPBUGS-8857

Prometheus never sees endpoint propagation of a deleted pod


    • Moderate
    • None
    • Unspecified

      In https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-gcp-upgrade/1375890420118065152, during node upgrades, the oauth-openshift-...-vrmzm pod is marked as deleted at 20:59:42 and is still not fully deleted 50 minutes later.

      This causes the TargetDown alert to fire, because the pod stays unready and its deletion never completes.

      Mar 27 20:59:42.790 W ns/openshift-authentication pod/oauth-openshift-b88f6b558-vrmzm node/**************-17f95-gdtt6-master-1 reason/GracefulDelete in 40s
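      To put the timeline in perspective, here is a minimal sketch, assuming the only inputs are the GracefulDelete event time and the 40s grace period from the log line above, plus the roughly 50-minute observation from the description. The helper `termination_overdue_by` is hypothetical and is not part of any Kubernetes or OpenShift tooling. Because the log line carries no year, only the time of day is parsed.

```python
from datetime import datetime, timedelta

# Values taken from the GracefulDelete event above. The log line has no year,
# so only the time of day is parsed (strptime fills in a default date).
MARKED_AT = datetime.strptime("20:59:42.790", "%H:%M:%S.%f")
GRACE_PERIOD = timedelta(seconds=40)  # "GracefulDelete in 40s"


def termination_overdue_by(marked_at: datetime, grace_period: timedelta,
                           observed_at: datetime) -> timedelta:
    """Hypothetical helper: how far past its grace period a pod has been
    terminating at `observed_at`; zero while still within the grace period."""
    deadline = marked_at + grace_period
    return max(observed_at - deadline, timedelta(0))


# The pod was still present roughly 50 minutes after being marked for deletion.
observed = MARKED_AT + timedelta(minutes=50)
overdue = termination_overdue_by(MARKED_AT, GRACE_PERIOD, observed)
print(f"overdue by {overdue}")  # overdue by 0:49:20
```

      In other words, the pod had been terminating for about 49 minutes longer than its requested 40-second grace period.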

      Marking this high because it is a bad state to get into, and it effectively wedged the whole upgrade process. It may be related to other failures we are seeing in CI (such as the pod submit test still failing occasionally).

      This is a release-blocking bug and potentially has a high impact on CI (although I have not yet quantified the impact).

              Jan Fajerski (jfajersk@redhat.com)
              OpenShift Jira Bot (openshift_jira_bot)
              Junqi Zhao
              Red Hat Employee
              Votes: 1
              Watchers: 23
