OCPBUGS-30267

[IBMCloud] MonitorTests liveness/readiness probe error events repeat


Details

    • Bug
    • Resolution: Unresolved
    • Normal
    • 4.16.0
    • 4.16.0
    • Test Framework
    • Moderate
    • No
    • False

    Description

      Description of problem:

          All e2e-ibmcloud-ovn runs are failing because liveness and readiness probe failure events repeat too often during MonitorTests.

      Version-Release number of selected component (if applicable):

          4.16.0-0.ci.test-2024-02-20-184205-ci-op-lghcpt9x-latest

      How reproducible:

          Appears to be 100%

      Steps to Reproduce:

          1. Set up an IPI cluster on IBM Cloud
          2. Run OCP Conformance with MonitorTests (CI does this on IBM Cloud-related PRs)
          

      Actual results:

          OCP Conformance fails because of the following MonitorTests failures:
      
      : [sig-arch] events should not repeat pathologically for ns/openshift-cloud-controller-manager 0s{  2 events happened too frequently
      
      event happened 43 times, something is wrong: namespace/openshift-cloud-controller-manager node/ci-op-lghcpt9x-52953-tk4vl-master-2 pod/ibm-cloud-controller-manager-6c5f8594c5-bpnm8 hmsg/d91441a732 - reason/ProbeError Liveness probe error: Get "https://10.241.129.4:10258/healthz": dial tcp 10.241.129.4:10258: connect: connection refused result=reject 
      body: 
       From: 20:25:44Z To: 20:25:45Z
      event happened 43 times, something is wrong: namespace/openshift-cloud-controller-manager node/ci-op-lghcpt9x-52953-tk4vl-master-1 pod/ibm-cloud-controller-manager-6c5f8594c5-wn4fq hmsg/fda26f2bbf - reason/ProbeError Liveness probe error: Get "https://10.241.64.6:10258/healthz": dial tcp 10.241.64.6:10258: connect: connection refused result=reject 
      body: 
       From: 20:25:54Z To: 20:25:55Z}
      
      
      : [sig-arch] events should not repeat pathologically for ns/openshift-oauth-apiserver 0s{  1 events happened too frequently
      
      event happened 25 times, something is wrong: namespace/openshift-oauth-apiserver node/ci-op-lghcpt9x-52953-tk4vl-master-1 pod/apiserver-c5ff4776b-kqg7c hmsg/c9e932e38d - reason/ProbeError Readiness probe error: HTTP probe failed with statuscode: 500 result=reject 
      body: [+]ping ok
      [+]log ok
      [+]etcd ok
      [-]etcd-readiness failed: reason withheld
      [+]informer-sync ok
      [+]poststarthook/generic-apiserver-start-informers ok
      [+]poststarthook/priority-and-fairness-config-consumer ok
      [+]poststarthook/priority-and-fairness-filter ok
      [+]poststarthook/storage-object-count-tracker-hook ok
      [+]poststarthook/openshift.io-StartUserInformer ok
      [+]poststarthook/openshift.io-StartOAuthInformer ok
      [+]poststarthook/openshift.io-StartTokenTimeoutUpdater ok
      [+]shutdown ok
      readyz check failed
      
       From: 20:25:04Z To: 20:25:05Z}
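
The readiness probe body above lists one line per check, prefixed with [+] for passing and [-] for failing. As a triage aid, the failing checks can be pulled out of such a body with a few lines of Python. This is only a sketch that assumes the body format shown in this report; the helper name failing_checks is hypothetical and not part of any OpenShift tooling.

```python
import re

# Matches one check line such as "[-]etcd-readiness failed: reason withheld".
# Group 1 is the status marker, group 2 is the check name.
CHECK_RE = re.compile(r"\[([+-])\](\S+)")


def failing_checks(body: str) -> list[str]:
    """Return the names of checks marked [-] in a verbose readyz/healthz body."""
    failed = []
    for line in body.splitlines():
        match = CHECK_RE.search(line)
        if match and match.group(1) == "-":
            failed.append(match.group(2))
    return failed


# Body abbreviated from the ProbeError event in this report.
SAMPLE_BODY = """[+]ping ok
[+]log ok
[+]etcd ok
[-]etcd-readiness failed: reason withheld
[+]informer-sync ok
[+]shutdown ok
readyz check failed"""

if __name__ == "__main__":
    print(failing_checks(SAMPLE_BODY))
```

For the body in this report, the only failing check is etcd-readiness, which suggests looking at etcd connectivity from the oauth-apiserver pods rather than at the apiserver process itself.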

      Expected results:

          Passing OCP Conformance (w/ MonitorTests) test

      Additional info:

          The most frequent (perhaps the only) failures occur in these two tests:
      
      [sig-arch] events should not repeat pathologically for ns/openshift-cloud-controller-manager
      
      [sig-arch] events should not repeat pathologically for ns/openshift-oauth-apiserver
      
      I do not yet know what causes the liveness/readiness probe failures, or whether the underlying infrastructure is responsible (and if so, which resource).
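
To compare runs while the cause is investigated, the "event happened N times" lines from the test output can be tallied per namespace and probe type. The following is a minimal sketch that assumes the line format quoted in this report; the function name summarize and the regular expression are illustrative and are not taken from the MonitorTests code.

```python
import re
from collections import Counter

# Assumed line shape (from the output quoted in this report):
#   event happened <N> times, ... namespace/<ns> node/<node> pod/<pod> ...
#   reason/ProbeError <Liveness|Readiness> probe error: ...
EVENT_RE = re.compile(
    r"event happened (?P<count>\d+) times.*?"
    r"namespace/(?P<namespace>\S+).*?"
    r"pod/(?P<pod>\S+).*?"
    r"reason/(?P<reason>\S+) (?P<probe>Liveness|Readiness) probe error"
)


def summarize(log_text: str) -> Counter:
    """Sum repeated-event counts per (namespace, probe type)."""
    totals = Counter()
    for match in EVENT_RE.finditer(log_text):
        key = (match["namespace"], match["probe"])
        totals[key] += int(match["count"])
    return totals


# Lines abbreviated from this report (error details after "probe error:" shortened).
SAMPLE_LOG = """\
event happened 43 times, something is wrong: namespace/openshift-cloud-controller-manager node/ci-op-lghcpt9x-52953-tk4vl-master-2 pod/ibm-cloud-controller-manager-6c5f8594c5-bpnm8 hmsg/d91441a732 - reason/ProbeError Liveness probe error: connection refused
event happened 43 times, something is wrong: namespace/openshift-cloud-controller-manager node/ci-op-lghcpt9x-52953-tk4vl-master-1 pod/ibm-cloud-controller-manager-6c5f8594c5-wn4fq hmsg/fda26f2bbf - reason/ProbeError Liveness probe error: connection refused
event happened 25 times, something is wrong: namespace/openshift-oauth-apiserver node/ci-op-lghcpt9x-52953-tk4vl-master-1 pod/apiserver-c5ff4776b-kqg7c hmsg/c9e932e38d - reason/ProbeError Readiness probe error: HTTP probe failed with statuscode: 500
"""

if __name__ == "__main__":
    for (namespace, probe), count in summarize(SAMPLE_LOG).items():
        print(f"{namespace} {probe}: {count}")
```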

          People

            rhn-engineering-dgoodwin Devan Goodwin
            cschaefe@redhat.com Christopher Schaefer