  1. OpenShift Bugs
  2. OCPBUGS-3643

Readiness and Liveness Probe errors in openshift-config-operator and test failures

    • Bug
    • Resolution: Won't Do
    • Minor
    • 4.12
    • config-operator
    • Quality / Stability / Reliability

      Description of problem:

      We are observing many ProbeError events for readiness and liveness probes on openshift-config-operator/openshift-config-operator-* pods. They show up in the "[sig-arch] events should not repeat pathologically" test.
      
      The tests below were created and then set to flake so that these events appear in the failure output, which makes the failures easy to find with a query.
      
      The tests were created in https://github.com/openshift/origin/pull/27539 as:
      
      [sig-node] openshift-config-operator should not get probe error on readiness probe due to timeout
      [sig-node] openshift-config-operator should not get probe error on liveness probe due to timeout
      I put the two tests into one Jira because they are closely related. If it makes more sense to track them as separate Jiras, I can create two more (just let me know).
      
      Sample jobs (I included the event/reason, prowjob url and a timestamp):
      
      ns/openshift-config-operator pod/openshift-config-operator-7dd776b8b4-jvd9j node/ip-10-0-173-147.us-west-2.compute.internal - reason/ProbeError Readiness probe error: Get "https://10.129.0.45:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-upgrade/1587277786018484224 2022-11-01T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-7f68cd7f84-9fxbq node/ip-10-0-131-90.us-west-2.compute.internal - reason/ProbeError Liveness probe error: Get "https://10.129.0.28:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-upgrade/1587443722428092416 2022-11-01T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-bc8c7b96-sjmzm node/ip-10-0-224-244.us-west-2.compute.internal - reason/ProbeError Liveness probe error: Get "https://10.129.0.24:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-upgrade/1587902657782091776 2022-11-02T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-f948bf5fd-f99dk node/ip-10-0-234-129.us-east-2.compute.internal - reason/ProbeError Liveness probe error: Get "https://10.129.0.17:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade/1588000985647681536 2022-11-03T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-f948bf5fd-f99dk node/ip-10-0-234-129.us-east-2.compute.internal - reason/ProbeError Readiness probe error: Get "https://10.129.0.17:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade/1588000985647681536 2022-11-03T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-68fbc8b886-4qnmr node/ip-10-0-191-43.us-west-2.compute.internal - reason/ProbeError Readiness probe error: Get "https://10.128.0.31:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-upgrade/1588272780762157056 2022-11-03T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-68fbc8b886-4qnmr node/ip-10-0-191-43.us-west-2.compute.internal - reason/ProbeError Liveness probe error: Get "https://10.128.0.31:8443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-upgrade/1588272780762157056 2022-11-03T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-56c4846798-8g6kc node/ip-10-0-170-9.us-west-2.compute.internal - reason/ProbeError Liveness probe error: Get "https://10.130.0.19:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-sdn-upgrade/1588586875906428928 2022-11-04T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-56c4846798-8g6kc node/ip-10-0-170-9.us-west-2.compute.internal - reason/ProbeError Readiness probe error: Get "https://10.130.0.19:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-sdn-upgrade/1588586875906428928 2022-11-04T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-7876f6bc6d-7sxq6 node/ip-10-0-133-74.us-east-2.compute.internal - reason/ProbeError Liveness probe error: Get "https://10.130.0.20:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade/1588363401023721472 2022-11-04T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-6df64df48-fq4hs node/ip-10-0-250-176.us-west-2.compute.internal - reason/ProbeError Liveness probe error: Get "https://10.129.0.21:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade/1588679409726918656 2022-11-04T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-78d7b4c854-hlp5c node/ip-10-0-175-122.us-east-2.compute.internal - reason/ProbeError Liveness probe error: Get "https://10.128.0.53:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade/1589562439840567296 2022-11-07T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-78d7b4c854-st7dr node/ip-10-0-186-247.ec2.internal - reason/ProbeError Readiness probe error: Get "https://10.129.0.49:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-upgrade/1589562429853929472 2022-11-07T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-787857c9dd-5f57x node/ip-10-0-151-175.us-west-2.compute.internal - reason/ProbeError Readiness probe error: Get "https://10.130.0.15:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn/1589683223141552128 2022-11-07T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-5966868ffb-4nm8m node/ip-10-0-165-138.ec2.internal - reason/ProbeError Readiness probe error: Get "https://10.128.0.33:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade/1589879020550361088 2022-11-08T00:00:00Z
      ns/openshift-config-operator pod/openshift-config-operator-787857c9dd-zk2s6 node/ip-10-0-167-211.us-west-2.compute.internal - reason/ProbeError Readiness probe error: Get "https://10.128.0.12:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-sdn-cgroupsv2/1590049963872620544 2022-11-08T00:00:00Z 
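
For triage, the sample lines above can be tallied by probe type. The following is a minimal sketch, assuming the field layout seen in these samples (ns/, pod/, node/, reason/, then the probe message, optionally followed by the prowjob URL and timestamp that were appended for this report). The regular expression and the helper names parse_event and count_timeouts are hypothetical and are not taken from the origin test code.

```python
import re
from collections import Counter

# Field layout inferred from the sample lines in this report, not from a
# documented event format. The trailing prowjob URL and timestamp are optional
# because they were added by hand for this report.
EVENT_RE = re.compile(
    r"ns/(?P<namespace>\S+) "
    r"pod/(?P<pod>\S+) "
    r"node/(?P<node>\S+) - "
    r"reason/(?P<reason>\S+) "
    r"(?P<probe>Readiness|Liveness) probe error: "
    r'Get "(?P<url>[^"]+)": (?P<error>.*?)'
    r"(?: (?P<job_url>https://prow\.ci\.openshift\.org/\S+) (?P<timestamp>\S+))?\s*$"
)


def parse_event(line):
    """Return the fields of one event line as a dict, or None if it does not match."""
    match = EVENT_RE.match(line.strip())
    return match.groupdict() if match else None


def count_timeouts(lines):
    """Count events per probe type whose error text mentions a client timeout."""
    counts = Counter()
    for line in lines:
        event = parse_event(line)
        if event and "Client.Timeout exceeded" in event["error"]:
            counts[event["probe"]] += 1
    return counts


# Two of the sample lines from this report, copied verbatim.
SAMPLES = [
    'ns/openshift-config-operator pod/openshift-config-operator-7dd776b8b4-jvd9j node/ip-10-0-173-147.us-west-2.compute.internal - reason/ProbeError Readiness probe error: Get "https://10.129.0.45:8443/healthz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-upgrade/1587277786018484224 2022-11-01T00:00:00Z',
    'ns/openshift-config-operator pod/openshift-config-operator-68fbc8b886-4qnmr node/ip-10-0-191-43.us-west-2.compute.internal - reason/ProbeError Liveness probe error: Get "https://10.128.0.31:8443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers) https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-upgrade/1588272780762157056 2022-11-03T00:00:00Z',
]

timeout_counts = count_timeouts(SAMPLES)
```

Feeding all of the sample lines into count_timeouts gives a quick view of whether readiness or liveness timeouts dominate, and the parsed job_url field can be used to group events by job.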

      Version-Release number of selected component (if applicable):

      4.12

      How reproducible:

      Intermittent, but it happens on many jobs.

      Steps to Reproduce:

      1. The jobs are periodic, so you can find them using a query like this: https://search.ci.openshift.org/?search=.*openshift-config-operator.*openshift-config-operator.*ProbeError+%28Liveness%7CReadiness%29+probe+error&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
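
The query URL in step 1 can also be assembled programmatically, which makes it easier to change the time window or the search pattern. This is a rough sketch: the parameter names and values are copied from the URL above, while the function name build_search_url and its max_age argument are hypothetical. urlencode percent-encodes a few characters (such as "*") that appear unencoded in the original URL, so the resulting string is not byte-identical, but it decodes to the same parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_BASE = "https://search.ci.openshift.org/"

# Decoded form of the "search" parameter from the URL in this report.
PROBE_PATTERN = (
    ".*openshift-config-operator.*openshift-config-operator.*"
    "ProbeError (Liveness|Readiness) probe error"
)


def build_search_url(pattern, max_age="48h"):
    """Build a search URL using the same parameters as the URL in this report."""
    params = {
        "search": pattern,
        "maxAge": max_age,
        "context": "1",
        "type": "bug+issue+junit",
        "name": "",
        "excludeName": "",
        "maxMatches": "5",
        "maxBytes": "20971520",
        "groupBy": "job",
    }
    return SEARCH_BASE + "?" + urlencode(params)


def query_params(url):
    """Decode a URL's query string into a dict, keeping empty values."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


url = build_search_url(PROBE_PATTERN)
```

Calling build_search_url(PROBE_PATTERN, max_age="24h") would narrow the window, assuming the search service accepts that value for maxAge in the same way it accepts 48h.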
      

      Actual results:

      Many ProbeError events due to failing readiness and liveness probes

      Expected results:

      Readiness and liveness probes pass

      Additional info:

       

              Unassigned Unassigned
              dperique@redhat.com Dennis Periquet
              None
              None
              Rio Liu Rio Liu
              None