OpenShift Bugs / OCPBUGS-55809

e2e-aws-ovn-ipsec-upgrade job is failing with disruptive events


    • Quality / Stability / Reliability
    • False
    • None
    • Important
    • None
    • None
    • Rejected
    • None
    • In Progress
    • Release Note Not Required
    • None
    • None
    • None
    • None
    • None

      This is a clone of issue OCPBUGS-55262. The following is the description of the original issue:

      Description of problem:

      The e2e-aws-ovn-ipsec-upgrade CI lane is not passing 100% of the time. In most failures, the API server pod cannot connect to a metrics API server endpoint for some period during the upgrade.

      This looks like a pod-to-pod connectivity issue between two particular nodes at the time of the node reboots. The journal and pluto logs appear to be clean.

      We need to investigate whether any ip xfrm state or policy is missing during that period, or whether the libreswan 5.2 bump introduced this regression.
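
      One way to approach the xfrm check is to capture the output of `ip xfrm state` on a node during the disruption window and confirm that a security association exists in both directions for every peer node. The following is a minimal sketch, not the team's actual tooling. It assumes that each SA entry in the captured text begins with a `src <ip> dst <ip>` line, as `ip xfrm state` prints it; the function names and the peer list are hypothetical.

```python
import re

# Each SA entry printed by `ip xfrm state` is assumed to start with a
# header line of the form "src <ip> dst <ip>" at column 0.
SA_HEADER = re.compile(r"^src (\S+) dst (\S+)")


def parse_sa_pairs(xfrm_state_text):
    """Return the set of (src, dst) pairs that have at least one SA."""
    pairs = set()
    for line in xfrm_state_text.splitlines():
        match = SA_HEADER.match(line)
        if match:
            pairs.add((match.group(1), match.group(2)))
    return pairs


def missing_directions(local_ip, peer_ips, xfrm_state_text):
    """List the (src, dst) directions that have no SA for the given peers.

    local_ip and peer_ips are hypothetical inputs: the node's own IP and
    the IPs of the other cluster nodes it should have IPsec SAs with.
    """
    present = parse_sa_pairs(xfrm_state_text)
    missing = []
    for peer in peer_ips:
        for pair in ((local_ip, peer), (peer, local_ip)):
            if pair not in present:
                missing.append(pair)
    return missing
```

      An empty result would suggest that the SAs themselves are in place, which would shift attention to `ip xfrm policy` or to the libreswan bump.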
      
      : [sig-instrumentation] disruption/metrics-api connection/new should be available throughout the test 0s{  backend-disruption-name/metrics-api-new-connections connection/new disruption/openshift-tests was unreachable during disruption:  for at least 1m3s (maxAllowed=6s):
      P99 from historical data for similar jobs over past 3 weeks: 0s
      rounded P99 up to always allow one second
      added an additional 5s of grace
      
      Apr 22 23:18:29.808 - 1s    E backend-disruption-name/metrics-api-new-connections connection/new disruption/openshift-tests reason/DisruptionBegan request-audit-id/0c1fe229-54bc-4a00-9938-339b49dd08f6 backend-disruption-name/metrics-api-new-connections connection/new disruption/openshift-tests stopped responding to GET requests over new connections: error running request: 503 Service Unavailable: error trying to reach service: context deadline exceeded
      Apr 22 23:18:38.809 - 999ms E backend-disruption-name/metrics-api-new-connections connection/new disruption/openshift-tests reason/DisruptionBegan request-audit-id/e729d243-f23f-4d85-9052-f500e75e4e9c backend-disruption-name/metrics-api-new-connections connection/new disruption/openshift-tests stopped responding to GET requests over new connections: error running request: 503 Service Unavailable: error trying to reach service: context deadline exceeded
      ....
      Apr 22 23:23:22.808 - 1s    E backend-disruption-name/metrics-api-new-connections connection/new disruption/openshift-tests reason/DisruptionBegan request-audit-id/20ea7c5c-bb98-499f-8856-58462e2dbdbf backend-disruption-name/metrics-api-new-connections connection/new disruption/openshift-tests stopped responding to GET requests over new connections: error running request: 503 Service Unavailable: error trying to reach service: context deadline exceeded
      Apr 22 23:23:33.809 - 2s    E backend-disruption-name/metrics-api-new-connections connection/new disruption/openshift-tests reason/DisruptionBegan request-audit-id/b4eb2400-79ec-40aa-b840-8af28f42bebf backend-disruption-name/metrics-api-new-connections connection/new disruption/openshift-tests stopped responding to GET requests over new connections: error running request: 503 Service Unavailable: error trying to reach service: context deadline exceeded}
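
      The allowance quoted in the failure message can be reproduced with simple arithmetic: a historical P99 of 0s is rounded up to one second, and 5s of grace is added, which gives maxAllowed=6s against an observed 1m3s. The sketch below is an illustration only and is not taken from the openshift-tests source. It assumes that "rounded up to always allow one second" means a one-second floor, and the helper names are hypothetical.

```python
import math
import re

# Go-style duration components as they appear in the test output
# (for example "1m3s", "999ms", "2s"). "ms" is listed before "m" and "s"
# so that milliseconds are not misread.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}

# Assumed shape of a disruption interval line: "<timestamp> - <duration> E ..."
_INTERVAL_LINE = re.compile(r"^\w{3} \d+ [\d:.]+ - (\S+)\s+E\b")


def parse_duration(text):
    """Convert a duration such as '1m3s' or '999ms' to seconds."""
    return sum(float(n) * _UNIT_SECONDS[u] for n, u in _DURATION_PART.findall(text))


def max_allowed_seconds(p99_seconds, grace_seconds=5):
    """Round P99 up to a whole second (at least one), then add the grace."""
    return max(1, math.ceil(p99_seconds)) + grace_seconds


def interval_seconds(line):
    """Return the duration of one disruption interval line, or None."""
    match = _INTERVAL_LINE.match(line.strip())
    return parse_duration(match.group(1)) if match else None


observed = parse_duration("1m3s")   # total disruption reported by the test
allowed = max_allowed_seconds(0)    # P99 of 0s from historical data
exceeded = observed > allowed
```

      Summing `interval_seconds` over every interval line in the full job artifacts, rather than the elided excerpt above, should give a total close to the reported 1m3s.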

      Version-Release number of selected component (if applicable):

      https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-network-operator/2573/pull-ci-openshift-cluster-network-operator-master-e2e-aws-ovn-ipsec-upgrade/1914778546035757056
      
      https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-network-operator/2674/pull-ci-openshift-cluster-network-operator-master-e2e-aws-ovn-ipsec-upgrade/1914778428616216576
      
      https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/63667/rehearse-63667-periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-ipsec-upgrade/1911787277483249664
      
      https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/63904/rehearse-63904-periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-ipsec-upgrade/1912386358903574528   


              pepalani@redhat.com Periyasamy Palanisamy
              openshift-crt-jira-prow OpenShift Prow Bot
              None
              None
              Anurag Saxena Anurag Saxena
              None