  1. OpenShift Specialist Platform Team
  2. SPLAT-2089

[platform-external] Investigate e2e failure: [sig-network][Feature:Router] The HAProxy router should enable openshift-monitoring to pull metrics [Skipped:Disconnected] [Suite:openshift/conformance/parallel]

    • OpenShift SPLAT - Sprint 269

      User Story:

      As an Engineer I want to investigate the e2e failure "[sig-network][Feature:Router] The HAProxy router should enable openshift-monitoring to pull metrics [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" so that we can decrease the failure rate and increase confidence in conformance results running on platform type external.

       

      Description:
      We've noticed recurring failures in our CI jobs that involve HAProxy* testing on the AWS platform. These failures are causing delays and impacting our ability to deliver reliable PExt deployments to our partners. By investigating and resolving these issues, we aim to improve the stability of our CI/CD pipeline and boost confidence in our PExt deployment process.

      NOTE: The test is flaky in OPCT. When OPCT runs in serial mode (replay step), the test passes.

      Log failures (timeout after 5 minutes):

       

      E0314 23:11:38.080174       1 extended_validator.go:52] "msg"="skipping route due to invalid configuration" "error"="spec.tls.destinationCACertificate: Invalid value: \"redacted destination ca certificate data\": router does not support CA-signed certs using SHA1" "logger"="controller" "route"="e2e-test-cli-idling-bhzrc/idling-echo-reencrypt"
      E0314 23:11:38.080276       1 router_controller.go:273] invalid route configuration

      Ran 1 of 1 Specs in 308.088 seconds
      FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped
      fail [github.com/openshift/origin/test/extended/router/metrics.go:293]: Unexpected error:
          <wait.errInterrupted>: 
          timed out waiting for the condition
          {
              cause: <*errors.errorString | 0xc0003fe970>{
                  s: "timed out waiting for the condition",
              },
          }
      occurred
      Ginkgo exit error 1: exit with code 1

      This is the test definition: https://github.com/openshift/origin/blob/ceecbe5d0e44d54408bfa26fb92d8117edaf236f/test/extended/router/metrics.go#L278-L292

       

      The expected target exists in the Console.

      ToDo checks / open questions:

      • Is the Prometheus pod too busy to respond, or does it time out quickly, when parallel jobs are running?
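
One way to look into the question above is to capture the Prometheus targets API response while parallel jobs are running and check whether the router targets are reported as healthy. The sketch below only parses such a response; it makes no network calls. The field names (`activeTargets`, `labels`, `health`, `lastError`) follow the Prometheus HTTP API for `/api/v1/targets`, while the job label value `router-internal-default`, the sample payload, and the helper name `healthyTargets` are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// targetsResponse models the subset of the Prometheus /api/v1/targets
// response that this sketch needs.
type targetsResponse struct {
	Status string `json:"status"`
	Data   struct {
		ActiveTargets []struct {
			Labels    map[string]string `json:"labels"`
			Health    string            `json:"health"`
			LastError string            `json:"lastError"`
		} `json:"activeTargets"`
	} `json:"data"`
}

// healthyTargets reports how many active targets carry the given job label
// and how many of those have health "up". Hypothetical helper.
func healthyTargets(body []byte, job string) (total, up int, err error) {
	var resp targetsResponse
	if err = json.Unmarshal(body, &resp); err != nil {
		return 0, 0, err
	}
	if resp.Status != "success" {
		return 0, 0, fmt.Errorf("unexpected status %q", resp.Status)
	}
	for _, t := range resp.Data.ActiveTargets {
		if t.Labels["job"] != job {
			continue
		}
		total++
		if t.Health == "up" {
			up++
		}
	}
	return total, up, nil
}

// sampleTargets is made-up data; the job names are assumptions.
const sampleTargets = `{
  "status": "success",
  "data": {"activeTargets": [
    {"labels": {"job": "router-internal-default"}, "health": "up", "lastError": ""},
    {"labels": {"job": "router-internal-default"}, "health": "down", "lastError": "context deadline exceeded"},
    {"labels": {"job": "other"}, "health": "up", "lastError": ""}
  ]}
}`

func main() {
	total, up, err := healthyTargets([]byte(sampleTargets), "router-internal-default")
	fmt.Println(total, up, err)
}
```

Comparing the counts, and the time each request takes, between a parallel run and a serial replay would help show whether Prometheus is slow to answer or the target is genuinely missing.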

      Acceptance Criteria:

      • Failure cause identified:
        • The failure must be fixed, OR
        • Tracked as a bug ticket in OCPBUGS
      • The tracking record must remain open for as long as the failure persists
      • Rehearsal job must be passing, or test skipped


      issue created by splat-bot

              rhn-support-mrbraga Marco Braga
              rhn-ocp-splat-service-account OpenShift SPLAT Service Account