-
Story
-
Resolution: Cannot Reproduce
-
Normal
-
None
-
None
-
Quality / Stability / Reliability
-
False
-
-
False
-
1
-
5
-
None
-
None
-
OpenShift SPLAT - Sprint 269
User Story:
As an engineer, I want to investigate the e2e failure "[sig-network][Feature:Router] The HAProxy router should enable openshift-monitoring to pull metrics [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" so that we can decrease the failure rate and increase confidence in conformance results when running on platform type external.
Description:
We've noticed recurring failures in our CI jobs that involve HAProxy* testing on the AWS platform. These failures are causing delays and impacting our ability to deliver reliable PExt deployments to our partners. By investigating and resolving these issues, we aim to improve the stability of our CI/CD pipeline and boost confidence in our PExt deployment process.
NOTE: The test is flaky in OPCT. When OPCT runs in serial mode (replay step), the test passes.
Failure logs (timeout after 5 minutes):
E0314 23:11:38.080174 1 extended_validator.go:52] "msg"="skipping route due to invalid configuration" "error"="spec.tls.destinationCACertificate: Invalid value: \"redacted destination ca certificate data\": router does not support CA-signed certs using SHA1" "logger"="controller" "route"="e2e-test-cli-idling-bhzrc/idling-echo-reencrypt"
E0314 23:11:38.080276 1 router_controller.go:273] invalid route configuration
Ran 1 of 1 Specs in 308.088 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped
fail [github.com/openshift/origin/test/extended/router/metrics.go:293]: Unexpected error:
    <wait.errInterrupted>:
    timed out waiting for the condition
    {
        cause: <*errors.errorString | 0xc0003fe970>{
            s: "timed out waiting for the condition",
        },
    }
occurred
Ginkgo exit error 1: exit with code 1
This is the test definition: https://github.com/openshift/origin/blob/ceecbe5d0e44d54408bfa26fb92d8117edaf236f/test/extended/router/metrics.go#L278-L292
The expected target exists in the Console.
ToDo checks / open questions:
- Is the Prometheus pod too busy to answer, or does it time out too quickly, when parallel jobs are running?
Acceptance Criteria:
- Failure cause identified, and either:
  - the failure is fixed, OR
  - it is filed as a bug ticket in OCPBUGS
- Tracking record must remain open for as long as the failure persists
- Rehearsal job must be passing, or the test must be skipped
Other Information:
issue created by splat-bot
- duplicates
-
SPLAT-1854 [platform-external] investigate permanent failures in CI jobs caused by 'HA Proxy' tests on AWS provider
-
- Closed
-
- is related to
-
SPLAT-1854 [platform-external] investigate permanent failures in CI jobs caused by 'HA Proxy' tests on AWS provider
-
- Closed
-
- relates to
-
OPCT-240 CLI report overview: decrease to zero false-positive failures
-
- Testing
-