Type: Task
Resolution: Duplicate
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Labels:

Activity Type:
Future Sustainability
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
Stabilize CI conformance jobs for platform type External
Story Points:
3
Original story points:
3

Target Version:
None
Release Blocker:
None
Sprint:
OpenShift SPLAT - Sprint 268, OpenShift SPLAT - Sprint 269

As an OCP engineer, I would like to know why the bootstrap failed to install on Platform External clusters on AWS when the API is down in the step "platform-external-cluster-wait-for-api-bootstrap", by collecting logs through SSH, so that I have more information when investigating the root cause of failures.

Acceptance criteria:

Attempt to collect systemd logs through SSH in the step [1] when the API is down
- Show a preview of the logs (tail?) before exiting with failure
Review, and adjust if needed, the timeout (the jobs [2] took 20+ minutes to give feedback)
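
A minimal sketch of how the criteria above could fit together, written in Python for illustration only (the actual step may well be a shell script). Everything named here is an assumption and not taken from the step's source: the function names, the `core` SSH user, the `bootkube.service` and `kubelet.service` units, the use of `sudo`, and the 15-minute / 30-second defaults. The idea is that the wait loop has its own deadline, and when that deadline passes it collects the journal over SSH and prints a tail before returning failure.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def build_journal_cmd(host: str, user: str = "core",
                      units: Sequence[str] = ("bootkube.service", "kubelet.service"),
                      lines: int = 200) -> List[str]:
    """Build an ssh command that dumps recent journal entries (units are assumed)."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "StrictHostKeyChecking=no",
           "-o", "ConnectTimeout=10", f"{user}@{host}",
           "sudo", "journalctl", "--no-pager", "-n", str(lines)]
    for unit in units:
        cmd += ["-u", unit]
    return cmd


def collect_bootstrap_logs(host: str) -> str:
    """Run the ssh command; never raise, so the caller can still report failure."""
    try:
        result = subprocess.run(build_journal_cmd(host), capture_output=True,
                                text=True, timeout=60, check=False)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"log collection failed: {exc}"
    return result.stdout + result.stderr


def tail(text: str, lines: int = 20) -> str:
    """Return the last `lines` lines of `text`."""
    return "\n".join(text.splitlines()[-lines:])


def wait_for_api(api_is_up: Callable[[], bool],
                 collect_logs: Callable[[], str],
                 timeout_s: float = 900.0,    # placeholder, to be reviewed
                 interval_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the API answers or the deadline passes.

    On timeout, print a preview of the collected logs and return False.
    """
    deadline = clock() + timeout_s
    while True:
        if api_is_up():
            return True
        if clock() >= deadline:
            break
        print(f"API DOWN, waiting {interval_s:.0f}s...")
        sleep(interval_s)
    print("API did not come up in time; last bootstrap log lines:")
    print(tail(collect_logs()))
    return False
```

The step would call something like `wait_for_api(check, lambda: collect_bootstrap_logs(bootstrap_ip))` and exit non-zero when it returns `False`; `check` and `bootstrap_ip` are hypothetical. Keeping the loop's deadline shorter than the step's own timeout leaves room for the log collection to run before the process is terminated.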

Engineering references:

[1] step failure e2e-external-aws-ccm-platform-external-cluster-wait-for-api-bootstrap

~~~
E1027 00:49:55.231610 1000 memcache.go:265] couldn't get current server API group list: Get "https://api.ci-op-stflmg87-cd183.origin-ci-int-aws.dev.rhcloud.com:6443/api?timeout=32s": dial tcp 3.23.214.125:6443: connect: connection refused
The connection to the server api.ci-op-stflmg87-cd183.origin-ci-int-aws.dev.rhcloud.com:6443 was refused - did you specify the right host or port?
2024-10-27 00:49:55+00:00 - API DOWN, waiting 30s...
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:267","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 10m0s grace period","severity":"error","time":"2024-10-27T00:50:05Z"}
{"component":"entrypoint","error":"process timed out","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:84","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.internalRun","level":"error","msg":"Error executing test process","severity":"error","time":"2024-10-27T00:50:05Z"}
error: failed to execute wrapped command: exit status 127
INFO[2024-10-27T00:50:05Z] Step e2e-external-aws-ccm-platform-external-cluster-wait-for-api-bootstrap failed after 25m10s.
~~~

[2] References / job failures:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-external-aws-ccm/1850326952720732160

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-external-aws/1850326950212538368

Reported to Slack channel https://redhat-internal.slack.com/archives/C064717U0D7/p1729990830715289

is duplicated by

SPLAT-2021 [pext/aws][CI] collect bootstrap logs when step fails to complete

Closed

is related to

SPLAT-2021 [pext/aws][CI] collect bootstrap logs when step fails to complete

Closed

Assignee: Marco Braga

Reporter: Marco Braga

Need Info From: None

Contributors: None

QA Contact: None

Doc Contact: None

Votes: 0

Watchers: 1

Created: 2024/10/29 3:06 AM

Updated: 2025/06/27 8:50 AM

Resolved: 2025/04/07 7:48 PM
