Task
Resolution: Unresolved
As an OCP Engineer, I would like to know why bootstrap failed to install on Platform External clusters on AWS when the API is down in the step "platform-external-cluster-wait-for-api-bootstrap", by collecting logs through SSH, so that I have more information when investigating the root cause of failures.
Acceptance criteria:
- Attempt to collect systemd logs through SSH in the step [1] when the API is down
- Show a preview of the logs (tail?) before exiting with failure
- Review, and adjust if needed, the timeout (the jobs [2] took 20+ minutes to give feedback)
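A rough sketch of how the criteria above could fit together, for discussion only and not the step's actual implementation. It polls the API until a deadline, then tries to collect logs and prints a short tail before reporting failure. All names here are hypothetical (`build_journal_cmd`, `tail`, `wait_for_api`), and the `core` user, the `bootkube.service` unit, and the 600s timeout are assumptions chosen for illustration.

```python
import time
from typing import Callable, List


def build_journal_cmd(host: str, unit: str, lines: int = 50,
                      user: str = "core") -> List[str]:
    """Build an ssh argv that reads the end of one systemd unit's journal.

    Assumption: the node is reachable over SSH as `user` and runs systemd.
    """
    return [
        "ssh",
        "-o", "StrictHostKeyChecking=no",
        "-o", "ConnectTimeout=10",
        f"{user}@{host}",
        "journalctl", "--no-pager", "-b", "-u", unit, "-n", str(lines),
    ]


def tail(text: str, lines: int) -> str:
    """Return the last `lines` lines of `text` (empty string if lines <= 0)."""
    if lines <= 0:
        return ""
    return "\n".join(text.splitlines()[-lines:])


def wait_for_api(api_is_up: Callable[[], bool],
                 collect_logs: Callable[[], str],
                 timeout_s: float = 600.0,   # illustrative value, not a recommendation
                 interval_s: float = 30.0,
                 preview_lines: int = 20,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the API answers or the deadline passes.

    On timeout, attempt log collection and print a preview before
    returning False. A failed collection is reported but does not raise.
    """
    deadline = clock() + timeout_s
    while True:
        if api_is_up():
            return True
        if clock() >= deadline:
            break
        print(f"API DOWN, waiting {interval_s:g}s...")
        sleep(interval_s)

    try:
        logs = collect_logs()
    except Exception as exc:  # collection is best effort
        print(f"log collection failed: {exc}")
        return False
    print(f"--- last {preview_lines} lines of collected logs ---")
    print(tail(logs, preview_lines))
    return False
```

In a real step, `collect_logs` might wrap something like `subprocess.run(build_journal_cmd(host, "bootkube.service"), capture_output=True, text=True, timeout=60).stdout`, and the caller would exit non-zero when `wait_for_api` returns `False`. Injecting `sleep` and `clock` keeps the timeout logic testable without waiting in real time.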
Engineering references:
[1] step failure e2e-external-aws-ccm-platform-external-cluster-wait-for-api-bootstrap
~~~
E1027 00:49:55.231610 1000 memcache.go:265] couldn't get current server API group list: Get "https://api.ci-op-stflmg87-cd183.origin-ci-int-aws.dev.rhcloud.com:6443/api?timeout=32s": dial tcp 3.23.214.125:6443: connect: connection refused
The connection to the server api.ci-op-stflmg87-cd183.origin-ci-int-aws.dev.rhcloud.com:6443 was refused - did you specify the right host or port?
2024-10-27 00:49:55+00:00 - API DOWN, waiting 30s...
error: failed to execute wrapped command: exit status 127
INFO[2024-10-27T00:50:05Z] Step e2e-external-aws-ccm-platform-external-cluster-wait-for-api-bootstrap failed after 25m10s.
~~~
[2] References / job failures:
Reported to Slack channel https://redhat-internal.slack.com/archives/C064717U0D7/p1729990830715289