-
Bug
-
Resolution: Done-Errata
-
Major
-
4.14
-
None
-
Moderate
-
No
-
3
-
WINC - Sprint 242, WINC - Sprint 243
-
2
-
False
-
Bug Fix
Description of problem:
WMCO does not correctly detect when a reboot has completed. It currently tries to open an SSH connection immediately after issuing the reboot request. This creates a race: if the reboot has not started yet, the original SSH connection is still active, so WMCO concludes that the reboot is done and proceeds with node configuration. Shortly afterwards the reboot actually begins, WMCO errors out, and it has to re-initialize SSH and restart configuration.
Version-Release number of selected component (if applicable):
4.14 and earlier, back to 4.10
How reproducible:
Always
Steps to Reproduce:
1. Use a Windows image that does not already have the Containers feature enabled.
2. Have WMCO try to configure the instance as a node.
3. The timing error appears when the instance restarts after the Containers feature is turned on.
Actual results:
WMCO checks whether the instance is reachable via SSH too soon after requesting the reboot and incorrectly assumes the reboot has already completed. Configuration then fails later, because WMCO cannot run PowerShell commands over SSH while the reboot is underway.
Expected results:
WMCO should wait for the reboot and verify that it has actually completed, so that a still-open pre-reboot connection is not mistaken for a finished reboot.
Additional info:
One possible fix: wait for the node to become unreachable first, then wait for it to become reachable again.
Thread with logs and discussion: https://redhat-internal.slack.com/archives/CM4ERHBJS/p1690925841359849
- blocks
-
OCPBUGS-18554 error removing %s HNS network when cleaning up BYOH proxy nodes
- Closed
-
OCPBUGS-20067 WMCO does not wait for instance to reboot properly
- Closed
- is blocked by
-
OCPBUGS-19502 Enable proxy removal test in CI
- Closed
- is cloned by
-
OCPBUGS-20067 WMCO does not wait for instance to reboot properly
- Closed
- links to
-
RHBA-2023:120235 Red Hat OpenShift support for Windows Containers 10.15.0 product release