- Bug
- Resolution: Duplicate
- Major
- None
- 4.14
- None
- Important
- No
- False
Description of problem:
After an in-place upgrade of an instance in a hosted cluster, the instance does not come up after the reboot initiated by the upgrade.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Only on the first node upgraded in a nodepool, using the HyperShift e2e tests.
Steps to Reproduce:
1. Set up a management cluster with the 4.14/main version of the HyperShift operator.
2. Run the in-place node upgrade test:

bin/test-e2e \
  -test.v \
  -test.timeout=2h10m \
  -test.run=TestInPlaceUpgradeNodePool \
  --e2e.aws-credentials-file=$HOME/.aws/credentials \
  --e2e.aws-region=us-east-2 \
  --e2e.aws-zones=us-east-2a \
  --e2e.pull-secret-file=$HOME/.pull-secret \
  --e2e.base-domain=www.mydomain.com \
  --e2e.latest-release-image="registry.ci.openshift.org/ocp/release:4.14.0-0.ci-2023-03-13-110647" \
  --e2e.previous-release-image="registry.ci.openshift.org/ocp/release:4.14.0-0.ci-2023-03-08-170640" \
  --e2e.skip-api-budget \
  --e2e.aws-endpoint-access=PublicAndPrivate
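While the test runs, the upgrade can be watched from the management cluster. The following is a minimal sketch, assuming the HostedCluster namespace/name generated by the e2e test (shown here as placeholders) and that the admin kubeconfig secret follows the usual <name>-admin-kubeconfig convention:

# Hedged sketch: <hc-namespace> and <hc-name> are placeholders for the values the e2e test generates.
# Watch NodePool upgrade progress on the management cluster.
oc get nodepool -n <hc-namespace> -w

# Extract the hosted cluster's admin kubeconfig and watch its nodes.
oc get secret -n <hc-namespace> <hc-name>-admin-kubeconfig \
  -o jsonpath='{.data.kubeconfig}' | base64 -d > hosted.kubeconfig
oc --kubeconfig hosted.kubeconfig get nodes -w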
Actual results:
The test fails while waiting for the first node to complete its upgrade.
Expected results:
The test succeeds
Additional info:
The upgrade successfully starts and the mcd pod runs to completion up to the point of restarting the instance. The instance does not come back up after restart. If the node upgrade is unstuck by deleting the node and the corresponding ec2 instance, the upgrade actually succeeds in the other node of the nodepool (assuming a 2-count nodepool). Attached is the instance log after restart, and the log of the mcd pod. Example of failed run: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_hypershift/2271/pull-ci-openshift-hypershift-main-e2e-aws/1634291913915895808
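For reference, a hedged sketch of the manual workaround described above; the node name and instance ID are placeholders to be taken from the actual run:

# Optionally capture the failed instance's console output before terminating it.
aws ec2 get-console-output --region us-east-2 --instance-id <instance-id> --output text

# Delete the stuck node object from the hosted cluster...
oc --kubeconfig hosted.kubeconfig delete node <stuck-node-name>

# ...and terminate the backing EC2 instance so the nodepool replaces it.
aws ec2 terminate-instances --region us-east-2 --instance-ids <instance-id>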
- relates to COS-1926 Move RHCOS to RHEL 9.2 in OCP 4.13 (Closed)