Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: 4.14
Component/s: Networking / ovn-kubernetes
Labels:
None

Severity:
Important
Regression:
No
Blocked:
False
Blocked Reason:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

During an in-place upgrade of an instance in a hosted cluster, the instance does not come back up after the reboot that the upgrade initiates.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Only on the first node upgrade in a nodepool, when running the HyperShift e2e tests.

Steps to Reproduce:

1. Set up a management cluster with the 4.14/main version of the HyperShift operator.
2. Run the in-place node upgrade test:
bin/test-e2e \
  -test.v \
  -test.timeout=2h10m \
  -test.run=TestInPlaceUpgradeNodePool \
  --e2e.aws-credentials-file=$HOME/.aws/credentials \
  --e2e.aws-region=us-east-2 \
  --e2e.aws-zones=us-east-2a \
  --e2e.pull-secret-file=$HOME/.pull-secret \
  --e2e.base-domain=www.mydomain.com \
  --e2e.latest-release-image="registry.ci.openshift.org/ocp/release:4.14.0-0.ci-2023-03-13-110647" \
  --e2e.previous-release-image="registry.ci.openshift.org/ocp/release:4.14.0-0.ci-2023-03-08-170640" \
  --e2e.skip-api-budget \
  --e2e.aws-endpoint-access=PublicAndPrivate

Actual results:

The test fails while waiting for a node to upgrade.

Expected results:

The test succeeds

Additional info:

The upgrade starts successfully and the mcd pod runs to completion, up to the point of restarting the instance. The instance does not come back up after the restart.

If the node upgrade is unstuck by deleting the node and the corresponding EC2 instance, the upgrade then succeeds on the other node of the nodepool (assuming a 2-count nodepool).
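
The manual workaround described above could look roughly like the sketch below. It is not taken from the test suite: the function names are hypothetical, the node name and region are placeholders, and it assumes `oc` is pointed at the hosted cluster and that the node's `spec.providerID` has the usual AWS form `aws:///<zone>/<instance-id>`.

```shell
# Hypothetical helper: extract the EC2 instance ID from a node's providerID.
# Assumes the AWS form aws:///<zone>/<instance-id>.
instance_id_from_provider_id() {
  provider_id="$1"
  # Strip everything up to and including the last slash.
  printf '%s\n' "${provider_id##*/}"
}

# Hypothetical helper: delete the stuck node and terminate its EC2 instance.
# $1 = node name (placeholder), $2 = AWS region (for example us-east-2).
unstick_node() {
  node_name="$1"
  region="$2"
  provider_id="$(oc get node "$node_name" -o jsonpath='{.spec.providerID}')"
  instance_id="$(instance_id_from_provider_id "$provider_id")"
  oc delete node "$node_name"
  aws ec2 terminate-instances --region "$region" --instance-ids "$instance_id"
}
```

It would be invoked as `unstick_node <node-name> us-east-2`. Nothing runs until the function is called, so the block can be sourced safely before deciding which node to remove.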

Attached are the instance log after restart and the log of the mcd pod.

Example of a failed run:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_hypershift/2271/pull-ci-openshift-hypershift-main-e2e-aws/1634291913915895808

Attachments:
instance.log (64 kB, 2023/03/14 1:28 AM)
mcd-pod.log (102 kB, 2023/03/14 1:29 AM)

Relates to: COS-1926 Move RHCOS to RHEL 9.2 in OCP 4.13 (Closed)

Assignee: Martin Kennelly
Reporter: Cesar Wong
QA Contact: Rio Liu
Votes: 0
Watchers: 12
Created: 2023/03/14 1:30 AM
Updated: 2023/04/03 10:19 AM
Resolved: 2023/04/03 10:18 AM
