Type: Bug
Resolution: Duplicate
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.13
Component/s: Cloud Compute / Cloud Controller Manager
Labels:
None

Regression:
No
Blocked:
False
Blocked Reason:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

A cluster installed with a single-arch payload can be migrated to a multi-arch payload with the `oc adm upgrade --to-multi-arch` command. If, while this migration is in progress, we provision a machineset for an architecture different from the control plane (using the appropriate arch-specific bootimage), the machine stays in the Provisioned phase forever and no node is created.

This happens because the machine is created and boots, but when the MCO pivots the host to the machine-os-content, it pivots to the single-arch machine-os image because the upgrade has not yet completed. As a result, the host never registers as a node.

Although this case is rare, it would help if this error could be propagated out of the machine so that the machine transitions to the Failed phase.
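A minimal sketch of the requested behaviour follows, assuming a hypothetical node-join timeout. The 30-minute value, the `Machine` class and the `reconcile_phase` function are illustrative only and are not part of the machine-api codebase; the point is that a machine stuck in Provisioned with no node could be moved to Failed instead of being requeued indefinitely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical timeout for illustration; not a real machine-api setting.
NODE_JOIN_TIMEOUT = timedelta(minutes=30)


@dataclass
class Machine:
    name: str
    phase: str  # e.g. "Provisioning", "Provisioned", "Running", "Failed"
    provisioned_at: datetime
    node_ref: Optional[str] = None
    error_message: Optional[str] = None


def reconcile_phase(machine: Machine, now: datetime) -> Machine:
    """Advance a Provisioned machine to Running or Failed (sketch rules only)."""
    if machine.phase != "Provisioned":
        return machine
    if machine.node_ref is not None:
        # A node has joined: the machine is healthy.
        machine.phase = "Running"
    elif now - machine.provisioned_at > NODE_JOIN_TIMEOUT:
        # No node after the timeout: surface the problem instead of waiting forever.
        machine.phase = "Failed"
        machine.error_message = f"no node joined within {NODE_JOIN_TIMEOUT}"
    # Otherwise the machine stays Provisioned and would be requeued.
    return machine
```

Without a timeout, only the final branch is ever taken, which matches the repeated `has no node yet, requeuing` line in the machine-api-controller logs.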

Version-Release number of selected component (if applicable):

4.13.0-ec.4

How reproducible:

Always

Steps to Reproduce:

1. Create an AWS amd64 cluster with a single-arch 4.13.0-ec.4 payload.
2. Run the migration command `oc adm upgrade --to-multi-arch`.
3. Provision a machineset with the arm64 bootimage and instance type.
4. Monitor with `oc get machines -n openshift-machine-api`.

Actual results:

The machine stays in the Provisioned phase.

Expected results:

The machine transitions to the Failed phase.

Additional info:

Logs from machine-api-controller:
I0314 23:48:15.684256       1 reconciler.go:407] psundara-mycluster01-8mrjn-worker-us-east-1c-m5nlr: ProviderID already set in the machine Spec with value:aws:///us-east-1c/i-0c0d4fb0fe65f651d
I0314 23:48:15.684315       1 reconciler.go:267] Updated machine psundara-mycluster01-8mrjn-worker-us-east-1c-m5nlr
I0314 23:48:15.684323       1 machine_scope.go:167] psundara-mycluster01-8mrjn-worker-us-east-1c-m5nlr: Updating status
I0314 23:48:15.761009       1 machine_scope.go:193] psundara-mycluster01-8mrjn-worker-us-east-1c-m5nlr: finished calculating AWS status
I0314 23:48:15.761027       1 machine_scope.go:90] psundara-mycluster01-8mrjn-worker-us-east-1c-m5nlr: patching machine
I0314 23:48:15.778043       1 controller.go:341] psundara-mycluster01-8mrjn-worker-us-east-1c-m5nlr: has no node yet, requeuing

relates to

OTA-961 Prepare for Cluster version status to report transitions from single arch to multi arch correctly

Dev Complete

Assignee: Joel Speed

Reporter: Prashanth Sundararaman

QA Contact: Zhaohua Sun

Votes: 0

Watchers: 7

Created: 2023/03/14 11:53 PM

Updated: 2023/10/17 5:32 PM

Resolved: 2023/10/17 5:32 PM