OCPBUGS-10300

Machine created during migration from single->multi arch payload stays in Provisioned state

      Description of problem:

      We can migrate a cluster created with a single-arch payload to a multi-arch payload using the `oc adm upgrade --to-multi-arch` command. If, while the migration is in progress, we provision a machineset whose architecture differs from the control plane's (using the appropriate architecture-specific boot image), the machine stays in the Provisioned phase forever and no node is created.
      
      This happens because the machine is created and boots, but when the Machine Config Operator (MCO) pivots to the machine-os-content, it pivots to the single-arch machine-os-content, which does not match the machine's architecture, because the upgrade has not completed yet.
      
      While this case is rare, it would help if this error were propagated out of the machine so that the machine transitions to the Failed phase.

      Version-Release number of selected component (if applicable):

      4.13.0-ec.4

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create an AWS amd64 cluster with a single-arch 4.13.0-ec.4 payload.
      2. Execute the migration command `oc adm upgrade --to-multi-arch`.
      3. Provision a machineset with the arm64 boot image and instance type.
      4. Monitor with `oc get machines -n openshift-machine-api`.
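      Step 4 can be narrowed to just the affected machines. The following is a minimal sketch, assuming a POSIX shell and awk; the function name `find_stuck_machines` is hypothetical and not part of any OpenShift tooling. It reads one machine per line (name, phase and nodeRef name separated by tabs) and prints the machines whose phase is Provisioned but that have no nodeRef. A machine normally sits in this state briefly while it boots, so the check only points at this bug when it keeps reporting the same machine over time.

```shell
#!/bin/sh
# find_stuck_machines (hypothetical helper): reads lines of the form
#   <machine-name><TAB><phase><TAB><nodeRef-name>
# on stdin and prints the names of machines whose phase is "Provisioned"
# but whose nodeRef column is empty (no node has registered yet).
find_stuck_machines() {
  awk -F '\t' '$2 == "Provisioned" && $3 == "" { print $1 }'
}

# Example producer (requires cluster access; shown for illustration only):
#   oc get machines -n openshift-machine-api \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.status.nodeRef.name}{"\n"}{end}' \
#     | find_stuck_machines
```

      The jsonpath template in the comment assumes the default `--allow-missing-template-keys=true` behaviour, so a machine without `status.nodeRef` produces an empty third column rather than an error.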

      Actual results:

      The machine stays in the Provisioned phase and no node is created.

      Expected results:

      The machine transitions to the Failed phase.

      Additional info:

      Logs from the machine-api-controller:
      I0314 23:48:15.684256       1 reconciler.go:407] psundara-mycluster01-8mrjn-worker-us-east-1c-m5nlr: ProviderID already set in the machine Spec with value:aws:///us-east-1c/i-0c0d4fb0fe65f651d
      I0314 23:48:15.684315       1 reconciler.go:267] Updated machine psundara-mycluster01-8mrjn-worker-us-east-1c-m5nlr
      I0314 23:48:15.684323       1 machine_scope.go:167] psundara-mycluster01-8mrjn-worker-us-east-1c-m5nlr: Updating status
      I0314 23:48:15.761009       1 machine_scope.go:193] psundara-mycluster01-8mrjn-worker-us-east-1c-m5nlr: finished calculating AWS status
      I0314 23:48:15.761027       1 machine_scope.go:90] psundara-mycluster01-8mrjn-worker-us-east-1c-m5nlr: patching machine
      I0314 23:48:15.778043       1 controller.go:341] psundara-mycluster01-8mrjn-worker-us-east-1c-m5nlr: has no node yet, requeuing
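      The log above shows the reconcile loop: the provider ID is already set and the status is patched, but the machine has no node, so the controller requeues it. Because no node ever registers, this repeats indefinitely instead of ending in Failed. Below is a minimal sketch for counting how many times a given machine has been requeued, assuming a POSIX shell and grep. The helper name `count_requeues` is hypothetical, and the deployment and container names in the usage comment (`machine-api-controllers`, `machine-controller`) are assumptions about a typical OpenShift 4.x install, not taken from this report.

```shell
#!/bin/sh
# count_requeues MACHINE (hypothetical helper): reads controller log lines on
# stdin and prints how many contain "<MACHINE>: has no node yet, requeuing".
# This is a plain substring match (-F), so a machine name that is a suffix of
# another name will also count that machine's lines.
count_requeues() {
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps the exit status 0 without printing anything extra.
  grep -F -c -- "$1: has no node yet, requeuing" || true
}

# Example usage (requires cluster access; names are assumptions):
#   oc logs -n openshift-machine-api deployment/machine-api-controllers \
#     -c machine-controller | count_requeues <machine-name>
```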
      

              Joel Speed
              Prashanth Sundararaman (psundara@redhat.com)
              Zhaohua Sun