OpenShift Bugs / OCPBUGS-43433

[vsphere] Machine stuck in Provisioning status when machine is powered off

    • Moderate
      * Previously, the machine controller failed to save the {vmw-full} task ID of an instance template clone operation. As a result, the machine stayed in the `Provisioning` state and its virtual machine remained powered off. With this release, the {vmw-full} machine controller can detect and recover from this state. (link:https://issues.redhat.com/browse/OCPBUGS-43433[*OCPBUGS-43433*])
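The recovery described in the release note can be pictured as a decision the controller makes on each reconcile. The following is a minimal sketch, not the actual machine-api-operator code: the `vmState` type, the `Action` values, and the `decide` function are hypothetical names chosen for illustration. It assumes the controller can observe whether a VM with the machine's name exists, its power state, and the state of the saved clone task (empty when the task ID was never saved or has expired). The key point, taken from the log line "already exists, but was not powered on after clone", is that a VM that exists but is powered off should be powered on rather than cloned again.

```go
package main

import "fmt"

// Action is a hypothetical next step for the reconciler (illustration only).
type Action string

const (
	ActionClone    Action = "clone"     // no VM yet: start a template clone
	ActionWaitTask Action = "wait-task" // clone task is still running
	ActionPowerOn  Action = "power-on"  // VM exists but was never powered on
	ActionNone     Action = "none"      // VM exists and is running
)

// vmState is a hypothetical snapshot of what the controller can observe.
type vmState struct {
	Exists    bool   // a VM with the machine's name was found in vCenter
	PoweredOn bool   // the VM power state is poweredOn
	TaskState string // "", "running", "success" or "error"; "" means unknown or expired
}

// decide picks the next step without relying on the saved task ID being valid.
func decide(s vmState) Action {
	if s.TaskState == "running" {
		return ActionWaitTask
	}
	if !s.Exists {
		return ActionClone
	}
	if !s.PoweredOn {
		// The clone already produced the VM. Cloning again would fail with
		// "The name '...' already exists", so power the VM on instead,
		// whether the saved task is missing, failed, or expired.
		return ActionPowerOn
	}
	return ActionNone
}

func main() {
	// The state observed in this bug: VM exists, poweredOff, clone task in error.
	fmt.Println(decide(vmState{Exists: true, PoweredOn: false, TaskState: "error"}))
}
```

Checking for a running task first keeps the controller from acting on a VM that a clone is still creating.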
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-1735. The following is the description of the original issue:

      Description of problem:

      When setting up a cluster on vSphere, a machine is sometimes left powered off in the "Provisioning" phase. This triggers a new machine creation attempt, which reports the error "failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists".

      Version-Release number of selected component (if applicable):

       4.12.0-0.ci.test-2022-09-26-235306-ci-ln-vh4qjyk-latest

      How reproducible:

      Intermittent; encountered twice.

      Steps to Reproduce:

      1. Set up a vSphere cluster.
      

      Actual results:

      Cluster installation failed; the machine is stuck in the Provisioning phase.
      $ oc get machine                      
      NAME                             PHASE          TYPE   REGION   ZONE   AGE
      jima-ipi-27-d97wp-master-0       Running                               4h
      jima-ipi-27-d97wp-master-1       Running                               4h
      jima-ipi-27-d97wp-master-2       Running                               4h
      jima-ipi-27-d97wp-worker-7qn9b   Provisioning                          3h56m
      jima-ipi-27-d97wp-worker-dsqd2   Running                               3h56m
      
      $ oc edit machine jima-ipi-27-d97wp-worker-7qn9b
      status:
        conditions:
        - lastTransitionTime: "2022-09-27T01:27:29Z"
          status: "True"
          type: Drainable
        - lastTransitionTime: "2022-09-27T01:27:29Z"
          message: Instance has not been created
          reason: InstanceNotCreated
          severity: Warning
          status: "False"
          type: InstanceExists
        - lastTransitionTime: "2022-09-27T01:27:29Z"
          status: "True"
          type: Terminable
        lastUpdated: "2022-09-27T01:27:29Z"
        phase: Provisioning
        providerStatus:
          conditions:
          - lastTransitionTime: "2022-09-27T01:36:09Z"
            message: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists.
            reason: MachineCreationSucceeded
            status: "False"
            type: MachineCreation
          taskRef: task-11363480
      
      $ govc vm.info /SDDC-Datacenter/vm/jima-ipi-27-d97wp/jima-ipi-27-d97wp-worker-7qn9b
      Name:           jima-ipi-27-d97wp-worker-7qn9b
        Path:         /SDDC-Datacenter/vm/jima-ipi-27-d97wp/jima-ipi-27-d97wp-worker-7qn9b
        UUID:         422cb686-6585-f05a-af13-b2acac3da294
        Guest name:   Red Hat Enterprise Linux 8 (64-bit)
        Memory:       16384MB
        CPU:          8 vCPU(s)
        Power state:  poweredOff
        Boot time:    <nil>
        IP address:   
        Host:         10.3.32.8
      
      I0927 01:44:42.568599       1 session.go:91] No existing vCenter session found, creating new session
      I0927 01:44:42.633672       1 session.go:141] Find template by instance uuid: 9535891b-902e-410c-b9bb-e6a57aa6b25a
      I0927 01:44:42.641691       1 reconciler.go:270] jima-ipi-27-d97wp-worker-7qn9b: already exists, but was not powered on after clone, requeue
      I0927 01:44:42.641726       1 controller.go:380] jima-ipi-27-d97wp-worker-7qn9b: reconciling machine triggers idempotent create
      I0927 01:44:42.641732       1 actuator.go:66] jima-ipi-27-d97wp-worker-7qn9b: actuator creating machine
      I0927 01:44:42.659651       1 reconciler.go:935] task: task-11363480, state: error, description-id: VirtualMachine.clone
      I0927 01:44:42.659684       1 reconciler.go:951] jima-ipi-27-d97wp-worker-7qn9b: Updating provider status
      E0927 01:44:42.659696       1 actuator.go:57] jima-ipi-27-d97wp-worker-7qn9b error: jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists.
      I0927 01:44:42.659762       1 machine_scope.go:101] jima-ipi-27-d97wp-worker-7qn9b: patching machine
      I0927 01:44:42.660100       1 recorder.go:103] events "msg"="jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists." "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"jima-ipi-27-d97wp-worker-7qn9b","uid":"9535891b-902e-410c-b9bb-e6a57aa6b25a","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"17614"} "reason"="FailedCreate" "type"="Warning"
      W0927 01:44:42.688562       1 controller.go:382] jima-ipi-27-d97wp-worker-7qn9b: failed to create machine: jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists.
      E0927 01:44:42.688651       1 controller.go:326]  "msg"="Reconciler error" "error"="jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists." "controller"="machine-controller" "name"="jima-ipi-27-d97wp-worker-7qn9b" "namespace"="openshift-machine-api" "object"={"name":"jima-ipi-27-d97wp-worker-7qn9b","namespace":"openshift-machine-api"} "reconcileID"="d765f02c-bd54-4e6c-88a4-c578f16c7149"
      ...
      I0927 03:18:45.118110       1 actuator.go:66] jima-ipi-27-d97wp-worker-7qn9b: actuator creating machine
      E0927 03:18:45.131676       1 actuator.go:57] jima-ipi-27-d97wp-worker-7qn9b error: jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created
      I0927 03:18:45.131725       1 machine_scope.go:101] jima-ipi-27-d97wp-worker-7qn9b: patching machine
      I0927 03:18:45.131873       1 recorder.go:103] events "msg"="jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"jima-ipi-27-d97wp-worker-7qn9b","uid":"9535891b-902e-410c-b9bb-e6a57aa6b25a","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"17614"} "reason"="FailedCreate" "type"="Warning"
      W0927 03:18:45.150393       1 controller.go:382] jima-ipi-27-d97wp-worker-7qn9b: failed to create machine: jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created
      E0927 03:18:45.150492       1 controller.go:326]  "msg"="Reconciler error" "error"="jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created" "controller"="machine-controller" "name"="jima-ipi-27-d97wp-worker-7qn9b" "namespace"="openshift-machine-api" "object"={"name":"jima-ipi-27-d97wp-worker-7qn9b","namespace":"openshift-machine-api"} "reconcileID"="5d92bc1d-2f0d-4a0b-bb20-7f2c7a2cb5af"
      I0927 03:18:45.150543       1 controller.go:187] jima-ipi-27-d97wp-worker-dsqd2: reconciling Machine
      
      
      
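The log shows two different failures for the same machine over time. First, the retried clone task (`task-11363480`) ends in error because the VM name is already taken; roughly an hour and a half later, looking up that saved task reference fails because vCenter no longer has it. The following hedged sketch shows how a controller might tell these apart. It assumes plain substring matching on the message text seen in this log, which is fragile; a real fix would more likely inspect the vSphere fault type. `classify`, `nextStep`, and the `failureKind` values are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// failureKind is a hypothetical category for the create errors seen in this log.
type failureKind int

const (
	kindOther       failureKind = iota
	kindNameExists              // a VM with this name is already in the inventory
	kindTaskExpired             // the saved taskRef no longer exists in vCenter
)

// classify matches on message fragments copied from the controller log
// (assumption: message text is stable enough to match on for illustration).
func classify(msg string) failureKind {
	switch {
	case strings.Contains(msg, "already exists"):
		return kindNameExists
	case strings.Contains(msg, "has already been deleted or has not been completely created"):
		return kindTaskExpired
	default:
		return kindOther
	}
}

// nextStep maps each category to a hypothetical recovery step.
func nextStep(msg string) string {
	switch classify(msg) {
	case kindNameExists:
		return "look up existing VM by name; do not clone again"
	case kindTaskExpired:
		return "clear stale taskRef; re-check VM by name"
	default:
		return "requeue"
	}
}

func main() {
	msgs := []string{
		"The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists.",
		"ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created",
	}
	for _, m := range msgs {
		fmt.Println(nextStep(m))
	}
}
```

In both cases the sketch avoids issuing another clone, which is what kept this machine looping in the log above.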

      Expected results:

      Machine is created successfully.

      Additional info:

      machine-controller log: http://file.rdu.redhat.com/~zhsun/machine-controller.log

              Joel Speed (joelspeed)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Milind Yadav