Red Hat OpenStack Services on OpenShift / OSPRH-12352

BZ#2321311 Report for OSPdO Backup Restore Issue [17.1]


    • Release Note Not Required
    • Moderate

      Description of problem:
      Use case:
      In 17.1 with OSPdO, the customer is testing backup and restore for the case where they lose the physical OCP node (in this case an OCP master) on which a virtualized controller is running.
      They wanted to test a restore in this scenario, but the restore failed with the following error:

      Last Hearbeat Time: 2024-10-21T15:34:55Z
      Last Transition Time: 2024-10-21T15:34:55Z
      Message: admission webhook "vopenstackbaremetalset.kb.io" denied the request: unable to find 1 requested BaremetalHost count (0 in use, 0 available) with labels [role:totp-cpt-dpdk6] for OpenStackBaremetalSet totp-cpt-dpdk6
      Reason: admission webhook "vopenstackbaremetalset.kb.io" denied the request: unable to find 1 requested BaremetalHost count (0 in use, 0 available) with labels [role:totp-cpt-dpdk6] for OpenStackBaremetalSet totp-cpt-dpdk6
      Status: True
      Type: Restore Error
      It seems that the restore is looking for a BareMetalHost in the "available" state, while the node exists but is in the "provisioned" state, because I restored the BareMetalHost status. (We don't want to lose the compute nodes; the backup is only for the controller.)
      NAMESPACE              NAME                       STATE        CONSUMER                 ONLINE  ERROR  AGE   LABELS
      openshift-machine-api  dell01                     provisioned  totp-cpt-dpdk6           true           4h4m  osp-director.openstack.org/controller=osp-baremetalset,osp-director.openstack.org/name=totp-cpt-dpdk6,osp-director.openstack.org/namespace=openstack,osp-director.openstack.org/osphostname=totp-cpt-dpdk6-0,osp-director.openstack.org/uid=c241e4fb-75cd-4096-aa58-aaa87415228c,role=totp-cpt-dpdk6,scope=openstack
      openshift-machine-api  totp-master0.nfv.cselt.it  unmanaged    ocp-totp-bjm7t-master-0  true           4d1h  scope=openshift
      openshift-machine-api  totp-master1.nfv.cselt.it  unmanaged    ocp-totp-bjm7t-master-1  true           4d1h  scope=openshift
      openshift-machine-api  totp-master2.nfv.cselt.it  unmanaged    ocp-totp-bjm7t-master-2  true           4d1h  scope=openshift
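To make the state mismatch concrete, the listing above can be filtered by the same `role` label the webhook message mentions. This is a minimal sketch, assuming the `oc get baremetalhost` text is available as a string and that the ERROR column is empty (true for every row in this report); the helper names `parse_listing` and `hosts_with_labels` are hypothetical and not part of OSPdO.

```python
from dataclasses import dataclass


@dataclass
class Host:
    namespace: str
    name: str
    state: str
    consumer: str
    online: bool
    labels: dict


# Rows copied from the listing in this report (ERROR column empty).
LISTING = """\
NAMESPACE NAME STATE CONSUMER ONLINE ERROR AGE LABELS
openshift-machine-api dell01 provisioned totp-cpt-dpdk6 true 4h4m osp-director.openstack.org/controller=osp-baremetalset,osp-director.openstack.org/name=totp-cpt-dpdk6,osp-director.openstack.org/namespace=openstack,osp-director.openstack.org/osphostname=totp-cpt-dpdk6-0,osp-director.openstack.org/uid=c241e4fb-75cd-4096-aa58-aaa87415228c,role=totp-cpt-dpdk6,scope=openstack
openshift-machine-api totp-master0.nfv.cselt.it unmanaged ocp-totp-bjm7t-master-0 true 4d1h scope=openshift
openshift-machine-api totp-master1.nfv.cselt.it unmanaged ocp-totp-bjm7t-master-1 true 4d1h scope=openshift
openshift-machine-api totp-master2.nfv.cselt.it unmanaged ocp-totp-bjm7t-master-2 true 4d1h scope=openshift
"""


def parse_labels(text: str) -> dict:
    """Turn 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    return dict(pair.split("=", 1) for pair in text.split(",") if pair)


def parse_listing(listing: str) -> list:
    """Parse data rows, assuming an empty ERROR column (7 fields per row)."""
    hosts = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        ns, name, state, consumer, online, _age, labels = line.split()
        hosts.append(Host(ns, name, state, consumer, online == "true", parse_labels(labels)))
    return hosts


def hosts_with_labels(hosts, selector: dict) -> list:
    """Hosts whose labels contain every key/value pair in selector."""
    return [h for h in hosts if all(h.labels.get(k) == v for k, v in selector.items())]


for h in hosts_with_labels(parse_listing(LISTING), {"role": "totp-cpt-dpdk6"}):
    print(h.name, h.state, h.consumer)  # prints: dell01 provisioned totp-cpt-dpdk6
```

The only host matching `role=totp-cpt-dpdk6` is `dell01`, and it is `provisioned`, not `available`.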
      The customer attempted to restore an OSPdO backup that included OpenStackBaremetalSet resources.
      The restore process encountered an error and failed to recover the OpenStackBaremetalSet.
      The error message indicated a mismatch between the expected state of the BareMetalHost (available) and its actual state (provisioned).

      It seems that the only way (pending confirmation) is to restore everything: the control plane and the data plane together.
      This may be expected behavior, but it is strange, because the data plane running workloads should not be impacted.

      Investigation:

      The customer confirmed that the BareMetalHost nodes existed in the cluster and were in the "provisioned" state.
      The OpenStackBackup CR referenced the existing BareMetalHost nodes.
      We suspected that the issue stemmed from the labels on the BareMetalHost nodes having changed after the backup.
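One way to test this suspicion is to compare the labels stored in the backup with the labels on the live host. This is a minimal sketch, assuming both label sets have already been extracted as dicts (for example, the live set from `oc get bmh dell01 -n openshift-machine-api -o json`). The `diff_labels` helper is hypothetical, and the backed-up value below is a placeholder altered only to show the output format; it is not data from the case.

```python
def diff_labels(backed_up: dict, live: dict) -> dict:
    """Return {key: (backed_up_value, live_value)} for every label that was
    added, removed, or changed; None marks the side where the key is absent."""
    keys = sorted(backed_up.keys() | live.keys())
    return {k: (backed_up.get(k), live.get(k)) for k in keys if backed_up.get(k) != live.get(k)}


# Live labels of dell01, copied from the BareMetalHost listing in this report.
live = {
    "osp-director.openstack.org/controller": "osp-baremetalset",
    "osp-director.openstack.org/name": "totp-cpt-dpdk6",
    "osp-director.openstack.org/namespace": "openstack",
    "osp-director.openstack.org/osphostname": "totp-cpt-dpdk6-0",
    "osp-director.openstack.org/uid": "c241e4fb-75cd-4096-aa58-aaa87415228c",
    "role": "totp-cpt-dpdk6",
    "scope": "openstack",
}

# HYPOTHETICAL backed-up copy: substitute the labels actually stored in the backup.
backed_up = {**live, "osp-director.openstack.org/uid": "<uid-from-backup>"}

for key, (old, new) in diff_labels(backed_up, live).items():
    print(f"{key}: {old!r} -> {new!r}")
```

An empty result would rule out label drift as the cause.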
      Analysis:

      The OSPdO backup/restore functionality prioritizes its own custom resources (CRs) and does not directly manage underlying resources such as BareMetalHost.
      The restore process assumes it needs to provision BareMetalHost nodes based on the OpenStackBaremetalSet spec, even if they already exist.
      The discrepancy between the expected and actual BareMetalHost state caused the validation webhook to reject the request.
      It seems that the only way to perform a restore is to restore the control plane and the data plane together.
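The analysis can be illustrated with a simplified, hypothetical model of the count check implied by the webhook message. This is not OSPdO source code. It assumes that a host counts as "in use" only if the OpenStackBaremetalSet already has a record of it (`known_hosts`, a hypothetical parameter), and as "available" only if it is unrecorded and in the `available` state. Under these assumptions, a freshly restored set with no record of `dell01` sees a provisioned host that counts as neither, which reproduces the message in this report.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    state: str  # e.g. "available", "provisioned"
    labels: dict = field(default_factory=dict)


def count_hosts(hosts, selector, known_hosts):
    """Return (in_use, available) among hosts matching every selector label.

    Hypothetical model:
      in use    -> host is already recorded by this OpenStackBaremetalSet
      available -> host is not recorded and is in the "available" state
    A matching host that is "provisioned" but unrecorded counts as neither.
    """
    in_use = available = 0
    for h in hosts:
        if any(h.labels.get(k) != v for k, v in selector.items()):
            continue  # not selected by the role label
        if h.name in known_hosts:
            in_use += 1
        elif h.state == "available":
            available += 1
    return in_use, available


def validate(set_name, requested, hosts, selector, known_hosts):
    """Raise ValueError, mimicking the webhook denial, if too few hosts qualify."""
    in_use, available = count_hosts(hosts, selector, known_hosts)
    if in_use + available < requested:
        labels = " ".join(f"{k}:{v}" for k, v in selector.items())
        raise ValueError(
            f"unable to find {requested} requested BaremetalHost count "
            f"({in_use} in use, {available} available) with labels [{labels}] "
            f"for OpenStackBaremetalSet {set_name}"
        )


dell01 = Host("dell01", "provisioned", {"role": "totp-cpt-dpdk6"})
selector = {"role": "totp-cpt-dpdk6"}

# Freshly restored set: no record of dell01, so validation is denied.
try:
    validate("totp-cpt-dpdk6", 1, [dell01], selector, known_hosts=set())
except ValueError as err:
    print(err)

# If the set still recorded dell01, the same host would count as "in use".
validate("totp-cpt-dpdk6", 1, [dell01], selector, known_hosts={"dell01"})
```

In this model, restoring the OpenStackBaremetalSet without also restoring its record of which hosts it owns is enough to trigger the denial, even though the host itself is healthy.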

      Version-Release number of selected component (if applicable):

      17.1.2 with OSPdO
      How reproducible:
      Simulate a hardware failure of the OCP node on which the OSP controller is running.

      Steps to Reproduce:
      1. Back up following the official procedure in the documentation, then restore by combining ReaR with the OpenShift part.

      Actual results:

      It seems the only way to restore one controller is to restore the whole cluster.
      Expected results:

      Additional info: there is no complete, official reference for performing a backup and restore in this scenario.

              abays@redhat.com Andrew Bays
              jira-bugzilla-migration RH Bugzilla Integration
              rhos-conplat-core-operators
