Epic
Resolution: Done
Allow re-execution of Ansible jobs against already deployed Nodes
2024Q1
Our current workflow requires users to delete and recreate the `OpenStackDataPlaneNodeSet` CR for any configuration change. While this works for PreProvisioned nodes, it doesn't work for baremetal, because deleting the NodeSet also deletes the baremetal node from Metal3.
We therefore need to give users a way to change their nodes in place, such as network configuration changes (MTU, additional networks, gateways, DNS servers, etc.) or any other service configuration that might need to change as part of a day 2 operation.
Issues with current implementation
When we have a node deployed:
NAME         STATE         CONSUMER              ONLINE   ERROR   AGE
compute-01   provisioned   openstack-edpm-ipam   true             6m20s
Then we delete the nodeSet to make changes:
[m3@localhost ~]$ oc delete osdpns --all
openstackdataplanenodeset.dataplane.openstack.org "openstack-edpm-ipam" deleted
We can see that the related baremetal node is deprovisioned as well:
[m3@localhost ~]$ oc get bmh -n openshift-machine-api compute-01
NAME         STATE            CONSUMER   ONLINE   ERROR   AGE
compute-01   deprovisioning              false            6m54s
Even if we blocked the deletion of the `BareMetalHost` when the NodeSet is deleted, we would be putting the burden back on the user to configure it as what is now a PreProvisioned node.
Proposal
While I understand the original idea of designing something consistent with Ansible, we should instead maintain consistency with Kubernetes constructs, since the primary user interface is Kubernetes and not Ansible. Users are never expected to interact directly with Ansible, so it makes more sense for our design to follow Kubernetes constructs and design patterns rather than trying to make Kubernetes function more like Ansible.
To address this issue, I propose we do the following:
- We implement a validating admission webhook that rejects updates to the `baremetalSetTemplate`. This will ensure no changes are made to the `openstackbaremetalset`.
- We decide whether we want to re-execute all `services` for any change that is made. If we only want to execute a subset of the service playbooks, then we will need a mechanism to determine which services should run only during the initial configuration. Maybe a `runOnce` field on each service:
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneService
metadata:
  labels:
    app.kubernetes.io/name: openstackdataplaneservice
    app.kubernetes.io/instance: openstackdataplaneservice-bootstrap
    app.kubernetes.io/part-of: dataplane-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: dataplane-operator
  name: bootstrap
spec:
  label: dataplane-deployment-bootstrap
  playbook: osp.edpm.bootstrap
  runOnce: true
Then we add some code so that, if the `NodeSet` is already deployed, only the playbooks for services without `service.Spec.RunOnce` set are run.
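The two pieces of the proposal could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Service` and `NodeSetSpec` types are hypothetical, trimmed-down stand-ins rather than the operator's real API types, `validateUpdate` and `servicesToRun` are made-up helper names, and the second service in `main` is invented for the example. A real webhook would hook the same check into the operator's admission handling.

```go
package main

import (
	"fmt"
	"reflect"
)

// Service is a hypothetical stand-in for OpenStackDataPlaneService.
type Service struct {
	Name     string
	Playbook string
	RunOnce  bool // the proposed runOnce field
}

// NodeSetSpec is a hypothetical stand-in for the NodeSet spec.
// The template is modelled as a plain map purely for illustration.
type NodeSetSpec struct {
	BaremetalSetTemplate map[string]string
}

// validateUpdate mimics what the webhook would enforce: any change to
// baremetalSetTemplate between the old and new spec is rejected.
func validateUpdate(oldSpec, newSpec NodeSetSpec) error {
	if !reflect.DeepEqual(oldSpec.BaremetalSetTemplate, newSpec.BaremetalSetTemplate) {
		return fmt.Errorf("baremetalSetTemplate cannot be changed after creation")
	}
	return nil
}

// servicesToRun returns every service on the first deployment, and only
// the services without RunOnce set when the NodeSet is already deployed.
func servicesToRun(services []Service, alreadyDeployed bool) []Service {
	if !alreadyDeployed {
		return services
	}
	var out []Service
	for _, s := range services {
		if !s.RunOnce {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	services := []Service{
		{Name: "bootstrap", Playbook: "osp.edpm.bootstrap", RunOnce: true},
		// Hypothetical service used only to show the filtering.
		{Name: "example-service", Playbook: "example.playbook"},
	}
	for _, s := range servicesToRun(services, true) {
		fmt.Println("would run:", s.Name)
	}
}
```

Defaulting `runOnce` to false keeps existing services re-runnable without any change to their definitions, so only services such as `bootstrap` need to opt out.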
Alternative
The alternative would be to remove baremetal provisioning from the dataplane-operator and have users provision their baremetal nodes manually, then provide them to us as pre-provisioned nodes that we can configure.