- Bug
- Resolution: Done
- Normal
- None
- 4.14.z
- No
- CNF Compute Sprint 255
- 1
- False
- Release Note Not Required
Description of problem:
On a cluster with schedulable control plane nodes, create a NUMAResourcesOperator resource with the default machineConfigPoolSelector from the documentation:

- machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""

A MachineConfig resource named 51-numaresourcesoperator-worker is created and, because the MC is added to the worker MCP, only the worker-only nodes are restarted. After the worker nodes reboot, there is one numaresourcesoperator-worker-***** pod for each node, both masters and workers. However, the pods on master nodes fail to start, because the rte SELinux policy is not created on those nodes.

If the machineConfigPoolSelector is changed to:

- machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/worker: ""

or if two nodeGroups are created (one for the master MCP and one for the worker MCP), another MachineConfig resource is created, named 51-numaresourcesoperator-master. All master nodes are rebooted, and all numaresourcesoperator-worker-***** pods start successfully after that. However, the operator also creates a set of numaresourcesoperator-master-***** pods, so there are two resource-topology-exporter pods per node, which is an issue for the operator.
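For reference, the two-nodeGroups variant mentioned in the description might look like the sketch below. The apiVersion, the resource name, and the master pool label are assumptions based on the usual NUMA Resources Operator examples and the default MachineConfigPool labels; they are not copied from the affected cluster. Per this report, the configuration gets the SELinux policy onto the master nodes but also produces a second set of resource-topology-exporter pods, so it illustrates the problem and is not a fix.

```yaml
# Sketch only: one nodeGroup per MachineConfigPool (assumed shape).
apiVersion: nodetopology.openshift.io/v1   # assumed API version for 4.14
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator              # assumed name, as in the docs example
spec:
  nodeGroups:
  # Worker MCP: results in the 51-numaresourcesoperator-worker MachineConfig
  - machineConfigPoolSelector:
      matchLabels:
        pools.operator.machineconfiguration.openshift.io/worker: ""
  # Master MCP (label assumed): results in 51-numaresourcesoperator-master
  - machineConfigPoolSelector:
      matchLabels:
        pools.operator.machineconfiguration.openshift.io/master: ""
```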
Version-Release number of selected component (if applicable):
Seen with OpenShift 4.14.26 and NUMA Resources Operator 4.14.5
How reproducible:
Always
Steps to Reproduce:
1. Deploy a cluster with schedulable control plane nodes (for example, a 3+1 compact cluster).
2. Create a NUMAResourcesOperator resource with the machineConfigPoolSelector described above.
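A minimal manifest for the NUMAResourcesOperator resource used in the steps above could look like the following. Only the machineConfigPoolSelector block comes from this report; the apiVersion and the metadata name are assumptions taken from the typical documentation example and may need adjusting for the installed operator version.

```yaml
# Sketch of the reproduction manifest (assumed apiVersion and name).
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
  # Default selector from the documentation, as quoted in the description
  - machineConfigPoolSelector:
      matchLabels:
        pools.operator.machineconfiguration.openshift.io/worker: ""
```

On a cluster with schedulable control plane nodes, applying this is expected to reproduce the failing numaresourcesoperator-worker-***** pods on the master nodes.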
Actual results:
Cannot get a single resource-topology-exporter pod running successfully on control plane nodes
Expected results:
There is a single resource-topology-exporter pod running successfully on control plane nodes
Additional info:
A partner has hit this issue. We have a cluster in the lab where we can reproduce it and run any tests.
- is triggering: RFE-5691 Support for schedulable control-plane in NUMAResourcesOperator (Backlog)