-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.18.z
-
Quality / Stability / Reliability
-
False
-
-
None
-
Moderate
-
No
-
x86_64
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.18.26 on VMware vSphere 8.0.3
Description of problem:
The customer installed OpenShift Container Platform 4.18.26 on VMware vSphere (8.0.3) using the IPI method with a pre-existing resource pool (not directly on the cluster root).
The installation completed successfully, but the alert vSphereOpenshiftNodeHealthFail was triggered. The vsphere-problem-detector-operator reports missing privileges on the compute cluster:
5m Warning FailedVSphereCheckComputeClusterPermissions deployment/vsphere-problem-detector-operator missing privileges for compute cluster INTEL-FARM: Resource.AssignVMToPool, VApp.AssignResourcePool, VApp.Import, VirtualMachine.Config.AddNewDisk
According to the Red Hat documentation, when an existing resource pool is used, the vSphere cluster should only require ReadOnly permissions. The listed privileges are required only on the resource pool, not at the cluster level.
The customer is following this official Red Hat documentation: https://docs.redhat.com/en/documentation/openshift_container_platform/4.16/html/installing_on_vsphere/installer-provisioned-infrastructure#ins[...]eqs
The customer also provided the following govc command output:
Cluster:
> govc permissions.ls /UH/host/<CLUSTER_NAME> | Select-String "Propagate|OpenShift_User"
Role      Entity                   Principal                Propagate
ReadOnly  /UH/host/<CLUSTER_NAME>  <DOMAIN>\OpenShift_User  Yes

> govc role.ls ReadOnly
System.Anonymous
System.Read
System.View
Resource Pool:
> govc permissions.ls /UH/host/<CLUSTER_NAME>/Resources/<RESOURCE_POOL_NAME> | Select-String "Propagate|OpenShift_User"
Role                                   Entity                                                  Principal                Propagate
Role_OpenShift_ResourcePool-propagate  /UH/host/<CLUSTER_NAME>/Resources/<RESOURCE_POOL_NAME>  <DOMAIN>\OpenShift_User  Yes

> govc role.ls Role_OpenShift_ResourcePool-propagate
Host.Config.Storage
Resource.AssignVMToPool
System.Anonymous
System.Read
System.View
VApp.AssignResourcePool
VApp.Import
VirtualMachine.Config.AddNewDisk
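The mismatch between the govc output and the warning event can be reproduced with a simple set difference. The following is a minimal sketch, not the operator's code: missingPrivileges is a hypothetical helper, the "full check" list is taken from the privileges named in the warning event, and the assumption that a read-only check needs only System.Read is illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// missingPrivileges returns the entries of required that are absent from
// granted, in sorted order. Hypothetical helper for illustration only.
func missingPrivileges(required, granted []string) []string {
	have := make(map[string]bool, len(granted))
	for _, p := range granted {
		have[p] = true
	}
	var missing []string
	for _, p := range required {
		if !have[p] {
			missing = append(missing, p)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Privileges of the ReadOnly role held on the cluster (from `govc role.ls ReadOnly`).
	clusterGranted := []string{"System.Anonymous", "System.Read", "System.View"}

	// Privileges named in the FailedVSphereCheckComputeClusterPermissions event.
	fullCheck := []string{
		"Resource.AssignVMToPool",
		"VApp.AssignResourcePool",
		"VApp.Import",
		"VirtualMachine.Config.AddNewDisk",
	}

	// Assumption: a read-only check would require only System.Read.
	readOnlyCheck := []string{"System.Read"}

	fmt.Println("full check, missing on cluster:",
		strings.Join(missingPrivileges(fullCheck, clusterGranted), ", "))
	fmt.Println("read-only check, number missing on cluster:",
		len(missingPrivileges(readOnlyCheck, clusterGranted)))
}
```

With the customer's role assignments, the full check reports exactly the four privileges from the event, while a read-only check under the stated assumption would report nothing missing.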
The customer has also reviewed the source code of the vsphere-problem-detector and has repeatedly highlighted the following section: https://github.com/openshift/vsphere-problem-detector/blob/1a2036121a48da7493bdb50913c2b71c2372be60/pkg/check/node_cluster_permissions.go#L60-L63
// If pre-existing resource pool was defined, only check cluster for read privilege
if ctx.VMConfig.LegacyConfig != nil && ctx.VMConfig.LegacyConfig.Workspace.ResourcePoolPath != "" {
	readOnly = true
}
As the customer observed, the comment in the vsphere-problem-detector code states: "If pre-existing resource pool was defined, only check cluster for read privilege". However, the actual check relies on the VMConfig.LegacyConfig field, which, according to the customer's analysis, is no longer populated in fresh installations. This is also indicated by the operator log message:
Unable to load cloud config as legacy ini. 1:1: expected section header.
Because of this, the readOnly variable is never set to true, causing the operator to skip the intended read-only check and instead (incorrectly) require additional cluster-level privileges.
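The effect described above can be sketched in isolation. The types below are hypothetical stand-ins, not the real vsphere-problem-detector structures: only the LegacyConfig.Workspace.ResourcePoolPath field path is taken from the quoted code, while FailureDomains, failureDomain, and readOnlyAnyFormat are assumptions introduced purely to illustrate what a format-independent check could look like.

```go
package main

import "fmt"

// Hypothetical stand-ins for the operator's config types.
type legacyWorkspace struct{ ResourcePoolPath string }
type legacyConfig struct{ Workspace legacyWorkspace }

// failureDomain is an assumed representation of a non-legacy config entry.
type failureDomain struct{ ResourcePoolPath string }

type vmConfig struct {
	LegacyConfig   *legacyConfig   // nil when the cloud config is not legacy INI
	FailureDomains []failureDomain // assumed field, for illustration only
}

// readOnlyLegacyOnly mirrors the condition quoted from the source:
// it can only return true when the legacy config was loaded.
func readOnlyLegacyOnly(cfg vmConfig) bool {
	return cfg.LegacyConfig != nil && cfg.LegacyConfig.Workspace.ResourcePoolPath != ""
}

// readOnlyAnyFormat is a hypothetical variant that also honours a resource
// pool defined in the assumed non-legacy structure.
func readOnlyAnyFormat(cfg vmConfig) bool {
	if readOnlyLegacyOnly(cfg) {
		return true
	}
	for _, fd := range cfg.FailureDomains {
		if fd.ResourcePoolPath != "" {
			return true
		}
	}
	return false
}

func main() {
	// A fresh install as described in this report: no legacy config,
	// but a pre-existing resource pool is configured.
	cfg := vmConfig{
		FailureDomains: []failureDomain{
			{ResourcePoolPath: "/UH/host/<CLUSTER_NAME>/Resources/<RESOURCE_POOL_NAME>"},
		},
	}
	fmt.Println("legacy-only check sets readOnly:", readOnlyLegacyOnly(cfg))
	fmt.Println("format-independent check sets readOnly:", readOnlyAnyFormat(cfg))
}
```

This is only meant to make the suspected gap concrete; where the resource pool path actually lives in the non-legacy configuration would need to be confirmed against the operator source.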
How reproducible:
1. Install OpenShift 4.18.26 on vSphere using the IPI method with a pre-existing resource pool (not the cluster root).
2. Assign ReadOnly permissions on the cluster and the required privileges on the resource pool, as described in the Red Hat documentation.
3. After installation, observe the vSphereOpenshiftNodeHealthFail alert and the FailedVSphereCheckComputeClusterPermissions warning event, which reports missing privileges at the cluster level.
Based on the analysis above, the behavior appears to stem from the vsphere-problem-detector logic depending on VMConfig.LegacyConfig, which is no longer used in fresh installations.