OpenShift Bugs / OCPBUGS-64731

False FailedVSphereCheckComputeClusterPermissions Alert Reported by vsphere-problem-detector-operator in OpenShift 4.18.26 Despite Correct Permissions on Existing Resource Pool


    • Quality / Stability / Reliability
    • False
    • Moderate
    • No
    • x86_64

      Version-Release number of selected component (if applicable):

      OpenShift Container Platform 4.18.26 on VMware vSphere 8.0.3

      Description of problem:

      The customer installed OpenShift Container Platform 4.18.26 on VMware vSphere 8.0.3 using the IPI method, with a pre-existing Resource Pool (not the cluster root).

      The installation completed successfully, but the alert vSphereOpenshiftNodeHealthFail was triggered. The vsphere-problem-detector-operator reports missing privileges on the compute cluster:


      5m          Warning   FailedVSphereCheckComputeClusterPermissions    deployment/vsphere-problem-detector-operator   missing privileges for compute cluster INTEL-FARM: Resource.AssignVMToPool, VApp.AssignResourcePool, VApp.Import, VirtualMachine.Config.AddNewDisk

      According to the Red Hat documentation, when an existing Resource Pool is used, the vSphere cluster only requires ReadOnly permissions; the listed privileges are required on the Resource Pool, not at the cluster level.

      The customer is following this official Red Hat documentation: https://docs.redhat.com/en/documentation/openshift_container_platform/4.16/html/installing_on_vsphere/installer-provisioned-infrastructure#ins[...]eqs

      The customer also sent the following govc command output:

      Cluster:

      > govc permissions.ls /UH/host/<CLUSTER_NAME> | Select-String "Propagate|OpenShift_User"
      
      Role                               Entity               Principal                                                                Propagate
      ReadOnly                           /UH/host/<CLUSTER_NAME>  <DOMAIN>\OpenShift_User                                                 Yes
      
      > govc role.ls ReadOnly
      System.Anonymous
      System.Read
      System.View 

      Resource Pool:

      > govc permissions.ls /UH/host/<CLUSTER_NAME>/Resources/<RESOURCE_POOL_NAME> | Select-String "Propagate|OpenShift_User"
      
      Role                                   Entity                                           Principal                                                                Propagate
      Role_OpenShift_ResourcePool-propagate  /UH/host/<CLUSTER_NAME>/Resources/<RESOURCE_POOL_NAME>  <DOMAIN>\OpenShift_User                                                 Yes
      > govc role.ls Role_OpenShift_ResourcePool-propagate
      Host.Config.Storage
      Resource.AssignVMToPool
      System.Anonymous
      System.Read
      System.View
      VApp.AssignResourcePool
      VApp.Import
      VirtualMachine.Config.AddNewDisk 
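      To see why the check fires, the comparison made on the cluster object can be approximated as a set difference between the privileges the check requires and the privileges granted on that object. This is a minimal sketch, not the detector's code: the role contents are copied from the govc output above, the required set is an assumption limited to the four privileges named in the FailedVSphereCheckComputeClusterPermissions event (the real checker's list may be longer), and `missingPrivileges` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// Privileges granted to <DOMAIN>\OpenShift_User, copied from the govc output.
var clusterRole = []string{"System.Anonymous", "System.Read", "System.View"}

var resourcePoolRole = []string{
	"Host.Config.Storage",
	"Resource.AssignVMToPool",
	"System.Anonymous",
	"System.Read",
	"System.View",
	"VApp.AssignResourcePool",
	"VApp.Import",
	"VirtualMachine.Config.AddNewDisk",
}

// Assumption: only the four privileges reported as missing in the event.
var requiredPrivileges = []string{
	"Resource.AssignVMToPool",
	"VApp.AssignResourcePool",
	"VApp.Import",
	"VirtualMachine.Config.AddNewDisk",
}

// missingPrivileges returns the required privileges absent from granted, sorted.
func missingPrivileges(required, granted []string) []string {
	have := make(map[string]bool, len(granted))
	for _, p := range granted {
		have[p] = true
	}
	var missing []string
	for _, p := range required {
		if !have[p] {
			missing = append(missing, p)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	fmt.Println("cluster:", missingPrivileges(requiredPrivileges, clusterRole))
	fmt.Println("resource pool:", missingPrivileges(requiredPrivileges, resourcePoolRole))
}
```

      Against the cluster's ReadOnly role it reports exactly the four privileges listed in the event; against the Resource Pool role it reports none, which matches the customer's point that the privileges are present on the object that actually needs them.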

      The customer has reviewed the vsphere-problem-detector source code and has repeatedly highlighted the following check: https://github.com/openshift/vsphere-problem-detector/blob/1a2036121a48da7493bdb50913c2b71c2372be60/pkg/check/node_cluster_permissions.go#L60-L63

      	// If pre-existing resource pool was defined, only check cluster for read privilege
      	if ctx.VMConfig.LegacyConfig != nil && ctx.VMConfig.LegacyConfig.Workspace.ResourcePoolPath != "" {
      		readOnly = true
      	}
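      The quoted condition can be exercised in isolation. The types below are hypothetical, simplified stand-ins that only mirror the field names in the snippet (`VMConfig`, `LegacyConfig`, `Workspace.ResourcePoolPath`); they are not the detector's real definitions. A legacy config that names a resource pool turns the gate on, while a config whose `LegacyConfig` is nil leaves it off.

```go
package main

import "fmt"

// Hypothetical stand-ins mirroring only the field names in the quoted snippet.
type Workspace struct {
	ResourcePoolPath string
}

type LegacyConfig struct {
	Workspace Workspace
}

type VMConfig struct {
	// LegacyConfig stays nil when the cloud config is not parsed as legacy INI.
	LegacyConfig *LegacyConfig
}

// clusterReadOnly reproduces the quoted condition: the cluster is checked for
// read privileges only when a legacy config with a resource pool path exists.
func clusterReadOnly(cfg VMConfig) bool {
	readOnly := false
	if cfg.LegacyConfig != nil && cfg.LegacyConfig.Workspace.ResourcePoolPath != "" {
		readOnly = true
	}
	return readOnly
}

func main() {
	legacy := VMConfig{LegacyConfig: &LegacyConfig{Workspace: Workspace{
		ResourcePoolPath: "/UH/host/<CLUSTER_NAME>/Resources/<RESOURCE_POOL_NAME>",
	}}}
	fresh := VMConfig{} // LegacyConfig is nil

	fmt.Println("legacy INI config:", clusterReadOnly(legacy)) // true
	fmt.Println("fresh install:", clusterReadOnly(fresh))      // false
}
```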

      The customer points out that, although the code comment says "If pre-existing resource pool was defined, only check cluster for read privilege", the condition actually depends on the VMConfig.LegacyConfig field, which is not populated on fresh installations. The operator log confirms that the cloud config could not be loaded in the legacy INI format:

      Unable to load cloud config as legacy ini. 1:1: expected section header. 

      As a result, readOnly is never set to true: the operator skips the intended read-only check and instead incorrectly requires additional privileges at the cluster level.
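      One possible direction for a fix, sketched under loud assumptions and not taken from upstream: keep the legacy branch as-is, and additionally treat the cluster as read-only when every failure domain on that compute cluster places VMs in a resource pool nested under the cluster's root pool (`<cluster>/Resources`). `FailureDomain`, its fields, and `isPreExistingPool` are hypothetical; they stand in for wherever a non-legacy install records its compute cluster and resource pool (presumably the failure-domain topology of the Infrastructure resource).

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified config types; not the detector's real definitions.
type LegacyConfig struct {
	ResourcePoolPath string
}

// FailureDomain is a hypothetical stand-in for a non-legacy topology entry.
type FailureDomain struct {
	ComputeCluster string // inventory path, e.g. "/UH/host/<CLUSTER_NAME>"
	ResourcePool   string // inventory path of the pool VMs are placed in
}

type VMConfig struct {
	LegacyConfig   *LegacyConfig
	FailureDomains []FailureDomain
}

// isPreExistingPool reports whether pool is nested under the cluster's root
// pool ("<cluster>/Resources"); the root pool itself and "" return false.
func isPreExistingPool(cluster, pool string) bool {
	root := strings.TrimSuffix(cluster, "/") + "/Resources"
	return strings.HasPrefix(strings.TrimSuffix(pool, "/"), root+"/")
}

// clusterReadOnly keeps the original legacy check, then requires every failure
// domain on computeCluster to use a nested pool before relaxing the check.
func clusterReadOnly(cfg VMConfig, computeCluster string) bool {
	if cfg.LegacyConfig != nil && cfg.LegacyConfig.ResourcePoolPath != "" {
		return true
	}
	found := false
	for _, fd := range cfg.FailureDomains {
		if fd.ComputeCluster != computeCluster {
			continue
		}
		if !isPreExistingPool(fd.ComputeCluster, fd.ResourcePool) {
			return false // VMs may land in the root pool: full privileges needed
		}
		found = true
	}
	return found
}

func main() {
	cluster := "/UH/host/<CLUSTER_NAME>"
	cfg := VMConfig{FailureDomains: []FailureDomain{{
		ComputeCluster: cluster,
		ResourcePool:   cluster + "/Resources/<RESOURCE_POOL_NAME>",
	}}}
	fmt.Println(clusterReadOnly(cfg, cluster)) // true
}
```

      Comparing against the cluster's own root pool, rather than only testing for a non-empty path, matters if an install records the root pool explicitly when no custom pool is configured; a non-empty check alone would then relax the cluster check for installs that do need the full privileges.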

      How reproducible:

      1. Install OpenShift 4.18.26 on vSphere using the IPI method with a pre-existing Resource Pool (not the cluster root).
      2. Assign ReadOnly permissions on the cluster and the required privileges on the Resource Pool, as described in the Red Hat documentation.
      3. After installation, observe the FailedVSphereCheckComputeClusterPermissions warning event reporting missing privileges at the cluster level.

      Based on the analysis above, the behavior appears to stem from the vsphere-problem-detector logic depending on VMConfig.LegacyConfig, which is no longer used in fresh installations.

              jcallen@redhat.com Joseph Callen
              rhn-support-aanasali Anas Ali
              Shang Gao
              Votes: 0
              Watchers: 3