OpenShift Virtualization / CNV-50027

Conflicting logic for SNO regarding scaling of components and monitoring.


    • CNV I/U Operators Sprint 262, CNV I/U Operators Sprint 263, CNV I/U Operators Sprint 264, CNV I/U Operators Sprint 265, CNV I/U Operators Sprint 266, CNV I/U Operators Sprint 267
    • Moderate
    • None

      Description of problem:

      
      On a single-node cluster, the HCO sees that the control plane is not highly available and sets "infra: replicas: 1" on the kubevirt object, as per [1]. That results in single-replica deployments for virt-api, virt-controller, etc., as per [2].
      
      Now look at the monitoring logic that fires the alerts in [3]. It does not look at the control plane availability of the cluster. It looks at the number of worker nodes, and fires the alerts if a component has only 1 replica and the cluster has 2+ worker nodes.
      
      This is conflicting logic: one side decides whether to scale based on X (control plane high availability), and the other decides whether to complain based on Y (number of nodes).
      
      Take an SNO cluster and add a single worker, and this discrepancy shows up: the control plane is not highly available, but there are 2+ workers.
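      
      The mismatch can be sketched in a few lines of Go. This is only an illustration of the two decisions as described above, not the actual HCO or KubeVirt code: the function names are hypothetical, and the replica count of 2 for a highly available control plane is an assumption.

```go
package main

import "fmt"

// infraReplicas models the scaling side: the replica count depends only on
// whether the control plane is highly available.
// Hypothetical helper; the value 2 for the HA case is an assumption.
func infraReplicas(controlPlaneHA bool) int {
	if controlPlaneHA {
		return 2
	}
	return 1
}

// lowCountAlertFires models the monitoring side: the alert depends only on
// the number of worker nodes and the number of ready pods.
// Hypothetical helper, not the real alert expression.
func lowCountAlertFires(workerNodes, readyPods int) bool {
	return workerNodes > 1 && readyPods < 2
}

func main() {
	// SNO plus one extra worker: control plane not HA, two workers.
	replicas := infraReplicas(false)
	fmt.Println("replicas:", replicas)
	fmt.Println("alert fires:", lowCountAlertFires(2, replicas))
}
```

      With these assumptions the component stays at one replica while the alert condition is true, which is the situation reported here.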
      
      Please investigate what the correct action is here: whether these alerts should not fire, or whether the components should start scaling.
      
      [1] https://github.com/kubevirt/hyperconverged-cluster-operator/blob/cf24cf1dcc5fadf5ee53ed0fee7c005e42d6e66c/controllers/operands/kubevirt.go#L809
      [2] https://github.com/kubevirt/kubevirt/blob/133430eab48bb567535f8556356e3b071b275388/pkg/virt-operator/resource/apply/apps.go#L56
      [3] https://github.com/kubevirt/kubevirt/blob/15090de920b2345df8f56eb70a3d6ecc64d60992/pkg/monitoring/rules/alerts/virt-controller.go#L64
      
      

      Version-Release number of selected component (if applicable):

      4.16.4
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Install SNO cluster
      2. Add one extra worker node.
      

      Actual results:

      Alerts such as LowVirtAPICount and LowVirtControllersCount fire.
      

      Expected results:

      Either the components start scaling, or the alerts do not complain that they are not scaling.
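      
      As a rough sketch of what either outcome could look like, both sides would need to key off the same signal. The Go below is purely illustrative: both function names are hypothetical, neither option is claimed to be the intended fix, and the replica count of 2 is an assumption.

```go
package main

import "fmt"

// Option A (hypothetical): keep the single replica and gate the alert on the
// signal that drives scaling, i.e. control plane availability.
func alertGatedOnControlPlane(controlPlaneHA bool, readyPods int) bool {
	return controlPlaneHA && readyPods < 2
}

// Option B (hypothetical): keep the alert as it is and derive the replica
// count from the signal the alert uses, i.e. the number of worker nodes.
// The value 2 is an assumption.
func replicasFromWorkers(workerNodes int) int {
	if workerNodes > 1 {
		return 2
	}
	return 1
}

func main() {
	// SNO plus one extra worker: control plane not HA, two workers.
	fmt.Println("option A alert fires:", alertGatedOnControlPlane(false, 1))
	fmt.Println("option B replicas:", replicasFromWorkers(2))
}
```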
      

              ocohen@redhat.com Oren Cohen
              rhn-support-gveitmic Germano Veit Michel
              Germano Veit Michel Germano Veit Michel