OCPBUGS-52955

SriovNetworkNodePolicy reconciler runs too often


    • Bug
    • Resolution: Done-Errata
    • Undefined
    • 4.18.0
    • Networking / SR-IOV
    • Quality / Stability / Reliability
    • False
    • CNF Network Sprint 268, CNF Network Sprint 269
    • 2
    • Done
    • Bug Fix
      * Previously, the `SriovNetworkNodePolicy` object reconciler executed with every node resource update. This resulted in excessive resource consumption by the SR-IOV Operator pod and an overabundance of log entries. This release changes the behavior so that the reconciler only runs when a node label changes, thereby reducing resource consumption and log entry generation. (link:https://issues.redhat.com/browse/OCPBUGS-52955[OCPBUGS-52955])
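The release note above says the reconciler now runs only when a node label changes. A minimal sketch of that filtering idea follows, written in shell purely for illustration; the operator is a Go program and its actual change is not shown in this issue. The function names `normalize_labels` and `labels_changed` and the sample label values are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative only: mimics "ignore node updates that leave labels untouched".
# Labels are passed as a comma-separated "key=value" list (an assumed format).

# normalize_labels turns "k1=v1,k2=v2" into a sorted, newline-separated list
# so that ordering differences do not count as a change.
normalize_labels() {
  printf '%s' "$1" | tr ',' '\n' | sort
}

# labels_changed returns 0 (true) when the two label sets differ.
labels_changed() {
  [ "$(normalize_labels "$1")" != "$(normalize_labels "$2")" ]
}

old='kubernetes.io/hostname=worker-0,node-role.kubernetes.io/worker='
same_labels='node-role.kubernetes.io/worker=,kubernetes.io/hostname=worker-0'
new_label="$old,feature.node.kubernetes.io/network-sriov.capable=true"

# Same labels in a different order: the update would be skipped.
if labels_changed "$old" "$same_labels"; then echo reconcile; else echo skip; fi

# A label was added: the update would trigger a reconcile.
if labels_changed "$old" "$new_label"; then echo reconcile; else echo skip; fi
```

Under these assumptions, a node update that only touches status or heartbeat fields compares equal on labels and is skipped, which is the behavior the release note describes.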

      Description of problem:

      
      The SriovNetworkNodePolicy reconcile loop runs on every node update notification. This is unnecessary and produces a large number of log lines per hour, even when the operator is idle.
      
      For example, after 1 hour of idle time:
      
      $ grep "INFO     SriovNetworkNodePolicy  handler/eventhandler.go:97      Enqueuing sync for create event" openshift-sriov-network-operator-pods_logs.log | wc -l
      526
      
          

      Version-Release number of selected component (if applicable):

      4.18
          

      How reproducible:

      100%
          

      Steps to Reproduce:

          1. Install the operator 
      
      cat << EOF | oc create -f -
      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-sriov-network-operator
        annotations:
          workload.openshift.io/allowed: management
        labels:
          pod-security.kubernetes.io/audit: privileged
          pod-security.kubernetes.io/enforce: privileged
          pod-security.kubernetes.io/warn: privileged
      EOF
      
      operator-sdk run bundle \
          registry.redhat.io/openshift4/ose-sriov-network-operator-bundle:v4.18.0.202502260933.p0.gf496851.assembly.stream.el9 \
          --namespace openshift-sriov-network-operator

          2. Configure the SriovOperatorConfig with default values
      cat <<EOF | oc create -f -
      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovOperatorConfig
      metadata:
        name: default
        namespace: openshift-sriov-network-operator
      spec:
        enableInjector: true
        enableOperatorWebhook: true
        logLevel: 2
        disableDrain: false
      EOF
      
          3. Wait a few minutes
          
          4. Check how many times the SriovNetworkNodePolicy reconcile is called
      $ oc get -n openshift-sriov-network-operator pods | grep sriov-network-operator; oc logs -n openshift-sriov-network-operator deploy/sriov-network-operator | grep 'Reconciling.*{"controller": "sriovnetworknodepolicy"' | wc -l
      sriov-network-operator-86595c68db-4mlbb                           1/1     Running     0          11m
      90
      
      
          

      Actual results:

      Reconcile loop is called roughly every 7 seconds
          

      Expected results:

      Reconcile loop is called once every 5 minutes (the resync period)
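The intervals quoted above can be cross-checked against the counts reported in this issue. The sketch below is a rough calculation only: it assumes the pod age shown by `oc get pods` (11m) approximates the log window, and that each matching log line corresponds to one event. The helper name `seconds_per_event` is hypothetical.

```shell
#!/usr/bin/env bash
# seconds_per_event WINDOW_SECONDS EVENT_COUNT
# Prints the average gap between events, rounded to one decimal place.
seconds_per_event() {
  LC_ALL=C awk -v w="$1" -v n="$2" 'BEGIN { printf "%.1f\n", w / n }'
}

# Step 4: 90 "Reconciling" lines while the pod had been up for about 11 minutes.
seconds_per_event $((11 * 60)) 90

# Description: 526 "Enqueuing sync" lines after 1 hour of idle time.
seconds_per_event 3600 526

# Expected: one periodic reconcile per 5-minute resync period, so an
# 11-minute window should hold about 11 / 5 = 2 of them (integer division).
echo $((11 / 5))
```

This prints 7.3, 6.8 and 2: both measurements agree with the reported "~7s" cadence, while the resync period alone would account for about 2 reconciles in the same 11 minutes instead of 90.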
          

      Additional info:

      
          

              Andrea Panattoni (apanatto@redhat.com)
              Evgeny Levin