OpenShift Bugs / OCPBUGS-54506

SriovNetworkNodePolicy reconciler runs too often

    • Bug
    • Resolution: Done-Errata
    • Undefined
    • None
    • 4.18.0
    • Networking / SR-IOV
    • None
    • Quality / Stability / Reliability
    • False
    • None
    • None
    • None
    • None
    • CNF Network Sprint 269
    • 1
    • In Progress
    • Bug Fix
    • Before this update, the SriovNetworkNodePolicy reconciler ran on every Node resource update, which caused the operator pod to consume excessive resources and write many log entries. With this update, the reconciler runs only when a node's labels change, because labels are the only Node information relevant to it.
    • None
    • None
    • None
    • None

      This is a clone of issue OCPBUGS-52955. The following is the description of the original issue:

      Description of problem:

      
      The reconcile loop runs on every Node update notification. This is unnecessary and produces a large number of log lines per hour, even when the operator is idle.
      
      For example, after 1 hour of idle time:
      
      $ grep "INFO     SriovNetworkNodePolicy  handler/eventhandler.go:97      Enqueuing sync for create event" openshift-sriov-network-operator-pods_logs.log| wc -l
      526
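
The count above can be turned into a mean interval between enqueue events. The following is a minimal sketch; `mean_interval` is a hypothetical helper name, not part of the operator or of `oc`.

```shell
# mean_interval WINDOW_SECONDS EVENT_COUNT
# Prints the mean number of seconds between events, to one decimal place.
# Hypothetical helper for illustration only.
mean_interval() {
    window_seconds=$1
    event_count=$2
    if [ "$event_count" -le 0 ]; then
        echo "event count must be positive" >&2
        return 1
    fi
    awk -v w="$window_seconds" -v n="$event_count" \
        'BEGIN { printf "%.1f\n", w / n }'
}

# 526 enqueue events during 1 hour (3600 s) of idle time
mean_interval 3600 526   # prints 6.8
```

That is, roughly one enqueue every 7 seconds on an idle operator.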
      
          

      Version-Release number of selected component (if applicable):

      4.18
          

      How reproducible:

      100%
          

      Steps to Reproduce:

          1. Install the operator 
      
      cat << EOF | oc create -f -
      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-sriov-network-operator
        annotations:
          workload.openshift.io/allowed: management
        labels:
          pod-security.kubernetes.io/audit: privileged
          pod-security.kubernetes.io/enforce: privileged
          pod-security.kubernetes.io/warn: privileged
      EOF
      
      operator-sdk run bundle \
          registry.redhat.io/openshift4/ose-sriov-network-operator-bundle:v4.18.0.202502260933.p0.gf496851.assembly.stream.el9 \
          --namespace openshift-sriov-network-operator
          2. Configure the SriovOperatorConfig with default values
      cat <<EOF | oc create -f -
      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovOperatorConfig
      metadata:
        name: default
        namespace: openshift-sriov-network-operator
      spec:
        enableInjector: true
        enableOperatorWebhook: true
        logLevel: 2
        disableDrain: false
      EOF
      
          3. Wait a few minutes
          
          4. Check how many times the SriovNetworkNodePolicy reconciler has run
      $ oc get  -n openshift-sriov-network-operator pods | grep sriov-network-operator; oc logs -n openshift-sriov-network-operator deploy/sriov-network-operator | grep 'Reconciling.*{"controller": "sriovnetworknodepolicy"' | wc -l      
      sriov-network-operator-86595c68db-4mlbb                           1/1     Running     0          11m
      90
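
For repeated measurements, the count in step 4 can be wrapped in a small function that reads the operator log on standard input. This is only a sketch: `count_reconciles` is a hypothetical name, and the pattern is the one used in step 4.

```shell
# count_reconciles: reads operator logs on stdin and prints the number of
# SriovNetworkNodePolicy reconcile entries. Hypothetical helper name.
count_reconciles() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps a zero count from being treated as a failure.
    grep -c 'Reconciling.*{"controller": "sriovnetworknodepolicy"' || true
}

# Usage against a live cluster (same command as in step 4):
#   oc logs -n openshift-sriov-network-operator deploy/sriov-network-operator | count_reconciles
```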
      
      
          

      Actual results:

      The reconcile loop is called roughly every 7 seconds (90 reconciliations in the 11 minutes the pod had been running).
          

      Expected results:

      The reconcile loop is called only once every 5 minutes (the resync period) while the operator is idle.
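
The expectation can be expressed as a rough check on an idle cluster. The sketch below assumes an upper bound of one initial reconcile plus one per elapsed resync period; that bound and the name `reconcile_rate_ok` are assumptions made for illustration, not something the operator defines.

```shell
# reconcile_rate_ok UPTIME_SECONDS RECONCILE_COUNT [RESYNC_SECONDS]
# Succeeds when the count is no larger than periodic resyncs alone would
# explain: one initial reconcile plus one per elapsed resync period
# (an assumed bound, see the text above).
reconcile_rate_ok() {
    uptime_seconds=$1
    reconcile_count=$2
    resync_seconds=${3:-300}   # 5-minute resync period from the report
    max_expected=$(( uptime_seconds / resync_seconds + 1 ))
    [ "$reconcile_count" -le "$max_expected" ]
}

# Values from the reproduction steps: pod age 11 minutes, 90 reconciles
if reconcile_rate_ok 660 90; then
    echo "reconcile rate is consistent with the resync period"
else
    echo "reconciler runs more often than the resync period explains"
fi
```

With the reported values the check fails, since 90 reconciles far exceeds the 3 that the assumed bound allows for 11 minutes of uptime.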
          

      Additional info:

      
          

              Andrea Panattoni (apanatto@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              None
              None
              Evgeny Levin
              None