OCPBUGS-37428

Machine-config operator should not hot loop generating ValidatingAdmissionPolicyUpdated events


    • Moderate
    • None
    • MCO Sprint 257
    • 1
    • False
    • Release Note Not Required
    • In Progress

      Description of problem

      Seen in a 4.17 nightly-to-nightly CI update:

      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade/1809154554084724736/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-machine-config-operator") | .reason' | sort | uniq -c | sort -n | tail -n3
           82 Pulled
           82 Started
         2116 ValidatingAdmissionPolicyUpdated
      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade/1809154554084724736/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-machine-config-operator" and .reason == "ValidatingAdmissionPolicyUpdated").message' | sort | uniq -c
          705 Updated ValidatingAdmissionPolicy.admissionregistration.k8s.io/machine-configuration-guards because it changed
          705 Updated ValidatingAdmissionPolicy.admissionregistration.k8s.io/managed-bootimages-platform-check because it changed
          706 Updated ValidatingAdmissionPolicy.admissionregistration.k8s.io/mcn-guards because it changed
      

      I'm not sure what those updates are about (which may be a bug on its own; it would be nice to know what changed), but it smells like a hot loop to me.
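      One way to find out what changed would be to save two snapshots of the same policy a few seconds apart and diff them field by field. The following is a minimal sketch under that assumption; diff_objects and the NOISY_METADATA list are hypothetical names for illustration, not anything taken from the machine-config operator.

```python
# Metadata fields that change on every write and would drown out real differences.
# This list is an assumption for illustration, not taken from the operator source.
NOISY_METADATA = {"resourceVersion", "managedFields"}


def diff_objects(old, new, path=""):
    """Return a list of (path, old_value, new_value) for every field that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        for key in sorted(set(old) | set(new)):
            if path == "metadata" and key in NOISY_METADATA:
                continue
            child = f"{path}.{key}" if path else key
            # A key missing on one side shows up as None on that side.
            changes.extend(diff_objects(old.get(key), new.get(key), child))
        return changes
    if isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        changes = []
        for index, (left, right) in enumerate(zip(old, new)):
            changes.extend(diff_objects(left, right, f"{path}[{index}]"))
        return changes
    return [] if old == new else [(path, old, new)]
```

      The two inputs could come from loading the output of oc get validatingadmissionpolicy mcn-guards -o json with json.load. A field that flips between absent and present on every pass would point at a defaulting mismatch between the desired and the stored object, although this report does not confirm that as the cause.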

      Version-Release number of selected component

      Seen in 4.17. It is not yet clear how to audit for exposure frequency or affected versions, short of teaching the origin test suite to fail when it sees too many of these kinds of events. Maybe we need an openshift-... namespaces version of the current "events should not repeat pathologically in e2e namespaces" test case? We may already have one, but if so, it is not tripping.
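      The kind of check described here could look roughly like the sketch below, which tallies events per namespace, reason, and message and reports any that repeat more than a threshold. The function name, the namespace prefix default, and the threshold of 20 are all assumptions for illustration; the report does not say what the origin suite actually uses.

```python
from collections import Counter

# Hypothetical threshold; the value used by the origin suite is not stated in this report.
MAX_REPEATS = 20


def pathological_events(events, namespace_prefix="openshift-", max_repeats=MAX_REPEATS):
    """Return {(namespace, reason, message): total} for keys seen more than max_repeats times."""
    totals = Counter()
    for item in events.get("items", []):
        namespace = item.get("metadata", {}).get("namespace", "")
        if not namespace.startswith(namespace_prefix):
            continue
        key = (namespace, item.get("reason", ""), item.get("message", ""))
        # Count each Event object once, matching the jq | sort | uniq -c pipelines above.
        totals[key] += 1
    return {key: total for key, total in totals.items() if total > max_repeats}
```

      Run against the events.json artifacts from the runs in this report, a check like this should flag the three ValidatingAdmissionPolicyUpdated messages, since each appears several hundred times.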

      How reproducible

      Besides the initial update, also seen in this 4.17.0-0.nightly-2024-07-05-091056 serial run:

      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-serial/1809154615350923264/artifacts/e2e-aws-ovn-serial/gather-extra/artifacts/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-machine-config-operator" and .reason == "ValidatingAdmissionPolicyUpdated").message' | sort | uniq -c
         1006 Updated ValidatingAdmissionPolicy.admissionregistration.k8s.io/machine-configuration-guards because it changed
         1006 Updated ValidatingAdmissionPolicy.admissionregistration.k8s.io/managed-bootimages-platform-check because it changed
         1007 Updated ValidatingAdmissionPolicy.admissionregistration.k8s.io/mcn-guards because it changed
      

      So possibly every time, in all 4.17 clusters?
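      To tell a hot loop apart from a slow periodic resync, it may help to look at how closely spaced the events are. The sketch below computes the median gap between consecutive matching events in an events.json file. It assumes each event carries an RFC 3339 lastTimestamp or eventTime, as core/v1 Event objects normally do; the function names are hypothetical.

```python
from datetime import datetime
from statistics import median


def parse_time(value):
    """Parse an RFC 3339 timestamp such as 2024-07-05T10:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def median_gap_seconds(events, namespace, reason, message):
    """Median seconds between consecutive matching events, or None if fewer than two match."""
    times = []
    for item in events.get("items", []):
        if item.get("metadata", {}).get("namespace") != namespace:
            continue
        if item.get("reason") != reason or item.get("message") != message:
            continue
        # Fall back to eventTime when lastTimestamp is absent or null.
        stamp = item.get("lastTimestamp") or item.get("eventTime")
        if stamp:
            times.append(parse_time(stamp))
    times.sort()
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    return median(gaps) if gaps else None
```

      A median gap of a few seconds across a run lasting an hour or more would support the hot-loop reading; a gap of many minutes would look more like a periodic sync.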

      Steps to Reproduce

      1. Unclear. Possibly just install 4.17.
      2. Run oc -n openshift-machine-config-operator get -o json events | jq -r '.items[] | select(.reason == "ValidatingAdmissionPolicyUpdated")'.

      Actual results

      Thousands of hits.

      Expected results

      Zero to few hits.

              djoshy David Joshy
              trking W. Trevor King
              Sergio Regidor de la Rosa