  1. OpenShift Bugs
  2. OCPBUGS-55217

Optimistically update Kube Server and Client CA bundles

    • Quality / Stability / Reliability
    • Approved
    • Done
    • Bug Fix
      Cause: The same CA bundle ConfigMap was being updated by two identical copies of the same controller.
      Consequence: The two copies could receive different metadata inputs and overwrite each other's modifications, producing many duplicate events.
      Fix: The controllers now use optimistic updates and server-side apply to avoid update events and to handle update conflicts. The sidecar copy of the controller does not update the ConfigMap unless its contents have expired.
      Result: Updates that include metadata changes no longer emit duplicate events, and the expected metadata is set.
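
      The optimistic-update idea in the fix can be illustrated with a minimal sketch. This is not the controller's actual code: `Store`, `Conflict`, and `optimistic_update` are hypothetical names, and the in-memory store only imitates the API server's resource-version check. It shows a writer that skips the write when the content already matches and that re-reads and retries after a conflict instead of reporting a failure.

```python
from dataclasses import dataclass, field


class Conflict(Exception):
    """Raised when a write carries a stale resource version (hypothetical)."""


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the stored ConfigMap."""
    data: dict = field(default_factory=dict)
    version: int = 1
    writes: int = 0

    def get(self):
        # Return a copy of the content plus the version it was read at.
        return dict(self.data), self.version

    def update(self, new_data, seen_version):
        # Reject the write if someone else changed the object since the read.
        if seen_version != self.version:
            raise Conflict("the object has been modified")
        self.data = dict(new_data)
        self.version += 1
        self.writes += 1


def optimistic_update(store, desired, max_attempts=5):
    """Write `desired` only if it differs from the stored content.

    On conflict, re-read and try again. Returns True if this caller wrote,
    False if the stored content already matched.
    """
    for _ in range(max_attempts):
        current, version = store.get()
        if current == desired:
            return False  # nothing to do, so no update and no event
        try:
            store.update(desired, version)
            return True
        except Conflict:
            continue  # another writer won the race; re-read and re-check
    raise Conflict("gave up after repeated conflicts")
```

      With two callers that want the same content, the first call writes and the second becomes a no-op, so only one update reaches the store.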

      Component Readiness has found a potential regression in the following test:

      [sig-arch] events should not repeat pathologically

      Significant regression detected.
      Fisher's Exact probability of a regression: 100.00%.
      Test pass rate dropped from 100.00% to 92.74%.

      Sample (being evaluated) Release: 4.20
      Start Time: 2025-08-06T00:00:00Z
      End Time: 2025-08-13T16:00:00Z
      Success Rate: 92.74%
      Successes: 281
      Failures: 22
      Flakes: 0
      Base (historical) Release: 4.19
      Start Time: 2025-05-18T00:00:00Z
      End Time: 2025-06-17T23:59:59Z
      Success Rate: 100.00%
      Successes: 981
      Failures: 0
      Flakes: 0
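
      The counts above can be checked with a one-sided Fisher's exact test. The sketch below is an assumption about how such a figure could be derived, not a description of Component Readiness internals: `fisher_one_sided` is a hypothetical helper, and reading the reported probability as one minus the p-value is likewise an assumption.

```python
from fractions import Fraction
from math import comb


def fisher_one_sided(sample_fail, sample_pass, base_fail, base_pass):
    """P-value for seeing at least `sample_fail` failures in the sample,
    given the pooled failure count (hypergeometric upper tail)."""
    n_sample = sample_fail + sample_pass
    total_fail = sample_fail + base_fail
    total = n_sample + base_fail + base_pass
    upper = min(total_fail, n_sample)
    tail = sum(
        comb(total_fail, k) * comb(total - total_fail, n_sample - k)
        for k in range(sample_fail, upper + 1)
    )
    return Fraction(tail, comb(total, n_sample))


# Counts taken from the report: 4.20 sample versus 4.19 base.
p_value = float(fisher_one_sided(22, 281, 0, 981))
sample_pass_rate = round(281 / 303 * 100, 2)
```

      Here `sample_pass_rate` reproduces the reported 92.74%, and `p_value` is small enough that one minus it rounds to 100.00% at two decimal places.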

      View the test details report for additional context.

      The first 5 failures I looked at all appear to have the same cause:

      [sig-arch] events should not repeat pathologically (0s)
      {  1 events happened too frequently
      
      event happened 115 times, something is wrong: node/ip-10-0-117-116.ec2.internal hmsg/a4cafcb105 - reason/ConfigMapUpdateFailed Failed to update ConfigMap/csr-controller-ca -n openshift-config-managed: Operation cannot be fulfilled on configmaps "csr-controller-ca": the object has been modified; please apply your changes to the latest version and try again (12:43:31Z) result=reject }
      

      This test is intended to catch excessive event spamming and the resulting load on etcd. The limit is 20 identical events, and here we're seeing hundreds. The cause is different copies of the cert rotation controller overwriting each other's changes.
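
      The threshold check described above can be sketched as follows. This is an illustration only, not the test's real implementation: `pathological_events` and the `(locator, reason, message)` tuple shape are hypothetical, and treating "more than 20" as the failure condition is an assumption about how the limit is applied.

```python
from collections import Counter

# Limit quoted in the report; failing on "more than 20" is an assumption.
LIMIT = 20


def pathological_events(events, limit=LIMIT):
    """Return the identical events that repeat more than `limit` times.

    `events` is an iterable of hashable keys, here hypothetical
    (locator, reason, message) tuples.
    """
    counts = Counter(events)
    return {key: n for key, n in counts.items() if n > limit}


# Example shaped after the failure above: one event repeated 115 times.
spam = ("node/ip-10-0-117-116.ec2.internal", "ConfigMapUpdateFailed",
        "Failed to update ConfigMap/csr-controller-ca")
quiet = ("node/ip-10-0-117-116.ec2.internal", "SomeOtherReason", "example")
flagged = pathological_events([spam] * 115 + [quiet] * 3)
```

      In this example `flagged` contains only the `ConfigMapUpdateFailed` key with a count of 115, which mirrors the single offending event in the failure output.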

      Filed by: dgoodwin@redhat.com

              vrutkovs@redhat.com Vadim Rutkovsky
              Rahul Gangwar