OpenShift Bugs / OCPBUGS-63307

Component Readiness: [Test Framework] [Pathological Events] ConfigDriftMonitorStopped and RemoveSigtermProtection


      Component Readiness has found a potential regression in the following test:

      [Monitor:legacy-test-framework-invariants-pathological][sig-arch] events should not repeat pathologically

      Several serial-techpreview jobs have similar symptoms:

      • CAMGI shows config-operator with some weird conditions
      • The test reports events like:
        event happened 25 times, something is wrong: node/ci-op-l3l3n3qk-863c8-8ftrw-worker-a-kxq6j hmsg/8a96eaa4fd - reason/ConfigDriftMonitorStopped Config Drift Monitor stopped (21:44:31Z) result=reject 
        event happened 23 times, something is wrong: node/ci-op-l3l3n3qk-863c8-8ftrw-worker-a-kxq6j hmsg/ddacfb2151 - reason/RemoveSigtermProtection Removing SIGTERM protection (21:39:14Z) result=reject
      • There are also interesting errors in the config-operator pod logs that may or may not be related. To find them, try an artifact query from the test_details report with:
        artifacts: artifacts/*e2e*/gather-extra/artifacts/pods/*-config-operator*.log
        regex match: ^E10

        For example:

        E1011 21:22:58.621516 1 base_controller.go:279] "Unhandled Error" err="ConfigOperatorController reconciliation failed: configs.operator.openshift.io \"cluster\" not found"
        E1011 21:31:20.760529 1 leaderelection.go:429] Failed to update lock optimistically: Timeout: request did not complete within requested timeout - context deadline exceeded, falling back to slow path
        E1011 21:32:20.761894 1 leaderelection.go:436] error retrieving resource lock openshift-config-operator/config-operator-lock: the server was unable to return a response in the time allotted, but may still be processing the request (get leases.coordination.k8s.io config-operator-lock)
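
The artifact query above can also be approximated locally once the pod logs are downloaded. The following is a minimal sketch, assuming the logs use the standard klog header (severity letter, then MMDD, time, thread id, file:line), so that `^E10` selects error-severity lines logged in October. The function names and the `I1011` info line are hypothetical and only there for illustration; the two error lines are copied from this report.

```python
import re

# Same pattern as the artifact query: error severity ("E") logged in month 10.
ERROR_IN_OCTOBER = re.compile(r"^E10")

# klog header: Lmmdd hh:mm:ss.uuuuuu threadid file:line] message
KLOG_HEADER = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<source>[^\]]+)\] (?P<message>.*)$"
)


def filter_errors(lines):
    """Keep only the lines the '^E10' regex query would match."""
    return [line for line in lines if ERROR_IN_OCTOBER.match(line)]


def parse_klog(line):
    """Split a klog line into its header fields, or return None if it does not parse."""
    match = KLOG_HEADER.match(line)
    return match.groupdict() if match else None


SAMPLE = [
    # Hypothetical info-level line, included only to show that it is filtered out.
    "I1011 21:22:00.000000 1 example.go:1] hypothetical informational message",
    r'E1011 21:22:58.621516 1 base_controller.go:279] "Unhandled Error" err="ConfigOperatorController reconciliation failed: configs.operator.openshift.io \"cluster\" not found"',
    "E1011 21:31:20.760529 1 leaderelection.go:429] Failed to update lock optimistically: Timeout: request did not complete within requested timeout - context deadline exceeded, falling back to slow path",
]

errors = filter_errors(SAMPLE)
for line in errors:
    fields = parse_klog(line)
    print(fields["time"], fields["source"])
```

Grouping the parsed lines by `source` is one way to see whether the leader-election timeouts cluster around the timestamps of the repeated events.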
        

      The regression linked below is one example of the problem, which is triaged across multiple platforms.

      Extreme regression detected.
      Fisher's Exact probability of a regression: 100.00%.
      Test pass rate dropped from 100.00% to 0.00%.

      Sample (being evaluated) Release: 4.21
      Start Time: 2025-10-10T00:00:00Z
      End Time: 2025-10-17T16:00:00Z
      Success Rate: 0.00%
      Successes: 0
      Failures: 9
      Flakes: 0
      Base (historical) Release: 4.20
      Start Time: 2025-09-17T00:00:00Z
      End Time: 2025-10-17T16:00:00Z
      Success Rate: 100.00%
      Successes: 69
      Failures: 0
      Flakes: 0
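
As a sanity check on the numbers above, the probability can be reproduced with a one-sided Fisher's exact test on the 2x2 table of sample and base results. This is a minimal sketch, not Component Readiness's actual implementation: it assumes the reported "probability of a regression" is 1 minus the one-sided p-value, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(sample_fail, sample_pass, base_fail, base_pass):
    """One-sided Fisher's exact p-value.

    Probability, with the table margins fixed, of seeing at least
    `sample_fail` failures in the sample if sample and base share
    the same underlying failure rate (hypergeometric tail).
    """
    n_sample = sample_fail + sample_pass
    total_fail = sample_fail + base_fail
    total = n_sample + base_fail + base_pass
    max_fail = min(n_sample, total_fail)
    numerator = sum(
        comb(total_fail, k) * comb(total - total_fail, n_sample - k)
        for k in range(sample_fail, max_fail + 1)
    )
    return numerator / comb(total, n_sample)


# Counts from this report: sample 4.21 has 0 successes and 9 failures,
# base 4.20 has 69 successes and 0 failures.
p_value = fisher_exact_greater(sample_fail=9, sample_pass=0, base_fail=0, base_pass=69)
regression_probability = round((1 - p_value) * 100, 2)
print(p_value, regression_probability)
```

With every failure falling in the sample and none in the base, only one table is as extreme as the observed one, so the p-value is tiny and the rounded probability is consistent with the 100.00% reported above.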

      View the test details report for additional context.

      Filed by: lmeyer@redhat.com

              qiwan233 Qi Wang
              openshift-trt OpenShift Technical Release Team