  1. OpenShift Bugs
  2. OCPBUGS-28243

RequiredInstallerResourcesMissing pathological events


    • Bug
    • Resolution: Not a Bug
    • Normal
    • None
    • 4.16
    • kube-apiserver
    • None
    • No
    • False

      Description of problem:

      While working on etcd cert rotation, I'm occasionally tripping over the thresholds of the pathological events test ("pathological event should not see excessive RequiredInstallerResourcesMissing secrets").
      
      Besides the ones we have to fix in cluster-etcd-operator, I've also seen plenty of them in kube-apiserver-operator. A similar pattern can be seen in kube-controller-manager-operator and kube-scheduler-operator, albeit less often.
      
      From the logs (attached) it seems to me that this is caused by a secrets informer not being synchronized before the controller actually runs. 
      
      ---
      
      Some test runs from our payload testing:
      
      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1177-nightly-4.16-e2e-aws-sdn-upgrade/1750540617936539648
      
      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1177-nightly-4.16-e2e-aws-sdn-upgrade/1750540619433906176
      
      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1177-nightly-4.16-e2e-aws-sdn-upgrade/1750540619891085312
      
      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1177-nightly-4.16-e2e-aws-sdn-upgrade/1750540620352458752

      Version-Release number of selected component (if applicable):

      4.16, but I filed bugs for a flaky test caused by this as early as 4.7 and 4.8 (OCPBUGS-1128 and https://bugzilla.redhat.com/show_bug.cgi?id=2031564)

      How reproducible:

      About 2 in 8 runs on average. It's a race condition that has become more prevalent since merging https://github.com/openshift/cluster-etcd-operator/pull/1177

      Steps to Reproduce:

          1. Check out https://github.com/openshift/cluster-etcd-operator/pull/1177
          2. Run the payload tests a few times
          3. Observe the failures
         

      Actual results:

      flaky/failing test runs due to RequiredInstallerResourcesMissing

      Expected results:

      no flaky RequiredInstallerResourcesMissing anymore :)     

      Additional info:

      In case we don't want to tackle it, I've prepped a threshold increase for the time being in https://github.com/openshift/origin/pull/28557
      
      I can also add the other components, if necessary.

       

              Unassigned
              Thomas Jungblut (tjungblu@redhat.com)
              Ke Wang