Type: Bug
Resolution: Not a Bug
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.16
Component/s: kube-apiserver
Labels: None
Regression: No
Blocked: False
Blocked Reason: None

Description of problem:

While working on etcd cert rotation, I occasionally trip the failure thresholds of the pathological-event tests ("pathological event should not see excessive RequiredInstallerResourcesMissing secrets").

Besides the ones we have to fix in cluster-etcd-operator, I've also seen plenty of them in kube-apiserver-operator. A similar pattern can be seen in kube-controller-manager-operator and kube-scheduler-operator, albeit less often.

From the logs (attached) it seems to me that this is caused by a secrets informer not being synchronized before the controller actually runs. 
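
To illustrate the suspected race, here is a minimal, language-neutral sketch in Python. It is not the operators' code (those are written in Go) and it only models the hypothesis above. `SecretInformer`, `missing_resources`, `demo`, and the secret names are all hypothetical. The toy controller judges "missing" purely from a local cache, so running it before the cache is filled yields a false report even though the secrets exist. In client-go, the analogous guard would be a call to `cache.WaitForCacheSync` before the controller starts.

```python
import threading


class SecretInformer:
    """Toy stand-in for a secrets informer: a local cache filled asynchronously."""

    def __init__(self, secrets_in_cluster):
        self._source = set(secrets_in_cluster)  # what the API server holds
        self._cache = set()                     # what the controller can see
        self._synced = threading.Event()

    def run(self, start_gate):
        # Stand-in for the initial LIST; the gate lets the demo control timing.
        start_gate.wait()
        self._cache = set(self._source)
        self._synced.set()

    def wait_for_sync(self, timeout):
        # Returns True once the initial list has landed in the cache.
        return self._synced.wait(timeout)

    def has(self, name):
        return name in self._cache


def missing_resources(informer, required):
    """Names the controller would report as missing, judged from the cache only."""
    return sorted(name for name in required if not informer.has(name))


def demo():
    required = ["secret-a", "secret-b"]  # hypothetical secret names
    informer = SecretInformer(required)  # both secrets exist in the "cluster"
    gate = threading.Event()
    worker = threading.Thread(target=informer.run, args=(gate,))
    worker.start()

    # Controller runs before the cache is filled: false "missing" report.
    before_sync = missing_resources(informer, required)

    gate.set()
    if not informer.wait_for_sync(timeout=5):
        raise RuntimeError("informer did not sync in time")
    # Controller runs after sync: nothing is reported missing.
    after_sync = missing_resources(informer, required)
    worker.join()
    return before_sync, after_sync
```

Calling `demo()` returns the two reports side by side: the first lists both secrets as missing, the second is empty. The only difference between them is whether the controller waited for the sync.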

---

Some test runs from our payload testing:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1177-nightly-4.16-e2e-aws-sdn-upgrade/1750540617936539648

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1177-nightly-4.16-e2e-aws-sdn-upgrade/1750540619433906176

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1177-nightly-4.16-e2e-aws-sdn-upgrade/1750540619891085312

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1177-nightly-4.16-e2e-aws-sdn-upgrade/1750540620352458752

Version-Release number of selected component (if applicable):

4.16, but I filed bugs for a flaky test with this symptom as early as 4.7 and 4.8 (OCPBUGS-1128 and https://bugzilla.redhat.com/show_bug.cgi?id=2031564)

How reproducible:

About 2 out of 8 runs on average. It is a race condition that has become more prevalent since https://github.com/openshift/cluster-etcd-operator/pull/1177 was merged.

Steps to Reproduce:

    1. Check out https://github.com/openshift/cluster-etcd-operator/pull/1177
    2. Run the payload tests a few times
    3. Observe the failures
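
As a rough guide to how many runs "a few times" needs to be, the sketch below turns the observed rate of about 2 failures in 8 runs into a per-run failure probability of 0.25. It assumes runs are independent, which is an assumption and not something measured here. The function names are hypothetical.

```python
import math


def p_at_least_one_failure(p_fail: float, runs: int) -> float:
    """Probability of seeing one or more failures in `runs` independent runs."""
    return 1.0 - (1.0 - p_fail) ** runs


def runs_needed(p_fail: float, confidence: float) -> int:
    """Smallest number of runs that shows a failure with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


# Assumed per-run failure rate, taken from "2/8 runs on average".
P_FAIL = 2 / 8

print(f"{p_at_least_one_failure(P_FAIL, 8):.3f}")  # chance of a failure in 8 runs
print(runs_needed(P_FAIL, 0.95))                   # runs for 95% confidence
```

Under these assumptions, 8 runs show at least one failure roughly 90% of the time, and about 11 runs are needed to reach 95%.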

Actual results:

flaky/failing test runs due to RequiredInstallerResourcesMissing

Expected results:

no flaky RequiredInstallerResourcesMissing anymore :)

Additional info:

In case we don't want to tackle it, I've prepped an increase of the threshold for the time being in https://github.com/openshift/origin/pull/28557

I can also add the other components, if necessary.


Attachment: current.log (355 kB, uploaded 2024/01/26 9:09 AM by Thomas Jungblut)

Assignee: Unassigned
Reporter: Thomas Jungblut
QA Contact: Ke Wang
Votes: 0
Watchers: 2
Created: 2024/01/26 9:14 AM
Updated: 2024/01/30 4:37 PM
Resolved: 2024/01/30 4:37 PM
