-
Bug
-
Resolution: Done-Errata
-
Major
-
4.15.0
-
None
-
No
-
Rejected
-
False
-
-
Release Note Not Required
-
In Progress
Description of problem:
Recently, the pass rate for the test "static pods should start after being created" has dropped significantly on some platforms:
https://sippy.dptools.openshift.org/sippy-ng/tests/4.15/analysis?test=%5Bsig-node%5D%20static%20pods%20should%20start%20after%20being%20created&filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22%5Bsig-node%5D%20static%20pods%20should%20start%20after%20being%20created%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D

Take a look at this example:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-techpreview/1712803313642115072

The test failed with the following message:
{ static pod lifecycle failure - static pod: "kube-controller-manager" in namespace: "openshift-kube-controller-manager" for revision: 6 on node: "ci-op-2z99zzqd-7f99c-rfp4q-master-0" didn't show up, waited: 3m0s}

Revision 6 was apparently never reached. The kube-controller-manager-operator log jumps from revision 5 straight to revision 7:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-techpreview/1712803313642115072/artifacts/e2e-azure-sdn-techpreview/gather-extra/artifacts/pods/openshift-kube-controller-manager-operator_kube-controller-manager-operator-7cd978d745-bcvkm_kube-controller-manager-operator.log

The log also indicates a possible race:
W1013 12:59:17.775274 1 staticpod.go:38] revision 7 is unexpectedly already the latest available revision. This is a possible race!

This might be a static controller issue, but I am filing it against the kube-controller-manager component to start with. Feel free to reassign.

Here is a Slack thread related to this:
https://redhat-internal.slack.com/archives/C01CQA76KMX/p1697472297510279
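For anyone triaging similar runs, the revision jump described above (5 to 7, with 6 never appearing) can be checked mechanically. The following is only a rough sketch, not part of any OpenShift tooling: the function names and the regular expression are assumptions made for illustration, and the first sample line is hypothetical. Only the staticpod.go warning is taken from the log quoted in this report.

```python
import re

# Assumed pattern: matches "revision 7" and "revision: 6" style mentions.
REVISION_RE = re.compile(r"\brevision:? (\d+)\b")


def observed_revisions(lines):
    """Collect every revision number mentioned in the given log lines."""
    found = set()
    for line in lines:
        for match in REVISION_RE.finditer(line):
            found.add(int(match.group(1)))
    return found


def skipped_revisions(revisions):
    """Return revisions missing between the lowest and highest observed."""
    if not revisions:
        return []
    low, high = min(revisions), max(revisions)
    return [r for r in range(low, high + 1) if r not in revisions]


sample = [
    # Hypothetical line standing in for the operator's revision 5 rollout.
    "installer: node is at revision 5",
    # Warning quoted from the operator log in this report.
    "W1013 12:59:17.775274 1 staticpod.go:38] revision 7 is unexpectedly "
    "already the latest available revision. This is a possible race!",
]

print(skipped_revisions(observed_revisions(sample)))
```

Running this over the full operator log (read line by line from the artifact linked above) would list any revision the operator skipped, which is the revision the test then waits for in vain.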
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
- links to
-
RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update