OpenShift Bugs / OCPBUGS-21846

Test "static pods should start after being created" failed


Details

    • Bug
    • Resolution: Unresolved
    • Major
    • 4.16.0
    • 4.15.0
    • None
    • No
    • Rejected
    • False
    • Release Note Not Required
    • In Progress

    Description

      Description of problem:

      Recently, the pass rate for the test "static pods should start after being created" has dropped significantly on some platforms:
      
      https://sippy.dptools.openshift.org/sippy-ng/tests/4.15/analysis?test=%5Bsig-node%5D%20static%20pods%20should%20start%20after%20being%20created&filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22%5Bsig-node%5D%20static%20pods%20should%20start%20after%20being%20created%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D
      
      Take a look at this example: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-techpreview/1712803313642115072
      
      The test failed with the following message:
      {  static pod lifecycle failure - static pod: "kube-controller-manager" in namespace: "openshift-kube-controller-manager" for revision: 6 on node: "ci-op-2z99zzqd-7f99c-rfp4q-master-0" didn't show up, waited: 3m0s}
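When triaging many runs, it can help to pull the pod, namespace, revision, and node out of this failure message programmatically. The following is a minimal sketch, assuming the message always has the shape quoted above; the regular expression and the helper name `parse_static_pod_failure` are illustrative and not part of the test suite.

```python
import re

# Pattern modeled on the single failure message quoted in this report.
# Assumption: other runs emit the same wording; variants will not match.
FAILURE_RE = re.compile(
    r'static pod: "(?P<pod>[^"]+)" in namespace: "(?P<namespace>[^"]+)" '
    r'for revision: (?P<revision>\d+) on node: "(?P<node>[^"]+)" '
    r"didn't show up, waited: (?P<waited>[^\s}]+)"
)


def parse_static_pod_failure(message):
    """Return the fields of a static pod lifecycle failure, or None if no match."""
    match = FAILURE_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # The revision is the only numeric field; the wait stays a string such as "3m0s".
    fields["revision"] = int(fields["revision"])
    return fields
```

Calling `parse_static_pod_failure` on the message above yields a dictionary with keys `pod`, `namespace`, `revision`, `node`, and `waited`, which makes it easy to group failures by node or revision.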
      
      Revision 6 was seemingly never rolled out on that node. The kube-controller-manager-operator log shows the revision jumping from 5 straight to 7: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-techpreview/1712803313642115072/artifacts/e2e-azure-sdn-techpreview/gather-extra/artifacts/pods/openshift-kube-controller-manager-operator_kube-controller-manager-operator-7cd978d745-bcvkm_kube-controller-manager-operator.log
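The jump from revision 5 to revision 7 can be checked mechanically once the revisions seen in a log have been collected. This is a small sketch that assumes the caller has already extracted the revision numbers by some other means; `skipped_revisions` is a hypothetical helper, not an existing tool.

```python
def skipped_revisions(observed):
    """Return revisions missing between the lowest and highest observed ones."""
    seen = set(observed)
    if not seen:
        return []
    # Every integer in the closed range that never appeared counts as skipped.
    return [rev for rev in range(min(seen), max(seen) + 1) if rev not in seen]
```

For the run in this report, `skipped_revisions([5, 7])` returns `[6]`, which matches the revision the test was waiting for.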
      
      The log also points to a possible race condition:
      
      W1013 12:59:17.775274       1 staticpod.go:38] revision 7 is unexpectedly already the latest available revision. This is a possible race!
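To find out how often this warning appears across operator logs, the lines can be filtered by their klog header and message text. The sketch below assumes the standard klog line layout (severity letter, month and day, time, thread ID, `file:line]`, message) and the exact warning wording quoted above; `find_race_warnings` is a hypothetical helper.

```python
import re

# klog header: severity, mmdd, hh:mm:ss.uuuuuu, thread id, file:line], message.
KLOG_RE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)
# Assumption: the warning text matches the single line quoted in this report.
RACE_RE = re.compile(
    r"revision (\d+) is unexpectedly already the latest available revision"
)


def find_race_warnings(log_lines):
    """Return (time, source file, revision) for each possible-race warning."""
    hits = []
    for raw in log_lines:
        header = KLOG_RE.match(raw.strip())
        if header is None or header.group("severity") != "W":
            continue
        race = RACE_RE.search(header.group("msg"))
        if race:
            hits.append(
                (header.group("time"), header.group("file"), int(race.group(1)))
            )
    return hits
```

Passing the lines of a downloaded operator log to `find_race_warnings` gives a list of timestamps and revisions that can be compared against the revision the test reported as missing.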
      
      This might be a static controller issue, but I am starting with the kube-controller-manager component for this case. Feel free to reassign.
      
      Here is a related Slack thread:
      https://redhat-internal.slack.com/archives/C01CQA76KMX/p1697472297510279
      
      



          People

            jchaloup@redhat.com Jan Chaloupka
            kenzhang@redhat.com Ken Zhang
            ying zhou ying zhou