OpenShift Bugs / OCPBUGS-35331

Daemon set test failing occasionally [4.15]



      This is a clone of issue OCPBUGS-32934. The following is the description of the original issue:

      Component Readiness has found a potential regression in [sig-apps] Daemon set [Serial] should surge pods onto nodes when spec was updated and update strategy is RollingUpdate [Suite:openshift/conformance/serial] [Suite:k8s].

      Probability of significant regression: 98.27%

      Sample (being evaluated) Release: 4.16
      Start Time: 2024-04-18T00:00:00Z
      End Time: 2024-04-24T23:59:59Z
      Success Rate: 90.00%
      Successes: 27
      Failures: 3
      Flakes: 0

      Base (historical) Release: 4.15
      Start Time: 2024-02-01T00:00:00Z
      End Time: 2024-02-28T23:59:59Z
      Success Rate: 100.00%
      Successes: 83
      Failures: 0
      Flakes: 0

      View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=openshift-controller-manager%20%2F%20apps&confidence=95&environment=sdn%20no-upgrade%20amd64%20aws%20serial&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=sdn&pity=5&platform=aws&sampleEndTime=2024-04-24%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-04-18%2000%3A00%3A00&testId=openshift-tests%3Aee8012d039dcb8b357bb3ddb513b54dd&testName=%5Bsig-apps%5D%20Daemon%20set%20%5BSerial%5D%20should%20surge%20pods%20onto%20nodes%20when%20spec%20was%20updated%20and%20update%20strategy%20is%20RollingUpdate%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D%20%5BSuite%3Ak8s%5D&upgrade=no-upgrade&variant=serial
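
For context on the 98.27% figure: Component Readiness compares the sample and base pass/fail counts statistically, and the reported probability is consistent with one minus a one-sided Fisher's exact p-value computed over the 2x2 table above. A minimal Go sketch of that calculation (my own illustration, not Sippy's code):

{code:go}
package main

import (
	"fmt"
	"math"
)

// lnChoose returns ln(n choose k) via the log-gamma function.
func lnChoose(n, k int) float64 {
	lg := func(x int) float64 { v, _ := math.Lgamma(float64(x + 1)); return v }
	return lg(n) - lg(k) - lg(n-k)
}

// fisherOneSided returns the one-sided p-value for observing at least
// sampleFail failures among the sample runs, given the pooled counts.
func fisherOneSided(samplePass, sampleFail, basePass, baseFail int) float64 {
	total := samplePass + sampleFail + basePass + baseFail
	failTotal := sampleFail + baseFail
	sampleSize := samplePass + sampleFail
	p := 0.0
	for k := sampleFail; k <= failTotal && k <= sampleSize; k++ {
		p += math.Exp(lnChoose(failTotal, k) + lnChoose(total-failTotal, sampleSize-k) - lnChoose(total, sampleSize))
	}
	return p
}

func main() {
	// 4.16 sample: 27 passes / 3 failures; 4.15 base: 83 passes / 0 failures.
	p := fisherOneSided(27, 3, 83, 0)
	fmt.Printf("one-sided p-value: %.4f\n", p)                // ~0.0173
	fmt.Printf("regression probability: %.2f%%\n", 100*(1-p)) // ~98.27%
}
{code}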

       

Looking at this job as an example, the test failed with this error message:

       

       invariants were violated during daemonset update: An old pod with UID f44d840a-4430-4666-addd-cc3fae7a1e8a has been running alongside a newer version for longer than 1m0s { s: "invariants were violated during daemonset update:\nAn old pod with UID f44d840a-4430-4666-addd-cc3fae7a1e8a has been running alongside a newer version for longer than 1m0s", }
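
For reference, the surge test tolerates an old-generation pod coexisting with a newer one on the same node for up to a minute before declaring the invariant violated. A simplified Go sketch of that kind of check (the podInfo type and field names are illustrative, not the actual upstream e2e code):

{code:go}
package main

import (
	"fmt"
	"time"
)

// podInfo is a simplified view of a daemon set pod: the node it runs on, the
// template generation ("version") it was created from, and when a newer-version
// pod on the same node first became ready (zero if none has yet).
type podInfo struct {
	Node         string
	Version      int
	UID          string
	NewerReadyAt time.Time
}

// checkSurgeInvariant flags old pods that have kept running alongside a newer
// version on the same node for longer than tolerance (1m0s in the failure above).
func checkSurgeInvariant(pods []podInfo, now time.Time, tolerance time.Duration) error {
	for _, p := range pods {
		if p.NewerReadyAt.IsZero() {
			continue // no newer pod ready on this node yet, nothing to compare against
		}
		if overlap := now.Sub(p.NewerReadyAt); overlap > tolerance {
			return fmt.Errorf("an old pod with UID %s has been running alongside a newer version for longer than %s", p.UID, tolerance)
		}
	}
	return nil
}

func main() {
	now := time.Now()
	pods := []podInfo{
		// daemon-set-42bhb (version 1) has overlapped the version-2 pod for ~65s.
		{Node: "ip-10-0-22-60", Version: 1, UID: "f44d840a-4430-4666-addd-cc3fae7a1e8a", NewerReadyAt: now.Add(-65 * time.Second)},
	}
	if err := checkSurgeInvariant(pods, now, time.Minute); err != nil {
		fmt.Println("invariants were violated during daemonset update:", err)
	}
}
{code}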

       

The last query from the log (blob:https://prow.ci.openshift.org/bbb02cd3-e004-48be-aef8-8ce6ef7acc47) shows the conflicting pods at 18:46:47.812:

       
Apr 23 18:46:47.812: INFO: Node Version Name UID Deleted Ready
Apr 23 18:46:47.812: INFO: ip-10-0-22-60.us-west-1.compute.internal 1 daemon-set-42bhb f44d840a-4430-4666-addd-cc3fae7a1e8a false true
Apr 23 18:46:47.812: INFO: ip-10-0-22-60.us-west-1.compute.internal 2 daemon-set-c5p92 02f9a3ba-cc0c-4954-bc29-65b4b954b93b false true
Apr 23 18:46:47.812: INFO: ip-10-0-28-200.us-west-1.compute.internal 1 daemon-set-7p7hs 4ba591c7-623a-4397-b99d-3b2616b5a787 false true
Apr 23 18:46:47.812: INFO: ip-10-0-95-178.us-west-1.compute.internal 2 daemon-set-chhhl 53d07493-c0f4-46f1-9365-f67be6ac993b false true

       

Yet if you look at the journal from ip-10-0-22-60.us-west-1.compute.internal, the kubelet only starts deleting pod daemon-set-42bhb at 18:46:48.140434:

       

      Apr 23 18:46:48.140434 ip-10-0-22-60 kubenswrapper[1432]: I0423 18:46:48.140408    1432 kubelet.go:2445] "SyncLoop DELETE" source="api" pods=["e2e-daemonsets-9316/daemon-set-42bhb"]
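
So the DELETE reached the kubelet only about 330ms after the test's last observation, i.e. just after the 1m0s tolerance had already been exceeded; that gap is what the question below hinges on. A trivial Go check of the arithmetic (timestamps copied from the test log and the node journal, assumed to be the same day in UTC):

{code:go}
package main

import (
	"fmt"
	"time"
)

func main() {
	const layout = "2006-01-02 15:04:05.000000"
	// Last pod listing that still shows both versions (test log).
	lastCheck, _ := time.Parse(layout, "2024-04-23 18:46:47.812000")
	// Kubelet receives "SyncLoop DELETE" for daemon-set-42bhb (node journal).
	kubeletDelete, _ := time.Parse(layout, "2024-04-23 18:46:48.140434")

	fmt.Println("delete arrived after the last check by:", kubeletDelete.Sub(lastCheck)) // ~328ms
}
{code}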

       

Should the test be waiting longer, or is there a legitimate problem with the delay?
