  1. OpenShift Bugs
  2. OCPBUGS-36867

Static pod controller pods sometimes fail to start [etcd]


    • Bug
    • Resolution: Won't Do
    • Undefined
    • None
    • 4.17, 4.18
    • Etcd
    • Quality / Stability / Reliability
    • False
    • 5
    • Important
    • No
    • None
    • Rejected
    • ETCD Sprint 259, ETCD Sprint 260, ETCD Sprint 261, ETCD Sprint 262, ETCD Sprint 263
    • 5
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      deads reported in this thread that the static pod controller sometimes appears to deploy pods that do not show up within a reasonable timeframe, which occasionally causes this test to fail (source job):

      [sig-node] static pods should start after being created 
      
      {  static pod lifecycle failure - static pod: "etcd" in namespace: "openshift-etcd" for revision: 7 on node: "ci-op-h9zjcc96-51425-8gcc2-master-0" didn't show up, waited: 3m0s}
      

      David suspects that this actually happens far more often than the test failures indicate; however, this test should be a good resource for finding affected runs.
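When sifting through job runs for affected cases, it may help to pull the structured fields out of the failure message. The following is a minimal sketch, assuming the message always follows the shape of the single example quoted above; the names `StaticPodFailure` and `parse_static_pod_failure` are hypothetical and are not part of the test suite or of any OpenShift tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern modeled only on the one failure message quoted in this report.
# Other messages emitted by the test may be worded differently (assumption).
# "didn.t" tolerates either a straight or a typographic apostrophe.
_FAILURE_RE = re.compile(
    r'static pod: "(?P<pod>[^"]+)" in namespace: "(?P<namespace>[^"]+)" '
    r'for revision: (?P<revision>\d+) on node: "(?P<node>[^"]+)" '
    r"didn.t show up, waited: (?P<waited>[0-9hms.]+)"
)


@dataclass(frozen=True)
class StaticPodFailure:
    """Fields extracted from one 'static pod lifecycle failure' message."""
    pod: str
    namespace: str
    revision: int
    node: str
    waited: str  # kept as the raw duration string, e.g. "3m0s"


def parse_static_pod_failure(message: str) -> Optional[StaticPodFailure]:
    """Return the parsed failure, or None if the message does not match."""
    match = _FAILURE_RE.search(message)
    if match is None:
        return None
    return StaticPodFailure(
        pod=match["pod"],
        namespace=match["namespace"],
        revision=int(match["revision"]),
        node=match["node"],
        waited=match["waited"],
    )
```

Grouping the parsed results by `pod` would give a rough per-component count (etcd, kube-controller-manager, apiserver) across a set of collected failure messages.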

      The test details indicate that this fails up to 10% of the time on some job variants. The most commonly affected component appears to be kube-controller-manager, but apiserver and etcd also appear at times. Use the test details link to find more job runs.

      The Slack thread has more details from both deads@redhat.com and tjungblu@redhat.com.

      The suspicion is that fixing this could improve install times and reliability.

        1. image-2024-08-02-12-31-04-963.png
          86 kB
          Thomas Jungblut
        2. image-2024-08-02-12-32-01-567.png
          78 kB
          Thomas Jungblut
        3. image-2024-08-02-12-38-20-669.png
          80 kB
          Thomas Jungblut
        4. image-2024-08-02-12-39-20-189.png
          81 kB
          Thomas Jungblut
        5. image-2024-08-02-12-46-28-665.png
          125 kB
          Thomas Jungblut
        6. image-2024-08-02-12-46-51-229.png
          82 kB
          Thomas Jungblut
        7. HA 4.17.png
          291 kB
          Haseeb Tariq
        8. SNO 4.17.png
          247 kB
          Haseeb Tariq
        9. HA 4.18.png
          278 kB
          Haseeb Tariq
        10. SNO 4.18.png
          298 kB
          Haseeb Tariq

              rhn-coreos-htariq Haseeb Tariq
              rhn-engineering-dgoodwin Devan Goodwin
              None
              None
              Ge Liu
              None
              Votes: 0
              Watchers: 16
