OpenShift Bugs / OCPBUGS-48334

crio 1.32 fails to start because the CNI is not ready yet and the systemd watchdog kills crio


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Critical
    • 4.19.0
    • 4.19
    • Node / CRI-O
    • Quality / Stability / Reliability
    • False
    • Yes
    • Approved
    • Done
    • Release Note Not Required
    • N/A

      Description of problem:

          MicroShift fails to start because kubelet cannot reach crio over its socket: systemd kills crio with SIGABRT.
      
      For about a minute CRI-O logs "Failed to get network for name: ovn-kubernetes". Then the following is logged:
      
      msg="Will not notify watchdog because CRI-O is unhealthy: health checker failed: runtime status \"NetworkReady\" is invalid: Network plugin returns error: no CNI configuration file in /etc/cni/net.d/. Has your network provider started? (reason: NetworkPluginNotReady)" file="watchdog/watchdog.go:64"
      
      systemd[1]: crio.service: Watchdog timeout (limit 1min)!
      systemd[1]: crio.service: Killing process 22355 (crio) with signal SIGABRT.
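      The sequence in the log above can be modelled as a small simulation. This is a hypothetical sketch, not CRI-O's watchdog code. It assumes that systemd arms the 1-minute watchdog (the "limit 1min" in the log) when crio starts, that crio sends a watchdog ping (sd_notify "WATCHDOG=1") only when its health check passes, and that the health check keeps failing until a CNI config appears in /etc/cni/net.d/. The names `simulate`, `check_interval_s` and `gate_on_network` are illustrative only. When no ping arrives within the window, systemd kills the service with its WatchdogSignal, which defaults to SIGABRT.

```python
# Hypothetical model of the reported failure; NOT CRI-O's actual watchdog code.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    killed: bool
    killed_at_s: Optional[float]  # time at which systemd would send SIGABRT


def simulate(
    cni_ready_at_s: float,
    watchdog_sec: float = 60.0,      # "Watchdog timeout (limit 1min)" from the log
    check_interval_s: float = 10.0,  # assumed health-check / ping period
    gate_on_network: bool = True,    # assumed: ping only when NetworkReady is true
    horizon_s: float = 600.0,
) -> Outcome:
    last_ping_s = 0.0  # assume the watchdog timer starts when crio starts
    t = 0.0
    while t <= horizon_s:
        network_ready = t >= cni_ready_at_s
        healthy = network_ready or not gate_on_network
        if healthy:
            last_ping_s = t  # crio would send WATCHDOG=1 at this point
        # No ping within the watchdog window: systemd kills the service.
        if t - last_ping_s >= watchdog_sec:
            return Outcome(killed=True, killed_at_s=t)
        t += check_interval_s
    return Outcome(killed=False, killed_at_s=None)


if __name__ == "__main__":
    # CNI config shows up after 90 s: killed at the 60 s watchdog deadline.
    print(simulate(cni_ready_at_s=90.0))
    # Same timing, but the ping does not depend on NetworkReady: not killed.
    print(simulate(cni_ready_at_s=90.0, gate_on_network=False))
```

      With `gate_on_network=False`, which corresponds to the behaviour asked for under "Expected results", the model never kills crio regardless of how long the network provider takes to write its CNI config.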

      Version-Release number of selected component (if applicable):

      crio 1.32

      How reproducible:

      Every time with 1.32 in our CI.

      Steps to Reproduce:

          1. Install MicroShift and crio 1.32 (from the OCP mirror).
          2. `systemctl start microshift`

      Actual results:

          Starting MicroShift fails. It eventually starts once everything catches up, but the first start is what matters in CI.

      Expected results:

          crio is not killed just because the CNI is not ready yet, so kubelet can contact crio and MicroShift starts.

      Additional info:

          Example journal log (search for SIGABRT): https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.19-periodics-e2e-aws-tests-bootc-arm-nightly/1878623192738697216/artifacts/e2e-aws-tests-bootc-arm-nightly/openshift-microshift-e2e-metal-tests/artifacts/scenario-info/el95-src@isolated-net/vms/host1/sos/journal_2025-01-13_02:58:46.log
      
      Here is a crio trace log from another machine:
      https://drive.google.com/file/d/1sQHFzV_cJOPMBkbItBIocusPtWKSyjp1/view?usp=sharing

              Peter Hunt (pehunt@redhat.com)
              Patryk Matuszak (pmatusza@redhat.com)
              Aditi Sahay
              Votes: 0
              Watchers: 11