MicroShift / USHIFT-5524

CNCF tests consistently time out waiting for sonobuoy


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Quality / Stability / Reliability

      Description of problem:

      The CNCF conformance tests fail consistently, and there is not yet any indication of why. The sonobuoy process runs and its sub-tasks (one per host) report completion. However, the global task stays in the "running" state until Prow eventually kills the job.

      Actual results:

      Sonobuoy output:
      {
        "plugins": [
          {
            "plugin": "e2e",
            "node": "global",
            "status": "running",
            "result-status": "",
            "result-counts": null,
            "progress": {
              "name": "e2e",
              "node": "global",
              "timestamp": "2025-03-18T03:25:49.305582478Z",
              "msg": "",
              "total": 404,
              "completed": 0
            }
          },
          {
            "plugin": "systemd-logs",
            "node": "el94-src-cncf-conformance-host1",
            "status": "complete",
            "result-status": "",
            "result-counts": null
          },
          {
            "plugin": "systemd-logs",
            "node": "el94-src-cncf-conformance-host2",
            "status": "complete",
            "result-status": "",
            "result-counts": null
          }
        ],
        "status": "running",
        "tar-info": {
          "name": "",
          "created": "0001-01-01T00:00:00Z",
          "sha256": "",
          "size": 0
        }
      }
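      The stuck state above has a recognizable shape: every per-node plugin entry reports "complete" while the "global" entry stays "running" with no progress (0 of 404 here). A minimal Python sketch that flags this shape, assuming the status JSON has been captured as a string with the same fields shown above. The function name find_stuck_global_plugins is hypothetical and not part of sonobuoy.

```python
import json
from typing import List, Optional, Tuple


def find_stuck_global_plugins(status_json: str) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """Hypothetical helper: report global plugins still 'running' after all
    per-node plugin entries have reported 'complete'.

    Assumes the document shape shown above: a top-level "plugins" list whose
    entries carry "plugin", "node", "status" and (optionally) "progress".
    Returns (plugin, completed, total) tuples for each stuck global entry.
    """
    status = json.loads(status_json)
    entries = status.get("plugins") or []

    # Only flag the condition once every per-node entry has finished.
    node_entries = [e for e in entries if e.get("node") != "global"]
    if not node_entries or any(e.get("status") != "complete" for e in node_entries):
        return []

    stuck = []
    for e in entries:
        if e.get("node") == "global" and e.get("status") == "running":
            progress = e.get("progress") or {}
            stuck.append((e.get("plugin"), progress.get("completed"), progress.get("total")))
    return stuck
```

      Run against the output above, this returns [("e2e", 0, 404)]: both systemd-logs entries are complete, but the e2e global entry has not advanced.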

      Sonobuoy runs inside a pod, and its logs show no errors. The following log line recurs throughout the CNCF failures:

      2025-03-18T01:25:45.087568043-04:00 stdout F Plugin is complete. Sleeping indefinitely to avoid container exit and automatic restarts from Kubernetes

      Here is the same log line in a bit more context:

      2025-03-17T23:25:47.776819804-04:00 stderr F time="2025-03-18T03:25:47Z" level=trace msg="Invoked command single-node with args [] and flags [level=trace logtostderr=true sleep=-1 v=6]"
      2025-03-17T23:25:47.777429876-04:00 stderr F time="2025-03-18T03:25:47Z" level=info msg="Waiting for waitfile" waitfile=/tmp/sonobuoy/results/done
      2025-03-17T23:25:47.777529041-04:00 stderr F time="2025-03-18T03:25:47Z" level=info msg="Starting to listen on port 8099 for progress updates and will relay them to https://[10.42.1.6]:8080/api/v1/progress/by-node/el94-src-cncf-conformance-host1/systemd-logs"
      2025-03-17T23:25:48.777653263-04:00 stderr F time="2025-03-18T03:25:48Z" level=trace msg="Detected done file but sleeping for 5s then checking again for file. This allows other containers to intervene if desired."
      2025-03-17T23:25:53.778970966-04:00 stderr F time="2025-03-18T03:25:53Z" level=info msg="Detected done file, transmitting result file" resultFile=/tmp/sonobuoy/results/systemd_logs
      2025-03-17T23:25:53.810797001-04:00 stderr F time="2025-03-18T03:25:53Z" level=info msg="Results transmitted to aggregator.  Sleeping forever."
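      The worker log above traces a waitfile handshake: wait for /tmp/sonobuoy/results/done, re-check it after a 5s grace period so other containers can intervene, transmit the result file, then sleep forever. A simplified Python model of that sequence follows, to make explicit which step the per-node workers reach. It is not sonobuoy's implementation. Assumptions: the done file's contents are the path of the result file, and transmit is a hypothetical callback standing in for the upload to the aggregator. Unlike the real worker, the function returns after transmitting instead of sleeping forever.

```python
import os
import time
from typing import Callable, Optional


def wait_and_transmit(
    done_file: str,
    transmit: Callable[[str], None],  # hypothetical stand-in for the aggregator upload
    poll_interval: float = 1.0,
    grace_period: float = 5.0,        # the logs show a 5s re-check
    timeout: Optional[float] = None,
) -> bool:
    """Simplified model of the waitfile handshake seen in the worker logs.

    Returns True once the result path has been handed to `transmit`,
    or False if `timeout` seconds pass without a confirmed done file.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if os.path.exists(done_file):
            # First detection: pause so other containers can still intervene.
            time.sleep(grace_period)
            if os.path.exists(done_file):
                # Assumption: the done file holds the path of the result file.
                with open(done_file) as f:
                    result_path = f.read().strip()
                transmit(result_path)
                return True  # the real worker sleeps forever at this point
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

      In the failing runs, the systemd-logs workers clearly complete this sequence ("Results transmitted to aggregator"), so the open question is whether the e2e plugin ever reaches its own done-file step.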

              Jon Cope (jcope@redhat.com)