Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None
Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Story Points: None
Severity: None
Target Version: openshift-4.19
Release Blocker: None
Sprint: None
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
Release Note Status: None
Release Note Type: None
Release Note Text: None

Description of problem:

The CNCF tests fail reliably, but nothing yet indicates why. The sonobuoy process executes and its sub-tasks (one per host) register completion. However, the global task remains in a "running" state until Prow eventually kills the job.

Actual results:

Sonobuoy output:
{
  "plugins": [
    {
      "plugin": "e2e",
      "node": "global",
      "status": "running",
      "result-status": "",
      "result-counts": null,
      "progress": {
        "name": "e2e",
        "node": "global",
        "timestamp": "2025-03-18T03:25:49.305582478Z",
        "msg": "",
        "total": 404,
        "completed": 0
      }
    },
    {
      "plugin": "systemd-logs",
      "node": "el94-src-cncf-conformance-host1",
      "status": "complete",
      "result-status": "",
      "result-counts": null
    },
    {
      "plugin": "systemd-logs",
      "node": "el94-src-cncf-conformance-host2",
      "status": "complete",
      "result-status": "",
      "result-counts": null
    }
  ],
  "status": "running",
  "tar-info": {
    "name": "",
    "created": "0001-01-01T00:00:00Z",
    "sha256": "",
    "size": 0
  }
}
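
To make the stuck state easier to spot, the following is a minimal Python sketch that splits the plugins in a status document like the one above into finished and unfinished entries. The helper name summarize_status and the SAMPLE constant are hypothetical and not part of sonobuoy; SAMPLE is a trimmed copy of the output in this report, and the sketch assumes the field names shown there.

```python
import json

# Trimmed copy of the status document from this report (illustration only).
SAMPLE = """
{
  "plugins": [
    {"plugin": "e2e", "node": "global", "status": "running",
     "progress": {"total": 404, "completed": 0}},
    {"plugin": "systemd-logs", "node": "el94-src-cncf-conformance-host1",
     "status": "complete"},
    {"plugin": "systemd-logs", "node": "el94-src-cncf-conformance-host2",
     "status": "complete"}
  ],
  "status": "running"
}
"""


def summarize_status(raw):
    """Split plugins into finished and unfinished (hypothetical helper)."""
    doc = json.loads(raw)
    finished, unfinished = [], []
    for plugin in doc.get("plugins", []):
        label = "{}@{}".format(plugin["plugin"], plugin["node"])
        if plugin.get("status") == "complete":
            finished.append(label)
            continue
        # "progress" may be absent or null, so fall back to an empty dict.
        progress = plugin.get("progress") or {}
        unfinished.append(
            (label, progress.get("completed"), progress.get("total"))
        )
    return {
        "overall": doc.get("status"),
        "finished": finished,
        "unfinished": unfinished,
    }


if __name__ == "__main__":
    summary = summarize_status(SAMPLE)
    print("overall:", summary["overall"])
    for label, done, total in summary["unfinished"]:
        print("not finished: {} ({}/{} reported)".format(label, done, total))
```

For the output in this report, only the global e2e plugin is left unfinished, with 0 of 404 reported as completed, while both per-host systemd-logs plugins are finished.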

Sonobuoy runs inside a pod, and its logs do not indicate any errors. The following log line recurs throughout the CNCF failures:

2025-03-18T01:25:45.087568043-04:00 stdout F Plugin is complete. Sleeping indefinitely to avoid container exit and automatic restarts from Kubernetes

Here is the same log line in a bit more context:

2025-03-17T23:25:47.776819804-04:00 stderr F time="2025-03-18T03:25:47Z" level=trace msg="Invoked command single-node with args [] and flags [level=trace logtostderr=true sleep=-1 v=6]"
2025-03-17T23:25:47.777429876-04:00 stderr F time="2025-03-18T03:25:47Z" level=info msg="Waiting for waitfile" waitfile=/tmp/sonobuoy/results/done
2025-03-17T23:25:47.777529041-04:00 stderr F time="2025-03-18T03:25:47Z" level=info msg="Starting to listen on port 8099 for progress updates and will relay them to https://[10.42.1.6]:8080/api/v1/progress/by-node/el94-src-cncf-conformance-host1/systemd-logs"
2025-03-17T23:25:48.777653263-04:00 stderr F time="2025-03-18T03:25:48Z" level=trace msg="Detected done file but sleeping for 5s then checking again for file. This allows other containers to intervene if desired."
2025-03-17T23:25:53.778970966-04:00 stderr F time="2025-03-18T03:25:53Z" level=info msg="Detected done file, transmitting result file" resultFile=/tmp/sonobuoy/results/systemd_logs
2025-03-17T23:25:53.810797001-04:00 stderr F time="2025-03-18T03:25:53Z" level=info msg="Results transmitted to aggregator.  Sleeping forever."
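
The "no errors" observation can be checked mechanically. Below is a rough Python sketch that assumes each log line has the layout seen above (timestamp, stream, a one-letter tag, then the payload) and that structured payloads carry level=... and msg="..." fields. The names parse_line and find_problems are hypothetical, and the set of levels treated as problems is an assumption.

```python
import re

# Outer layout inferred from the lines above: timestamp, stream, tag, payload.
LINE_RE = re.compile(
    r'^(?P<ts>\S+) (?P<stream>stdout|stderr) (?P<tag>[FP]) (?P<payload>.*)$'
)
# Structured payloads appear to carry level=<word> msg="<text>".
MSG_RE = re.compile(r'level=(?P<level>\w+) msg="(?P<msg>(?:[^"\\]|\\.)*)"')

# Assumption: these levels are the ones worth flagging.
PROBLEM_LEVELS = {"warning", "error", "fatal", "panic"}


def parse_line(line):
    """Return a dict for one log line, or None if the layout does not match."""
    outer = LINE_RE.match(line)
    if outer is None:
        return None
    payload = outer.group("payload")
    entry = {
        "ts": outer.group("ts"),
        "stream": outer.group("stream"),
        "level": None,   # stays None for unstructured payloads
        "msg": payload,
    }
    inner = MSG_RE.search(payload)
    if inner is not None:
        entry["level"] = inner.group("level")
        entry["msg"] = inner.group("msg")
    return entry


def find_problems(lines):
    """Collect parsed entries whose level is in PROBLEM_LEVELS."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e and e["level"] in PROBLEM_LEVELS]


# Two lines copied from this report.
SAMPLE_LINES = [
    '2025-03-18T01:25:45.087568043-04:00 stdout F Plugin is complete. '
    'Sleeping indefinitely to avoid container exit and automatic restarts '
    'from Kubernetes',
    '2025-03-17T23:25:53.810797001-04:00 stderr F '
    'time="2025-03-18T03:25:53Z" level=info '
    'msg="Results transmitted to aggregator.  Sleeping forever."',
]

if __name__ == "__main__":
    for entry in map(parse_line, SAMPLE_LINES):
        print(entry["ts"], entry["level"], entry["msg"])
    print("problem entries:", len(find_problems(SAMPLE_LINES)))
```

Note that the structured messages are written to stderr even at info and trace level, so filtering on the stream name alone would not separate errors from routine output.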

Is related to: USHIFT-5588 CNCF tests are timing out (sometimes with 'Test Suite starting' msg) (Closed)

Assignee: Jon Cope
Reporter: Jon Cope
Need Info From: None
Contributors: None
QA Contact: None
Votes: 0
Watchers: 1
Created: 2025/03/19 4:28 PM
Updated: 2025/07/02 1:11 PM
Resolved: 2025/05/12 2:11 PM
