-
Bug
-
Resolution: Done
-
Critical
-
None
-
4.16
-
Quality / Stability / Reliability
-
False
-
-
2
-
Moderate
-
OCPEDGE Sprint 253
-
1
Description of problem
CI jobs fail because of the following error:
Updating install-config.yaml to a single m5d.2xlarge control plane node and 0 workers
/bin/bash: line 19: pip3: command not found
...
This particular failure comes from https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1052/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-ovn-single-node/1785472707073150976. Search.ci shows other similar failures. Most of the failures are in single-node CI jobs, but there is at least one other failing CI job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-sdn-upgrade-workload/1785038835940331520.
Version-Release number of selected component (if applicable)
I have seen this in CI jobs for main branches of multiple repositories and in the aforementioned 4.15-to-4.16 upgrade job. The earliest failure that I see is the periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-sdn-upgrade-workload from 41 hours ago (2024-04-29 20:10:02+0000).
How reproducible
Presently, search.ci shows the following stats for the past two days:
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade (all) - 35 runs, 77% failed, 30% of failures match = 23% impact
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node (all) - 33 runs, 55% failed, 44% of failures match = 24% impact
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial (all) - 34 runs, 65% failed, 36% of failures match = 24% impact
pull-ci-kubevirt-ssp-operator-main-e2e-single-node-functests (all) - 4 runs, 50% failed, 50% of failures match = 25% impact
pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-single-node (all) - 18 runs, 39% failed, 86% of failures match = 33% impact
pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-ovn-single-node (all) - 6 runs, 100% failed, 100% of failures match = 100% impact
rehearse-51544-pull-ci-openshift-cluster-authentication-operator-release-4.16-e2e-aws-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
rehearse-51544-pull-ci-openshift-cluster-authentication-operator-release-4.17-e2e-aws-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
rehearse-51544-pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-single-node (all) - 3 runs, 67% failed, 50% of failures match = 33% impact
pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-single-node (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-sdn-upgrade-workload (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
Steps to Reproduce
1. Post a PR to a repository with a single-node CI job.
2. Check search.ci: https://search.dptools.openshift.org/?search=pip3%3A+command+not+found&maxAge=48h&type=junit&groupBy=job
Actual results
CI fails.
Expected results
CI passes, or fails on some other test failure.
Additional info
The "Updating install-config.yaml" text appears to come from https://github.com/openshift/release/blob/5d51a2b17b4fa8b3c210974786d1b456bdec2c33/ci-operator/step-registry/single-node/conf/aws/single-node-conf-aws-commands.sh#L13, so I believe the fault lies in the SNO CI step.
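For illustration only, here is a minimal sketch of how a step script could guard against a missing pip3 and fail with a clearer message. This is not the content of the actual step script and not necessarily the fix that was applied. The function name resolve_pip and the fallback to "python3 -m pip" are assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical helper (not from the real step script): print a usable pip
# invocation on stdout, or report a clear error on stderr and return 1.
resolve_pip() {
    if command -v pip3 >/dev/null 2>&1; then
        # pip3 is on PATH, so use it directly.
        echo "pip3"
    elif command -v python3 >/dev/null 2>&1 && python3 -m pip --version >/dev/null 2>&1; then
        # No pip3 binary, but the pip module is importable by python3.
        echo "python3 -m pip"
    else
        echo "error: neither pip3 nor 'python3 -m pip' is available in this image" >&2
        return 1
    fi
}

# Example use inside a script (assumed, shown as a comment):
#   PIP_CMD="$(resolve_pip)" || exit 1
#   ${PIP_CMD} --version
```

With a guard like this, a job running in an image without pip would stop with an explicit message about the missing tool rather than a bare "command not found" partway through the step.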