- Epic
- Resolution: Done
- Major
- None
- OKD Green CI
- False
- False
- Not Selected
- To Do
- QE Needed, Docs Needed, TE Needed, Customer Facing, PX Needed
- 50% To Do, 0% In Progress, 50% Done
Prompted in part by this Slack conversation (https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1677063967764049), I've observed that OKD CI Prow jobs are green less often than their OCP counterparts. I suspect two reasons for this:
- The components aren't specifically targeting OKD in their testing. This is fine because not all components have an OKD-specific code path.
- Many of the tests are optional and do not gate merging of PRs. In other words, the jobs fail, but many teams do not need a passing OKD Prow job to merge code.
Indeed, looking at the CI configs in openshift/release, I noticed that only 17 OpenShift components have at least one OKD variant file within their CI configs:
$ for file in $(find ./ci-operator/config/openshift -type f -name "*[Oo][Kk][Dd]*"); do dirname "$file"; done | sort | uniq
./ci-operator/config/openshift/cluster-samples-operator
./ci-operator/config/openshift/cluster-update-keys
./ci-operator/config/openshift/community.okd
./ci-operator/config/openshift/installer
./ci-operator/config/openshift/ironic-agent-image
./ci-operator/config/openshift/ironic-image
./ci-operator/config/openshift/ironic-ipa-downloader
./ci-operator/config/openshift/ironic-rhcos-downloader
./ci-operator/config/openshift/ironic-static-ip-manager
./ci-operator/config/openshift/machine-config-operator
./ci-operator/config/openshift/machine-os-images
./ci-operator/config/openshift/okd-machine-os
./ci-operator/config/openshift/origin
./ci-operator/config/openshift/ovn-kubernetes
./ci-operator/config/openshift/release
./ci-operator/config/openshift/windows-machine-config-bootstrapper
./ci-operator/config/openshift/windows-machine-config-operator
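To track this number over time, the enumeration above could be wrapped in a small helper that prints only the count. This is a minimal sketch: the function name `count_okd_components` is hypothetical, and it assumes the same layout as the command above (one directory per component, with file names containing "okd" in any letter case).

```shell
# Hypothetical helper: count the directories under $1 that contain at
# least one file whose name includes "okd" (case-insensitive).
# Uses find -exec instead of a for loop, so paths with spaces are safe.
count_okd_components() {
  find "$1" -type f -iname '*okd*' -exec dirname {} \; \
    | sort -u \
    | wc -l \
    | tr -d ' '   # some wc implementations pad the count with spaces
}
```

Run from a checkout of openshift/release as `count_okd_components ./ci-operator/config/openshift`. Like the original command, it matches on file name only, so a repository whose own name contains "okd" is counted even if it has no separate OKD variant.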
Of those components, very few (if any) require passing OKD tests to merge PRs; the OKD jobs are usually marked optional. The jobs still appear to run regularly, even though their signal is largely ignored. Given that, it might make sense to run a TRT-like effort to understand which OKD tests fail regularly, identify which component(s) are in the critical path, and add CI configs in the appropriate places. Overall, this would be a win for OKD: better signal and confidence around quality could help increase adoption and grow the community around it.
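As a possible first step for such an effort, the sketch below reports how many tests in each OKD config file are marked optional. It rests on assumptions: the function name `okd_optional_summary` is hypothetical, the configs are assumed to be `.yaml` files, and a test is assumed to be marked with an `optional: true` line on its own. It is a plain text match, not a YAML parse, so treat the numbers as a rough signal.

```shell
# Hypothetical helper: for every *.yaml file under $1 whose name contains
# "okd" (case-insensitive), print "<count> <path>", where <count> is the
# number of lines that consist solely of "optional: true".
# Assumption: lines with trailing comments or other spellings are not counted.
okd_optional_summary() {
  find "$1" -type f -iname '*okd*' -name '*.yaml' | sort | while IFS= read -r f; do
    # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
    # keeps that from aborting callers that run under "set -e".
    n=$(grep -cE '^[[:space:]]*optional:[[:space:]]*true[[:space:]]*$' "$f" || true)
    printf '%s %s\n' "$n" "$f"
  done
}
```

Running `okd_optional_summary ./ci-operator/config/openshift | sort -rn` from a checkout of openshift/release would list the files with the most optional tests first. It does not show which jobs actually fail; that would still need the job history.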