OCPBUGS-9249

[bz-Image Registry] clusteroperator/image-registry should not change condition/Available

      [bz-Image Registry] clusteroperator/image-registry should not change condition/Available

      is failing frequently in CI, see [1] and:

      $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=image-registry+should+not+change+condition/Available' | grep 'failures match' | sort
      periodic-ci-openshift-multiarch-master-nightly-4.10-upgrade-from-nightly-4.9-ocp-remote-libvirt-ppc64le (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-multiarch-master-nightly-4.10-upgrade-from-nightly-4.9-ocp-remote-libvirt-s390x (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-techpreview-serial (all) - 6 runs, 67% failed, 25% of failures match = 17% impact
      periodic-ci-openshift-multiarch-master-nightly-4.11-upgrade-from-nightly-4.10-ocp-remote-libvirt-ppc64le (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
      periodic-ci-openshift-multiarch-master-nightly-4.8-upgrade-from-nightly-4.7-ocp-remote-libvirt-s390x (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-multiarch-master-nightly-4.9-upgrade-from-nightly-4.8-ocp-remote-libvirt-s390x (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-ovirt-upgrade (all) - 4 runs, 50% failed, 200% of failures match = 100% impact
      periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-vsphere-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-release-master-ci-4.11-e2e-aws-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-release-master-ci-4.11-e2e-azure-upgrade-single-node (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
      periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-ovn (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
      periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-techpreview-serial (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-upgrade (all) - 60 runs, 25% failed, 53% of failures match = 13% impact
      periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-gcp-ovn-rt-upgrade (all) - 4 runs, 100% failed, 25% of failures match = 25% impact
      periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-gcp-ovn-upgrade (all) - 21 runs, 100% failed, 43% of failures match = 43% impact
      periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-ovirt-upgrade (all) - 4 runs, 75% failed, 133% of failures match = 100% impact
      periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-ovirt-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-vsphere-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-ovirt-upgrade (all) - 4 runs, 75% failed, 100% of failures match = 75% impact
      periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-vsphere-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-upi-serial (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
      periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.9-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
      periodic-ci-openshift-release-master-nightly-4.11-e2e-gcp (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
      periodic-ci-openshift-release-master-nightly-4.11-e2e-gcp-rt (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
      periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-serial-ovn-dualstack (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
      periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-upgrade (all) - 3 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 3 runs, 100% failed, 33% of failures match = 33% impact
      periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 3 runs, 100% failed, 33% of failures match = 33% impact
      periodic-ci-openshift-release-master-nightly-4.9-e2e-vsphere-upi-serial (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
      periodic-ci-openshift-release-master-okd-4.10-e2e-vsphere (all) - 5 runs, 80% failed, 25% of failures match = 20% impact
      pull-ci-openshift-machine-config-operator-release-4.10-e2e-aws-upgrade-single-node (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
      pull-ci-openshift-machine-config-operator-release-4.10-e2e-vsphere-upgrade (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
      pull-ci-openshift-origin-master-e2e-aws-single-node-upgrade (all) - 9 runs, 78% failed, 100% of failures match = 78% impact
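Many of the jobs listed above have only one or two runs in the 24h window, so their impact percentages are noisy. The following is a minimal sketch, not part of the original report, of parsing the summary lines and ranking only the better-sampled jobs. The names `JobImpact`, `parse_impact_lines`, and `well_sampled`, and the `min_runs=10` threshold, are assumptions made for illustration; the line format is taken from the output above.

```python
import re
from typing import List, NamedTuple


class JobImpact(NamedTuple):
    job: str
    runs: int
    failed_pct: int
    match_pct: int
    impact_pct: int


# Shape of one summary line as it appears in the search output above.
LINE_RE = re.compile(
    r"^\s*(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match = (?P<impact>\d+)% impact\s*$"
)


def parse_impact_lines(text: str) -> List[JobImpact]:
    """Parse 'failures match' summary lines; lines in any other shape are skipped."""
    jobs = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            jobs.append(JobImpact(m["job"], int(m["runs"]), int(m["failed"]),
                                  int(m["match"]), int(m["impact"])))
    return jobs


def well_sampled(jobs: List[JobImpact], min_runs: int = 10) -> List[JobImpact]:
    """Keep jobs with at least min_runs runs (hypothetical cutoff), highest impact first."""
    kept = [j for j in jobs if j.runs >= min_runs]
    return sorted(kept, key=lambda j: j.impact_pct, reverse=True)


# Three lines copied from the output above.
SAMPLE = """\
periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-upgrade (all) - 60 runs, 25% failed, 53% of failures match = 13% impact
periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-gcp-ovn-upgrade (all) - 21 runs, 100% failed, 43% of failures match = 43% impact
periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-ovn (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
"""

if __name__ == "__main__":
    for j in well_sampled(parse_impact_lines(SAMPLE)):
        print(f"{j.impact_pct:3d}%  {j.runs:3d} runs  {j.job}")
```

With the sample lines, the 2-run job is dropped and the two jobs with 21 and 60 runs remain, ordered by impact.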

      For example, [2] has:

      : [bz-Image Registry] clusteroperator/image-registry should not change condition/Available
      Run #0: Failed 2h23m7s

      { 2 unexpected clusteroperator state transitions during e2e test run

      May 03 14:28:58.817 - 3490s E clusteroperator/image-registry condition/Available status/False reason/Available: The deployment does not have available replicas\nNodeCADaemonAvailable: The daemon set node-ca has available replicas\nImagePrunerAvailable: Pruner CronJob has been created
      2 tests failed during this blip (2022-05-03 14:28:58.817414102 +0000 UTC to 2022-05-03 14:28:58.817414102 +0000 UTC):
      [sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should provide basic identity [Suite:openshift/conformance/parallel] [Suite:k8s]
      [sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should adopt matching orphans and release non-matching pods [Suite:openshift/conformance/parallel] [Suite:k8s]}

      With:

      $ curl -s https://storage.googleapis.com/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-upgrade/1521470801340010496/build-log.txt | grep 'clusteroperator/image-registry condition/Available.*changed'
      May 03 14:28:58.817 E clusteroperator/image-registry condition/Available status/False reason/NoReplicasAvailable changed: Available: The deployment does not have available replicas\nNodeCADaemonAvailable: The daemon set node-ca has available replicas\nImagePrunerAvailable: Pruner CronJob has been created
      May 03 15:27:08.975 W clusteroperator/image-registry condition/Available status/True reason/MinimumAvailability changed: Available: The registry has minimum availability\nNodeCADaemonAvailable: The daemon set node-ca has available replicas\nImagePrunerAvailable: Pruner CronJob has been created
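As a cross-check, the gap between these two transitions should match the 3490s interval reported in the junit failure message. A minimal sketch of that arithmetic follows, assuming the year 2022 (taken from the full timestamps in the failure message, since the build log omits it) and an English locale for the month abbreviation; `parse_ts` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def parse_ts(stamp: str, year: int = 2022) -> datetime:
    """Parse a 'May 03 14:28:58.817' build-log timestamp; the year is supplied separately."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")


# Timestamps copied from the two 'changed' lines above.
went_unavailable = parse_ts("May 03 14:28:58.817")
became_available = parse_ts("May 03 15:27:08.975")

# Time the operator spent reporting Available=False.
outage: timedelta = became_available - went_unavailable

if __name__ == "__main__":
    print(f"Available=False for {outage} ({outage.total_seconds():.3f}s)")
```

The difference is 58 minutes and 10.158 seconds, or about 3490 seconds, so Available stayed False for close to an hour of this run.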

      The test case is flake-only, so this is not affecting CI success rates. However, having the operator claim Available=False is a poor customer experience. The UX impact is possibly not large enough to justify backports, but it is certainly large enough to be worth fixing in the development branch.

      [1]: https://sippy.ci.openshift.org/sippy-ng/tests/4.11/analysis?test=%5Bbz-Image%20Registry%5D%20clusteroperator%2Fimage-registry%20should%20not%20change%20condition%2FAvailable
      [2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-upgrade/1521470801340010496

              Unassigned
              W. Trevor King (trking)
              Jianping Shu