Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-46593

kube-storage-version-migrator goes Available=False with reason=KubeStorageVersionMigrator_Deploying during updates

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Major Major
    • None
    • 4.15, 4.16, 4.17, 4.18
    • Moderate
    • No
    • Rejected
    • False
    • Hide

      None

      Show
      None
    • Release Note Not Required
    • In Progress

      This is a clone of issue OCPBUGS-45063. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-20062. The following is the description of the original issue:

      Description of problem:

      Reviving rhbz#1948087, the kube-storage-version-migrator ClusterOperator occasionally goes Available=False with reason=KubeStorageVersionMigrator_Deploying. For example, this run includes:

      : [bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available expand_less	1h34m30s
      {  1 unexpected clusteroperator state transitions during e2e test run.  These did not match any known exceptions, so they cause this test-case to fail:
      
      Oct 03 22:09:07.933 - 33s   E clusteroperator/kube-storage-version-migrator condition/Available reason/KubeStorageVersionMigrator_Deploying status/False KubeStorageVersionMigratorAvailable: Waiting for Deployment
      

      But that is a node rebooting into newer RHCOS, and do not warrant immediate admin intervention. Teaching the KSVM operator to stay Available=True for this kind of brief hiccup, while still going Available=False for issues where least part of the component is non-functional, and that the condition requires immediate administrator intervention would make it easier for admins and SREs operating clusters to identify when intervention was required.

      Version-Release number of selected component (if applicable):

      4.8 and 4.15. Possibly all supported versions of the KSVM operator have this exposure.

      How reproducible:

      Looks like many (all?) 4.15 update jobs have near 100% reproducibility for some kind of issue with KSVM going Available=False, see Actual results below. These are likely for reasons that do not require admin intervention, although figuring that out is tricky today, feel free to push back if you feel that some of these do warrant admin immediate admin intervention.

      Steps to Reproduce:

      w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=clusteroperator/kube-storage-version-migrator+should+not+change+condition/Available' | grep '^periodic-.*4[.]15.*failures match' | sort
      

      Actual results:

      periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-kubevirt-conformance (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
      periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 19 runs, 42% failed, 163% of failures match = 68% impact
      periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-upgrade-aws-ovn-arm64 (all) - 18 runs, 61% failed, 118% of failures match = 72% impact
      periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-e2e-aws-sdn-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-ppc64le (all) - 4 runs, 100% failed, 50% of failures match = 50% impact
      periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 19 runs, 47% failed, 189% of failures match = 89% impact
      periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-sdn-arm64 (all) - 9 runs, 78% failed, 86% of failures match = 67% impact
      periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-upgrade (all) - 11 runs, 64% failed, 114% of failures match = 73% impact
      periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade (all) - 65 runs, 45% failed, 169% of failures match = 75% impact
      periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-upgrade (all) - 6 runs, 50% failed, 133% of failures match = 67% impact
      periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade (all) - 75 runs, 24% failed, 361% of failures match = 87% impact
      periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 75 runs, 29% failed, 277% of failures match = 81% impact
      periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 8 runs, 50% failed, 175% of failures match = 88% impact
      periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 74 runs, 36% failed, 185% of failures match = 68% impact
      periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade (all) - 69 runs, 49% failed, 156% of failures match = 77% impact
      periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-upgrade (all) - 7 runs, 57% failed, 175% of failures match = 100% impact
      periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-sdn-upgrade (all) - 6 runs, 33% failed, 250% of failures match = 83% impact
      periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-single-node-serial (all) - 6 runs, 100% failed, 17% of failures match = 17% impact
      periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade (all) - 60 runs, 38% failed, 187% of failures match = 72% impact
      periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-sdn-upgrade (all) - 6 runs, 33% failed, 200% of failures match = 67% impact
      periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-sdn-bm-upgrade (all) - 7 runs, 29% failed, 300% of failures match = 86% impact
      periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 7 runs, 71% failed, 80% of failures match = 57% impact
      periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 6 runs, 50% failed, 200% of failures match = 100% impact
      periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-metal-ipi-sdn-bm-upgrade (all) - 6 runs, 100% failed, 83% of failures match = 83% impact
      periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 6 runs, 100% failed, 83% of failures match = 83% impact
      periodic-ci-openshift-release-master-okd-4.15-e2e-aws-ovn-upgrade (all) - 13 runs, 54% failed, 71% of failures match = 38% impact
      periodic-ci-openshift-release-master-okd-scos-4.15-e2e-aws-ovn-upgrade (all) - 16 runs, 63% failed, 70% of failures match = 44% impact
      

      Expected results:

      KSVM goes Available=False if and only if immediate admin intervention is appropriate.

              lusanche@redhat.com Luis Sanchez
              openshift-crt-jira-prow OpenShift Prow Bot
              Rahul Gangwar Rahul Gangwar
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated: