OpenShift Windows Containers / WINC-1604

openshift/release: Add CSV Version Verification to cucushift-winc-upgrade step

      Problem Statement

      WMCO upgrade testing on Jenkins (https://jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/winc/job/winc-upgrade/) has proper CSV version verification, but the official OpenShift Prow CI does not.

      The cucushift-winc-upgrade step in the openshift/release repository (used by Prow periodic jobs) only checks resource health, not whether the CSV version changed:

      # Current verification (ci-operator/step-registry/cucushift/winc/upgrade/cucushift-winc-upgrade-commands.sh)
      oc wait csv --all --for=jsonpath='{.status.phase}'=Succeeded
      oc wait deployment windows-machine-config-operator --for condition=Available=True
      oc wait nodes -l kubernetes.io/os=windows --for condition=Ready=True
      

      This creates a gap where:
      - Jenkins testing (non-official) catches upgrade failures ✅
      - Prow CI testing (official) misses upgrade failures ❌

      Risk

      Prow periodic jobs could pass even if WMCO operator didn't upgrade:
      - Old CSV (10.20) still running in Succeeded state
      - Operator never upgraded to new version (10.21)
      - Silent upgrade failures (InstallPlan creation failed, CSV upgrade failed, etc.)

      Context

      • The step was created by jfrancoa (Dec 2023), who left the company 2 years ago
      • The OTA team confirmed it does not maintain this step; it is owned by the cucushift/winc team
      • The OTA team uses CSV version verification for its own operator upgrades
      • Jenkins job already has proper verification

      Evidence of Best Practice

      1. OCP-43832 test has proper verification:
      2. The optional-operators-ci-upgrade step in openshift/release verifies the CSV version:
        • ci-operator/step-registry/optional-operators/ci/upgrade/optional-operators-ci-upgrade-commands.sh
        • Line 22: if [[ "$CSV" == "${OO_LATEST_CSV}" ]]; then
      3. Jenkins WMCO upgrade job has proper verification (non-official but proven pattern)

      Scope

      This enhancement lives in the openshift/release repository and is independent of:
      - ✅ WINC-1484 (Skyler's 4.19→4.20 variant config using operator-sdk)
      - ✅ PR #73920 (BYOH provisioning support)

      This benefits all Prow QE upgrade periodic jobs across all platforms/versions that use the openshift-upgrade-qe-test-winc chain.

      Goal

      Bring Prow CI WMCO upgrade verification up to the same standard as Jenkins.

      Acceptance Criteria

      • Step captures WMCO CSV name/version before cluster upgrade
      • Step verifies CSV name changed after cluster upgrade (old ≠ new)
      • Step fails with clear error if CSV did not upgrade
      • Step verifies all Windows nodes have version annotations matching the new CSV version
      • Step fails with clear error listing any nodes with incorrect versions
      • On failure, step dumps subscription, InstallPlan, and CSV resources for troubleshooting
      • Existing periodic upgrade jobs work without modification
      • Works across all platforms (AWS, Azure, GCP, vSphere, Nutanix)
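
      The two node-annotation criteria could be covered by a small bash helper that reads one "node<TAB>annotation" line per Windows node and reports every mismatch at once. This is a hedged sketch, not code from the step: the annotation key windowsmachineconfig.openshift.io/version and the assumption that its value is the CSV's spec.version (optionally followed by "-<suffix>") should be confirmed on a live cluster, and list_windows_node_versions and check_node_versions are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTIONS (not from the ticket): WMCO records its version in the
# node annotation "windowsmachineconfig.openshift.io/version", and the value is
# either the CSV's .spec.version or that version followed by "-<suffix>".

# Print "<node>\t<annotation>" for every Windows node (needs a live cluster).
list_windows_node_versions() {
  oc get nodes -l kubernetes.io/os=windows \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.windowsmachineconfig\.openshift\.io/version}{"\n"}{end}'
}

# Read "<node>\t<annotation>" lines on stdin and fail, listing every node whose
# annotation does not match the expected CSV version. Pure bash, no cluster needed.
check_node_versions() {
  local expected="$1" node version count=0
  local -a bad=()
  while IFS=$'\t' read -r node version || [[ -n "$node" ]]; do
    [[ -z "$node" ]] && continue
    count=$((count + 1))
    # Accept an exact match or the expected version followed by "-<suffix>".
    if [[ "$version" != "$expected" && "$version" != "${expected}-"* ]]; then
      bad+=("${node} (annotation: '${version:-<missing>}', expected: ${expected})")
    fi
  done
  if (( count == 0 )); then
    echo "ERROR: no Windows nodes found" >&2
    return 1
  fi
  if (( ${#bad[@]} > 0 )); then
    echo "ERROR: ${#bad[@]} Windows node(s) report the wrong WMCO version:" >&2
    printf '  - %s\n' "${bad[@]}" >&2
    return 1
  fi
  echo "All ${count} Windows node(s) report WMCO version ${expected}"
}
```

      In the step this could run as list_windows_node_versions | check_node_versions "$NEW_VERSION", where NEW_VERSION is a hypothetical variable holding the version read from the new CSV. Keeping the comparison separate from the oc call makes it testable without a cluster.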

      Implementation Approach

      Repository: openshift/release

      Required Workflow:

      1. Pre-upgrade phase (before cluster upgrade starts):
        • Query WMCO subscription to get current CSV name
        • Query current CSV to get version
        • Save both to SHARED_DIR for later verification
        • Log pre-upgrade state for debugging
      2. Post-upgrade phase (after cluster upgrade completes):
        • Wait for CSV to reach Succeeded state (preserve existing behavior)
        • Wait for deployment to be Available (preserve existing behavior)
        • Query subscription to get new CSV name
        • Compare new CSV name with saved old CSV name
        • If CSV names are identical, fail with detailed error
        • Extract version from new CSV
        • Query all Windows nodes for version annotations
        • Verify all node annotations match new CSV version
        • If any node has wrong version, fail with node-specific error
        • Wait for nodes to be Ready (preserve existing behavior)
      3. Error reporting:
        • On CSV verification failure: dump subscription, InstallPlan, and CSV resources
        • On node verification failure: dump node details with version mismatches
        • Include clear error messages and troubleshooting hints
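
      The CSV half of this workflow (the pre-upgrade capture, the old/new name comparison, and the dump on failure) might look roughly like the following bash sketch. It reads .status.installedCSV from the WMCO Subscription and .spec.version from the CSV. The namespace openshift-windows-machine-config-operator, the Subscription name windows-machine-config-operator, the SHARED_DIR file names, and all function names are assumptions for illustration, not taken from the existing step.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTIONS (not from the ticket): the namespace and Subscription
# name below, and the SHARED_DIR file names. Override them to match the step.
WMCO_NS="${WMCO_NS:-openshift-windows-machine-config-operator}"
WMCO_SUB="${WMCO_SUB:-windows-machine-config-operator}"

# Installed CSV name as recorded by OLM on the Subscription.
installed_csv() {
  oc get subscription "$WMCO_SUB" -n "$WMCO_NS" -o jsonpath='{.status.installedCSV}'
}

# Version field of a given CSV.
csv_version() {
  oc get csv "$1" -n "$WMCO_NS" -o jsonpath='{.spec.version}'
}

# On failure, dump the OLM objects that explain why the CSV did not move.
dump_olm_state() {
  echo "----- Subscription, InstallPlan and CSV objects in ${WMCO_NS} -----" >&2
  oc get subscription,installplan,csv -n "$WMCO_NS" -o yaml >&2 || true
}

# Phase 1: run before the cluster upgrade starts.
save_pre_upgrade_csv() {
  local csv version
  csv=$(installed_csv) || return 1
  if [[ -z "$csv" ]]; then
    echo "ERROR: Subscription ${WMCO_SUB} has no installedCSV" >&2
    dump_olm_state
    return 1
  fi
  version=$(csv_version "$csv") || return 1
  echo "$csv" > "${SHARED_DIR}/wmco-pre-upgrade-csv"
  echo "$version" > "${SHARED_DIR}/wmco-pre-upgrade-version"
  echo "Pre-upgrade WMCO CSV: ${csv} (version ${version})"
}

# Phase 2: run after the cluster upgrade and the existing oc wait checks.
# On success, prints the new CSV version on stdout (logs go to stderr).
verify_csv_upgraded() {
  local old_csv new_csv new_version
  if [[ ! -s "${SHARED_DIR}/wmco-pre-upgrade-csv" ]]; then
    echo "ERROR: no pre-upgrade CSV was recorded in SHARED_DIR" >&2
    return 1
  fi
  old_csv=$(<"${SHARED_DIR}/wmco-pre-upgrade-csv")
  new_csv=$(installed_csv) || return 1
  if [[ -z "$new_csv" || "$new_csv" == "$old_csv" ]]; then
    echo "ERROR: WMCO CSV did not upgrade (before: ${old_csv}, after: ${new_csv:-<none>})" >&2
    echo "Hint: check the InstallPlan status and the Subscription's channel and catalog source" >&2
    dump_olm_state
    return 1
  fi
  new_version=$(csv_version "$new_csv") || return 1
  echo "WMCO CSV upgraded: ${old_csv} -> ${new_csv} (version ${new_version})" >&2
  echo "$new_version"
}
```

      save_pre_upgrade_csv would run before the upgrade starts; after the existing oc wait checks, a call such as NEW_VERSION=$(verify_csv_upgraded) fails when the CSV name is unchanged and otherwise yields the version to compare against the node annotations. The sketch takes a single snapshot; if OLM may still be rolling out the new CSV at that point, polling installedCSV with a timeout before comparing would be more robust.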

      Files Expected to Change:
      - ci-operator/step-registry/cucushift/winc/upgrade/cucushift-winc-upgrade-commands.sh (required)
      - Possibly the openshift-upgrade-qe-test-winc chain, if the pre-upgrade capture is implemented as a separate step
      - Generated metadata files (updated via make update)

      OWNERS

      Current owners of ci-operator/step-registry/cucushift/winc/upgrade/:
      - Approvers: jianlinliu, gpei, yunjiang29
      - Reviewers: jfrancoa (no longer with the company), rrasouli

      Assignee: Unassigned
      Reporter: rrasouli (Aharon Rasouli)