Red Hat OpenShift Control Planes / CNTRLPLANE-1852

Create Periodic Upgrade Jobs for .0 to Latest Z-Stream

    • Type: Story
    • Resolution: Done
    • Priority: Major
    • HyperShift

      User Story

      As a managed service operator, I want periodic jobs that validate clusters can upgrade from CPO version 4.Y.0 to 4.Y.latest without triggering NodePool rollouts, so that I can plan customer upgrades safely from initial minor releases.

      Acceptance Criteria

      • Periodic jobs created for each supported OCP minor version (4.16+)
      • Jobs run the existing TestUpgradeControlPlane test with 4.Y.0 → 4.Y.latest parameters
      • Jobs run daily
      • Tests validate MachineDeployment generation remains unchanged (no NodePool rollout)
      • Tests validate control plane successfully upgrades
      • Tests validate cluster remains functional post-upgrade
      • Tests validate no crashing pods
      • Failed upgrades provide detailed diagnostics (logs, events, resource states)

      Technical Details

      Existing Test Being Used

      TestUpgradeControlPlane (test/e2e/control_plane_upgrade_test.go:16) already:

      • Creates cluster with --e2e.previous-release-image
      • Upgrades to --e2e.latest-release-image
      • Validates MachineDeployment.Generation == 1 (no rollout)
      • Validates EnsureNoCrashingPods
      • Validates cluster functionality

      We just need to configure periodic jobs to run this test with the right version parameters.
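
The generation check is the core of the "no NodePool rollout" guarantee, so a minimal sketch of the idea follows. It is not the code in test/e2e/util/util.go: the machineDeployment type and the checkNoRollout function are hypothetical stand-ins used only to illustrate why a generation other than 1 signals that the spec was modified during the upgrade.

```go
package main

import (
	"errors"
	"fmt"
)

// machineDeployment is a hypothetical stand-in for the Cluster API
// MachineDeployment object; only the fields needed here are modelled.
type machineDeployment struct {
	Name       string
	Generation int64
}

// checkNoRollout returns an error naming every MachineDeployment whose
// generation differs from want. A bumped generation means the spec was
// changed, which is what a NodePool rollout would do.
func checkNoRollout(mds []machineDeployment, want int64) error {
	var errs []error
	for _, md := range mds {
		if md.Generation != want {
			errs = append(errs, fmt.Errorf("MachineDeployment %s: generation %d, want %d",
				md.Name, md.Generation, want))
		}
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	// Illustrative data only; names are made up.
	mds := []machineDeployment{
		{Name: "example-a", Generation: 1},
		{Name: "example-b", Generation: 2},
	}
	if err := checkNoRollout(mds, 1); err != nil {
		fmt.Println("rollout detected:", err)
	}
}
```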

      CI Operator Config

      • Path: ci-operator/config/openshift/hypershift/openshift-hypershift-release-4.Y__periodics-hcm-upgrade.yaml
      • Workflow: hypershift-aws-e2e-upgrade or create new workflow
      • Environment variables:
        • PREVIOUS_RELEASE_IMAGE: 4.Y.0 release pullspec
        • LATEST_RELEASE_IMAGE: 4.Y.latest release pullspec
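
One way the two variables could be turned into the test's flags is sketched below. How the workflow actually wires them through is not stated in this ticket, so the upgradeTestFlags helper is an assumption; only the variable names and the two --e2e.* flag names come from the text above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// upgradeTestFlags builds the two e2e flags from the job environment.
// getenv is injected so the function can be exercised without touching
// the real process environment. This helper is hypothetical.
func upgradeTestFlags(getenv func(string) string) ([]string, error) {
	previous := getenv("PREVIOUS_RELEASE_IMAGE")
	latest := getenv("LATEST_RELEASE_IMAGE")
	if previous == "" || latest == "" {
		return nil, errors.New("PREVIOUS_RELEASE_IMAGE and LATEST_RELEASE_IMAGE must both be set")
	}
	if previous == latest {
		// Identical pullspecs would make the upgrade a no-op.
		return nil, fmt.Errorf("both variables point at %q; nothing to upgrade", previous)
	}
	return []string{
		"--e2e.previous-release-image=" + previous,
		"--e2e.latest-release-image=" + latest,
	}, nil
}

func main() {
	flags, err := upgradeTestFlags(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(flags)
}
```

Failing fast on an empty or identical pair keeps a misconfigured job from reporting a green run that never exercised an upgrade.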

      Periodic Job

      • Name: periodic-ci-openshift-hypershift-release-4.Y-periodics-hcm-upgrade-dot-zero-to-latest-aws-ovn
      • Schedule: Daily (cron: 0 2 * * *)
      • Timeout: 2 hours
      • Target: Run TestUpgradeControlPlane
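
Because one job is needed per supported minor version, the name and config path can be derived from Y. The sketch below expands the two templates given in this ticket; the lastMinor bound in main is a made-up value for illustration, not a statement about which versions are supported.

```go
package main

import "fmt"

// firstSupportedMinor reflects the "4.16+" acceptance criterion.
const firstSupportedMinor = 16

// jobName expands the periodic job name template for OCP 4.<minor>.
func jobName(minor int) string {
	return fmt.Sprintf(
		"periodic-ci-openshift-hypershift-release-4.%d-periodics-hcm-upgrade-dot-zero-to-latest-aws-ovn",
		minor)
}

// configPath expands the CI operator config path template for OCP 4.<minor>.
func configPath(minor int) string {
	return fmt.Sprintf(
		"ci-operator/config/openshift/hypershift/openshift-hypershift-release-4.%d__periodics-hcm-upgrade.yaml",
		minor)
}

func main() {
	// lastMinor is a hypothetical upper bound chosen for this example.
	const lastMinor = 18
	for minor := firstSupportedMinor; minor <= lastMinor; minor++ {
		fmt.Println(jobName(minor))
		fmt.Println("  " + configPath(minor))
	}
}
```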

      Test Execution

      The job will:
      1. Install the HyperShift Operator (HO), using the Konflux image from the release-4.Y branch
      2. Create cluster with CPO 4.Y.0 (via PREVIOUS_RELEASE_IMAGE)
      3. Wait for cluster ready
      4. Trigger upgrade to CPO 4.Y.latest (via LATEST_RELEASE_IMAGE)
      5. Validate upgrade completes without NodePool rollout
      6. Cleanup

      All validation logic is already in TestUpgradeControlPlane:

      • Line 72: e2eutil.EnsureNodeCountMatchesNodePoolReplicas(...)
      • Line 73: e2eutil.EnsureNoCrashingPods(t, ctx, mgtClient, hostedCluster)
      • Line 74: e2eutil.EnsureMachineDeploymentGeneration(t, ctx, mgtClient, hostedCluster, 1)

      Document Version Update Process (should really be automated)

      When a new Z-stream is released (e.g., 4.20.16):
      1. Update LATEST_RELEASE_IMAGE in job configuration
      2. PREVIOUS_RELEASE_IMAGE stays at .0
      3. Submit PR to openshift/release
      4. Run pj-rehearse to validate
      5. Monitor for any new upgrade issues
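
If this process is automated, the first step reduces to picking the highest patch release for a given minor. A minimal sketch, assuming the list of released versions is already available as plain "4.Y.Z" strings (where that list comes from is out of scope here, and both function names are hypothetical):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "X.Y.Z" into its three numeric parts.
func parseVersion(v string) (major, minor, patch int, err error) {
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return 0, 0, 0, fmt.Errorf("version %q is not in X.Y.Z form", v)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		if nums[i], err = strconv.Atoi(p); err != nil {
			return 0, 0, 0, fmt.Errorf("version %q: %w", v, err)
		}
	}
	return nums[0], nums[1], nums[2], nil
}

// latestZStream returns the highest patch release of 4.<minor> found in
// versions. Patches are compared numerically, so 4.20.16 ranks above 4.20.9.
func latestZStream(versions []string, minor int) (string, error) {
	best, bestPatch := "", -1
	for _, v := range versions {
		major, gotMinor, patch, err := parseVersion(v)
		if err != nil {
			return "", err
		}
		if major == 4 && gotMinor == minor && patch > bestPatch {
			best, bestPatch = v, patch
		}
	}
	if best == "" {
		return "", fmt.Errorf("no 4.%d release found", minor)
	}
	return best, nil
}

func main() {
	// Illustrative input only.
	versions := []string{"4.20.0", "4.20.9", "4.20.16", "4.19.30"}
	latest, err := latestZStream(versions, 20)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(latest) // prints 4.20.16
}
```

The numeric comparison matters: a plain string sort would place 4.20.9 after 4.20.16 and pin the job to a stale Z-stream.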

      Diagnostic Collection on Failure

      TestUpgradeControlPlane already collects diagnostics via the test framework. The periodic job provides these additional artifacts:

      • Job logs
      • Cluster resources (captured by framework)
      • Must-gather equivalent

      Success Metrics

      • <5% false failure rate
      • Upgrade completes within 30 minutes
      • All validation checks pass (built into test)
      • Clear diagnostics available on failure
      • Results visible in CI dashboards

      Reference

      • Existing test: test/e2e/control_plane_upgrade_test.go:16
      • Validation helper: test/e2e/util/util.go:1007 (EnsureMachineDeploymentGeneration)

              sjenning Seth Jennings
              asegurap1@redhat.com Antoni Segura Puimedon