Story
Resolution: Unresolved
Normal
Quality / Stability / Reliability
User Story
As a HyperShift release engineer, I want the Konflux release pipeline to only release HyperShift Operator images that have passed all periodic compatibility tests, so that we further reduce the chance of Managed OpenShift breakages resulting from HyperShift Operator upgrades.
Acceptance Criteria
- Konflux releaseplan configured with release.appstudio.openshift.io/auto-release: 'false'
- Periodic jobs from CNTRLPLANE-1848 and CNTRLPLANE-1854 use latest Konflux snapshot (not release)
- Jobs retrieve snapshot using: {{oc get snapshot --sort-by=.metadata.creationTimestamp -l pac.test.appstudio.openshift.io/original-prname=hypershift-operator-main-on-push}} (will need some read-only credentials to the Konflux cluster)
- When all periodic jobs pass, automation creates manual Release object for tested snapshot
- Failed periodic jobs mean no Konflux release creation
- Release process documented in team contrib/konflux documentation
- Notifications sent when releases are created
Technical Details
Konflux ReleasePlan Configuration
Update the HyperShift Operator releaseplan to disable auto-release:
    apiVersion: appstudio.redhat.com/v1alpha1
    kind: ReleasePlan
    metadata:
      name: hypershift-operator
      namespace: crt-redhat-acm-tenant
      labels:
        release.appstudio.openshift.io/auto-release: 'false'
    spec:
      target: rhtap-releng-tenant
      application: hypershift-operator
      # ... rest of releaseplan config
This prevents automatic releases when builds complete.
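One way to confirm the label took effect is to read it back from the cluster. The following is a minimal sketch, assuming `oc` is logged in to the Konflux cluster and that the ReleasePlan name and namespace match the example in this ticket; `auto_release_disabled` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: succeeds only if the ReleasePlan carries auto-release=false.
# Defaults are the name/namespace used in this ticket's example (assumptions).
auto_release_disabled() {
  plan="${1:-hypershift-operator}"
  ns="${2:-crt-redhat-acm-tenant}"
  # Dots inside the label key must be escaped in a jsonpath expression.
  value=$(oc get releaseplan "$plan" -n "$ns" \
    -o jsonpath='{.metadata.labels.release\.appstudio\.openshift\.io/auto-release}')
  [ "$value" = "false" ]
}
```

A verification step could call `auto_release_disabled || echo "auto-release is still enabled"` after the ReleasePlan change is merged.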
Periodic Job Snapshot Selection
Update all periodic jobs to use latest snapshot instead of latest release:
    # Get latest snapshot for hypershift-operator-main-on-push
    SNAPSHOT=$(oc get snapshot \
      --sort-by=.metadata.creationTimestamp \
      -l pac.test.appstudio.openshift.io/original-prname=hypershift-operator-main-on-push \
      -o jsonpath='{.items[-1].metadata.name}')

    # Extract HO image from snapshot
    HO_IMAGE=$(oc get snapshot "$SNAPSHOT" \
      -o jsonpath='{.spec.components[?(@.name=="hypershift-operator-main")].containerImage}')

    # Use HO_IMAGE in test execution
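If either lookup above comes back empty (for example, because the read-only credentials are missing or no snapshot matches the label), the job would otherwise continue with an empty image reference. A small guard can make that failure explicit. This is only a sketch; `require_value` is a hypothetical helper name.

```shell
# Hypothetical guard: fail with a clear message when a lookup came back empty,
# otherwise echo the value back unchanged.
require_value() {
  label="$1"
  value="$2"
  if [ -z "$value" ]; then
    echo "error: $label is empty; check Konflux credentials and the label selector" >&2
    return 1
  fi
  printf '%s\n' "$value"
}

# Usage (not executed here):
#   SNAPSHOT=$(require_value "snapshot name" "$SNAPSHOT") || exit 1
#   HO_IMAGE=$(require_value "HO image" "$HO_IMAGE") || exit 1
```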
Release Creation Workflow
Automated Release Creation (when all tests pass):
- Monitor periodic job results for latest snapshot
- When all required jobs pass (baseline, .0→latest, latest-1→latest for AWS and Azure):
- Retrieve tested snapshot name
- Create Release object referencing snapshot
- Konflux processes Release and publishes images
- Notify team of successful release
The notification step can use the final pipeline (https://konflux.pages.redhat.com/docs/users/releasing/tenant-release-pipelines.html#final-pipeline) with something like https://github.com/konflux-ci/community-catalog/tree/development/pipelines/update-jira-issues-pipeline
Example Release Object:
    apiVersion: appstudio.redhat.com/v1alpha1
    kind: Release
    metadata:
      generateName: hypershift-operator-
      namespace: crt-redhat-acm-tenant
    spec:
      snapshot: <snapshot-name-that-passed-tests>
      releasePlan: hypershift-operator-progressive
      gracePeriodDays: 7
Reference: https://konflux.pages.redhat.com/docs/users/releasing/create-release.html#creating-a-release-object
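The automation needs to fill the tested snapshot name into this manifest. A possible sketch is below; `render_release` is a hypothetical helper, and the namespace, ReleasePlan name, and `gracePeriodDays` value are copied from the example Release object in this ticket rather than verified against the cluster.

```shell
# Hypothetical helper: print a Release manifest for a snapshot that passed the gate.
# Field values mirror the example in this ticket (assumptions, not verified).
render_release() {
  snapshot="$1"
  if [ -z "$snapshot" ]; then
    echo "usage: render_release <snapshot-name>" >&2
    return 1
  fi
  cat <<EOF
apiVersion: appstudio.redhat.com/v1alpha1
kind: Release
metadata:
  generateName: hypershift-operator-
  namespace: crt-redhat-acm-tenant
spec:
  snapshot: ${snapshot}
  releasePlan: hypershift-operator-progressive
  gracePeriodDays: 7
EOF
}

# Usage (not executed here):
#   render_release "$SNAPSHOT" | oc create -f -
```

Because the manifest uses `generateName`, it has to be submitted with `oc create`; `oc apply` requires a fixed `metadata.name`.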
Required Periodic Jobs for Release Gate
AWS Jobs (from CNTRLPLANE-1848):
- Baseline: periodic-ci-openshift-hypershift-release-4.Y-periodics-hcm-e2e-aws-ovn
- Upgrade .0→latest: periodic-ci-openshift-hypershift-release-4.Y-periodics-hcm-upgrade-dot-zero-to-latest-aws-ovn
- Upgrade latest-1→latest: periodic-ci-openshift-hypershift-release-4.Y-periodics-hcm-upgrade-latest-1-to-latest-aws-ovn
Azure Jobs (from CNTRLPLANE-1854):
- Baseline: periodic-ci-openshift-hypershift-release-4.Y-periodics-hcm-e2e-azure
- Upgrade .0→latest: periodic-ci-openshift-hypershift-release-4.Y-periodics-hcm-upgrade-dot-zero-to-latest-azure
- Upgrade latest-1→latest: periodic-ci-openshift-hypershift-release-4.Y-periodics-hcm-upgrade-latest-1-to-latest-azure
All jobs for all supported OCP releases must pass before a release is created; only jobs for pre-release (non-GA) OCP versions are allowed to fail.
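The gate itself can be expressed as "every required job for a version reported success". The sketch below builds the six job names from the patterns listed above and checks them against a list of results. It assumes results arrive as `<job-name> <state>` lines on stdin and that a passing job is reported as `success`; both the input format and the helper names `required_jobs` and `gate_passes` are hypothetical.

```shell
# Hypothetical helper: list the six gate jobs for one OCP minor version (e.g. 4.17).
required_jobs() {
  version="$1"
  prefix="periodic-ci-openshift-hypershift-release-${version}-periodics-hcm"
  for platform in aws-ovn azure; do
    for kind in e2e upgrade-dot-zero-to-latest upgrade-latest-1-to-latest; do
      echo "${prefix}-${kind}-${platform}"
    done
  done
}

# Hypothetical gate: reads "<job-name> <state>" lines on stdin and succeeds only
# if every required job for the given version has "success" as its latest state.
gate_passes() {
  version="$1"
  results=$(cat)
  for job in $(required_jobs "$version"); do
    # If a job appears more than once, the last line wins.
    state=$(printf '%s\n' "$results" | awk -v j="$job" '$1 == j { print $2 }' | tail -n 1)
    if [ "$state" != "success" ]; then
      echo "blocking: $job is '${state:-missing}'" >&2
      return 1
    fi
  done
}
```

A missing job counts as a failure, so a job that never ran cannot let a release through. Pre-release OCP versions can be exempted by not calling `gate_passes` for them.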
Snapshot Tracking in Jobs
Each periodic job needs to:
- Record which snapshot it tested
- Report results tied to snapshot
- Make results queryable for release automation
Add to job configuration:
    env:
    - name: KONFLUX_SNAPSHOT
      value: "<snapshot-from-query>"
    - name: REPORT_SNAPSHOT_RESULTS
      value: "true"
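How a job reports its result is still open. One option is to write a small file next to the other job artifacts, as sketched below. This assumes the CI system sets `ARTIFACT_DIR` and `JOB_NAME`; the file name, the JSON fields, and the helper name `record_snapshot_result` are illustrative only.

```shell
# Hypothetical helper: record which snapshot was tested and the outcome.
# ARTIFACT_DIR and JOB_NAME are assumed to come from the CI environment;
# the file name and JSON layout are illustrative, not an agreed format.
record_snapshot_result() {
  result="$1"   # e.g. "success" or "failure"
  # Do nothing unless reporting was switched on for this job.
  [ "${REPORT_SNAPSHOT_RESULTS:-false}" = "true" ] || return 0
  out="${ARTIFACT_DIR:-.}/konflux-snapshot-result.json"
  printf '{"snapshot":"%s","job":"%s","result":"%s"}\n' \
    "$KONFLUX_SNAPSHOT" "${JOB_NAME:-unknown}" "$result" > "$out"
}
```

The release automation could then read this file from each job's artifacts and feed the results into the gate check.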
Benefits of This Approach
Quality Assurance:
- Test images on platform before integration environment deployment
- Every release has passed comprehensive compatibility tests
- Automatic blocking of broken builds
Traceability:
- Clear link between test results and releases
- Easy to identify which tests a release passed
- Audit trail of what was tested before release
Risk Reduction:
- Prevents shipping regressions to customers
- Catches compatibility issues before they reach production
- Aligns with managed service quality requirements
Implementation Phases
Phase 1: Update Periodic Jobs (after CNTRLPLANE-1851 through CNTRLPLANE-1854)
- Modify jobs to use latest snapshot
- Add snapshot tracking to job results
- Validate jobs work with snapshots
Phase 2: Implement Release Automation
- Create automation to monitor job results
- Implement Release object creation
- Set up notifications
- Create runbooks for troubleshooting
- Update releaseplan with auto-release: false
- Verify no automatic releases occur
Manual Override Process
In emergency situations, allow manual releases without full test suite:
    # Get latest snapshot
    SNAPSHOT=$(oc get snapshot --sort-by=.metadata.creationTimestamp \
      -l pac.test.appstudio.openshift.io/original-prname=hypershift-operator-main-on-push \
      -o jsonpath='{.items[-1].metadata.name}')

    # Create release manually (requires approval/justification)
    cat <<EOF | oc create -f -
    apiVersion: appstudio.redhat.com/v1alpha1
    kind: Release
    metadata:
      generateName: hypershift-operator-emergency-
      namespace: <workspace-namespace>
      annotations:
        release.hypershift.io/emergency: "true"
        release.hypershift.io/justification: "Critical security fix"
    spec:
      snapshot: $SNAPSHOT
      releasePlan: hypershift-operator-release
    EOF
Monitoring and Alerts
Metrics to Track:
- Time between snapshot creation and release
- Percentage of snapshots that pass all tests
- Most common test failures blocking releases
- Emergency releases vs. normal releases
Alerts to Configure:
- Snapshot blocked for >24 hours (investigate test failures)
- No releases in >3 days (may indicate systemic issues)
- Emergency release created (requires review)
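The "blocked for >24 hours" alert reduces to comparing a snapshot's age against a threshold. A minimal sketch follows, assuming both timestamps are already available as Unix epoch seconds; `snapshot_blocked` is a hypothetical helper name.

```shell
# Hypothetical check: exits 0 when a snapshot has waited longer than the
# threshold (default 24 hours). Both timestamps are Unix epoch seconds.
snapshot_blocked() {
  created="$1"
  now="$2"
  threshold_hours="${3:-24}"
  [ $(( now - created )) -gt $(( threshold_hours * 3600 )) ]
}

# Usage (not executed here):
#   snapshot_blocked "$CREATED_EPOCH" "$(date +%s)" && echo "investigate test failures"
```

Converting the snapshot's `creationTimestamp` to epoch seconds is left out because the `date` options for parsing timestamps differ between platforms.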
Dependencies
- CNTRLPLANE-1851: Baseline jobs must exist
- CNTRLPLANE-1852: Upgrade .0→latest jobs must exist
- CNTRLPLANE-1853: Upgrade latest-1→latest jobs must exist
- CNTRLPLANE-1854: Azure jobs must exist
- Konflux workspace access and permissions
- Access to modify releaseplan configuration
Success Metrics
- Zero automatic releases after auto-release disabled
- 100% of releases have passed compatibility tests
- <24 hour latency from successful tests to release
- <5% emergency override rate
- Clear audit trail for all releases
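The emergency override target can be checked with integer arithmetic once the two counts are known. This sketch assumes the counts of emergency and total releases are gathered elsewhere; `override_rate_ok` is a hypothetical helper name.

```shell
# Hypothetical check: exits 0 when emergency releases are under 5% of all releases.
# Integer math avoids floating point: emergency/total < 5/100.
override_rate_ok() {
  emergency="$1"
  total="$2"
  # With no releases at all there is no rate to evaluate, so report failure.
  [ "$total" -gt 0 ] && [ $(( emergency * 100 )) -lt $(( total * 5 )) ]
}
```

For example, 1 emergency release out of 40 (2.5%) passes, while 2 out of 40 (exactly 5%) does not.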
Documentation Requirements
Create documentation for:
- How release gating works (architecture diagram)
- How to check which snapshot is being tested
- How to view test results for a snapshot
- How to manually trigger release (emergency process)
- How to investigate blocked releases
- How to add/remove required tests from gate
Future Enhancements
- Progressive rollout based on test results (canary releases)
- Automatic rollback if post-release issues detected
- Integration with CVE scanning and security gates
- Cross-repository coordination (hypershift + operators)
- Customer-specific testing gates for managed services
- depends on
- CNTRLPLANE-1848 Periodic Testing Infrastructure for HO/CPO Compatibility (Closed)
- CNTRLPLANE-1854 Extend All Periodic Jobs to Azure Platform (In Progress)