Epic
Resolution: Unresolved
Major
Integrate HyperShift install with Cluster CAPI Operator CRD compatibility checker
To Do
Product / Portfolio Work
Epic Goal
Modify hypershift install to act as a CAPI CRD "adopting manager" on management clusters where the Cluster CAPI Operator (CCAPIO) is present. HyperShift must coordinate with CCAPIO's CRD compatibility checker (described in the enhancement listed under Dependencies) to safely take over lifecycle management of shared CAPI CRDs.
Why is this important?
Management clusters that use CAPI for their own machine management (e.g., clusters migrating from MAPI to CAPI) will already have CAPI CRDs installed by CCAPIO. HyperShift also installs CAPI CRDs. Without coordination, these two actors would conflict – competing for ownership, potentially installing incompatible schemas, or breaking each other during upgrades. The CRD compatibility checker provides the mechanism to safely hand off CRD ownership, but HyperShift must be modified to participate in this protocol.
Work Items
1. Detect CCAPIO presence and existing CAPI CRDs (cmd/install/install.go:setupCRDs())
Currently, setupCRDs() only detects IPAM CRDs (ipaddressclaims.ipam.cluster.x-k8s.io, ipaddresses.ipam.cluster.x-k8s.io) and skips them if they exist. This logic must be generalized to detect all CAPI CRDs on the management cluster and determine whether CCAPIO is managing them. This includes core CAPI CRDs (clusters.cluster.x-k8s.io, machines.cluster.x-k8s.io, machinesets.cluster.x-k8s.io, machinedeployments.cluster.x-k8s.io) and provider-specific CRDs.
The detection should use support/capabilities/management_cluster_capabilities.go or a new utility to determine if the management cluster has CCAPIO installed.
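The CRD-identification half of this work item can be sketched as a name filter. This is a minimal sketch, assuming CAPI CRDs can be recognized by their API group being cluster.x-k8s.io or a subgroup of it (as with the IPAM CRDs named above); isCAPICRD and existingCAPICRDs are hypothetical helper names, not existing HyperShift functions. It does not answer whether CCAPIO is managing the CRDs, which still needs a separate check.

```go
package main

import (
	"fmt"
	"strings"
)

// capiGroup is the API group shared (as a suffix) by core, IPAM, and
// provider-specific CAPI CRDs.
const capiGroup = "cluster.x-k8s.io"

// isCAPICRD reports whether a CRD name of the form "<plural>.<group>"
// belongs to the CAPI group or one of its subgroups.
// Hypothetical helper for illustration only.
func isCAPICRD(crdName string) bool {
	_, group, ok := strings.Cut(crdName, ".")
	if !ok {
		return false
	}
	return group == capiGroup || strings.HasSuffix(group, "."+capiGroup)
}

// existingCAPICRDs filters the CRD names found on the management cluster
// down to the CAPI ones, preserving their order.
func existingCAPICRDs(clusterCRDs []string) []string {
	var found []string
	for _, name := range clusterCRDs {
		if isCAPICRD(name) {
			found = append(found, name)
		}
	}
	return found
}

func main() {
	onCluster := []string{
		"machines.cluster.x-k8s.io",
		"ipaddressclaims.ipam.cluster.x-k8s.io",
		"hostedclusters.hypershift.openshift.io",
	}
	fmt.Println(existingCAPICRDs(onCluster))
}
```

Matching on the group suffix rather than a fixed list means provider-specific CRDs are picked up without enumerating every provider.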
2. Signal CRD adoption to CCAPIO
When CCAPIO is present, hypershift install must update CCAPIO's configuration to add the relevant CAPI CRDs to its UnmanagedAPIs list. This tells CCAPIO to stop managing those CRDs and allows HyperShift to take over. Per the enhancement: "Hypershift will assert ownership of the Machine CRD by adding machines.cluster.x-k8s.io to CCAPIO's UnmanagedAPIs."
Determine the full set of CRDs that must be marked as unmanaged and implement the configuration update.
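Whatever shape CCAPIO's configuration takes, the update should be idempotent so that re-running hypershift install does not duplicate entries or rewrite the configuration needlessly. The following is a minimal sketch of that merge step only, assuming UnmanagedAPIs can be treated as a list of CRD names; mergeUnmanagedAPIs is a hypothetical helper, and reading or writing the actual CCAPIO resource is out of scope here.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeUnmanagedAPIs returns the sorted, de-duplicated union of the current
// UnmanagedAPIs entries and the CRDs HyperShift wants to adopt. The boolean
// reports whether any new entry was added, i.e. whether an update to the
// CCAPIO configuration is needed at all.
// Hypothetical helper: the real field type and location are assumptions.
func mergeUnmanagedAPIs(current, adopt []string) ([]string, bool) {
	seen := make(map[string]struct{}, len(current)+len(adopt))
	merged := make([]string, 0, len(current)+len(adopt))
	for _, name := range current {
		if _, dup := seen[name]; !dup {
			seen[name] = struct{}{}
			merged = append(merged, name)
		}
	}
	added := false
	for _, name := range adopt {
		if _, dup := seen[name]; !dup {
			seen[name] = struct{}{}
			merged = append(merged, name)
			added = true
		}
	}
	sort.Strings(merged)
	return merged, added
}

func main() {
	current := []string{"machines.cluster.x-k8s.io"}
	adopt := []string{"machines.cluster.x-k8s.io", "clusters.cluster.x-k8s.io"}

	merged, added := mergeUnmanagedAPIs(current, adopt)
	fmt.Println(merged, added)

	// A second run with the merged list finds nothing new to add.
	_, addedAgain := mergeUnmanagedAPIs(merged, adopt)
	fmt.Println(addedAgain)
}
```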
3. Pre-flight CRD compatibility validation via dry-run
CCAPIO deploys ValidatingAdmissionWebhooks that automatically reject CRD updates which violate CRDCompatibilityRequirement resources. HyperShift does not need to duplicate the compatibility checking logic. Instead, it should use Kubernetes server-side dry-run to validate all CRDs before persisting any of them, preventing partial installs that leave the cluster in an inconsistent state.
The approach:
- Dry-run phase: Before applying any CRDs, call client.Patch() with the client.DryRunAll option for each CRD. This sends the request through the full admission chain – including CCAPIO's compatibility webhook – without persisting changes to etcd.
- Collect errors: If any CRD is rejected by the compatibility webhook, collect all failures across all CRDs rather than failing on the first one.
- Report or proceed: If any dry-run failed, report all incompatibilities with clear, actionable error messages and abort before touching any CRDs. If all passed, proceed with the real applies.
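This approach relies on CCAPIO's webhook declaring that it has no side effects on dry-run requests: sideEffects: None (standard best practice for validating webhooks) or NoneOnDryRun. The API server rejects a dry-run request outright if a matching webhook declares side effects, so one of these two values is required for the dry-run to reach the webhook. Confirm this with the CCAPIO team.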
Note: The install command runs as the invoking user (typically cluster-admin), so no additional RBAC is needed for the install-time operations (CCAPIO config updates, dry-run validation, CRD applies).
Note: HyperShift installs its own version of the CRDs, which already include conversion webhook configuration for any CRDs that serve multiple versions. If an existing CAPI consumer requires a conversion webhook that HyperShift's CRDs do not provide, the compatibility checker will reject the update during the dry-run phase. No special conversion webhook handling is needed beyond what HyperShift already ships.
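The dry-run, collect, then report-or-proceed sequence described in this work item can be sketched as follows. This is an illustration under stated assumptions, not the installer's actual code: crdApplier is a hypothetical seam that real code would likely implement by wrapping client.Patch() and passing client.DryRunAll when dryRun is true, and fakeApplier and simulateInstall exist only to make the sketch runnable without a cluster.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// crdApplier is a hypothetical abstraction over the Kubernetes client.
// With dryRun=true the request goes through admission but is not persisted.
type crdApplier interface {
	Apply(ctx context.Context, crdName string, dryRun bool) error
}

// applyCRDsTwoPhase dry-runs every CRD, collects all rejections instead of
// stopping at the first, and persists CRDs only if the whole set passed.
func applyCRDsTwoPhase(ctx context.Context, c crdApplier, crdNames []string) error {
	var rejections []error
	for _, name := range crdNames {
		if err := c.Apply(ctx, name, true); err != nil {
			rejections = append(rejections,
				fmt.Errorf("CRD %s failed dry-run validation: %w", name, err))
		}
	}
	if len(rejections) > 0 {
		// Abort before touching any CRD; report every failure together.
		return errors.Join(rejections...)
	}
	for _, name := range crdNames {
		if err := c.Apply(ctx, name, false); err != nil {
			return fmt.Errorf("applying CRD %s: %w", name, err)
		}
	}
	return nil
}

// fakeApplier stands in for a cluster whose webhook rejects some CRDs.
type fakeApplier struct {
	incompatible map[string]string // CRD name -> rejection reason
	persisted    []string          // CRDs written by non-dry-run applies
}

func (f *fakeApplier) Apply(_ context.Context, crdName string, dryRun bool) error {
	if reason, bad := f.incompatible[crdName]; bad {
		return errors.New(reason)
	}
	if !dryRun {
		f.persisted = append(f.persisted, crdName)
	}
	return nil
}

// simulateInstall runs the two-phase apply against a fakeApplier and returns
// what would have been persisted.
func simulateInstall(incompatible map[string]string, crdNames []string) ([]string, error) {
	f := &fakeApplier{incompatible: incompatible}
	err := applyCRDsTwoPhase(context.Background(), f, crdNames)
	return f.persisted, err
}

func main() {
	persisted, err := simulateInstall(
		map[string]string{"machines.cluster.x-k8s.io": "rejected by compatibility webhook"},
		[]string{"clusters.cluster.x-k8s.io", "machines.cluster.x-k8s.io"},
	)
	fmt.Println("error:", err)
	fmt.Println("persisted:", persisted)
}
```

A passing dry-run does not guarantee that the real apply succeeds, because compatibility requirements could change between the two phases, so the second loop still returns any error it encounters.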
4. Handle webhook namespace scoping
Per the enhancement, the adopting manager must configure validating/mutating webhooks with appropriate namespace and object label selectors. Ensure HyperShift's webhook configurations for CAPI resources are scoped correctly to avoid conflicting with webhooks run by other CAPI consumers.
5. Maintain backward compatibility
The install must continue to work on management clusters where CCAPIO is NOT present (the current default). The new detection and adoption logic should be conditional:
- CCAPIO not present: install CRDs as today (skip only IPAM CRDs if they exist)
- CCAPIO present: execute the adoption flow (signal UnmanagedAPIs, dry-run validate all CRDs, apply CRDs if compatible)
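The two branches above can be sketched as a small planning function. This is a sketch under assumptions: installPlan and planInstall are hypothetical names, and it assumes that in the CCAPIO-present case every shipped CRD (including the IPAM ones) goes through dry-run validation and apply, which is one reading of "dry-run validate all CRDs".

```go
package main

import "fmt"

// ipamCRDs are the CRDs the current installer skips when they already exist.
var ipamCRDs = map[string]bool{
	"ipaddressclaims.ipam.cluster.x-k8s.io": true,
	"ipaddresses.ipam.cluster.x-k8s.io":     true,
}

// installPlan describes what the installer would do on one management cluster.
// Hypothetical type for illustration only.
type installPlan struct {
	SignalUnmanagedAPIs bool     // add adopted CRDs to CCAPIO's UnmanagedAPIs
	DryRunFirst         bool     // validate every CRD before persisting any
	CRDsToApply         []string // CRDs the installer will apply
}

// planInstall picks the legacy path or the adoption path depending on
// whether CCAPIO was detected on the management cluster.
func planInstall(ccapioPresent bool, shipped []string, existing map[string]bool) installPlan {
	if ccapioPresent {
		return installPlan{
			SignalUnmanagedAPIs: true,
			DryRunFirst:         true,
			CRDsToApply:         append([]string(nil), shipped...),
		}
	}
	// Legacy behavior: apply everything except IPAM CRDs that already exist.
	var plan installPlan
	for _, name := range shipped {
		if ipamCRDs[name] && existing[name] {
			continue
		}
		plan.CRDsToApply = append(plan.CRDsToApply, name)
	}
	return plan
}

func main() {
	shipped := []string{"machines.cluster.x-k8s.io", "ipaddresses.ipam.cluster.x-k8s.io"}
	existing := map[string]bool{"ipaddresses.ipam.cluster.x-k8s.io": true}

	fmt.Printf("%+v\n", planInstall(false, shipped, existing))
	fmt.Printf("%+v\n", planInstall(true, shipped, existing))
}
```

Keeping the decision in one place makes it straightforward to cover both CCAPIO-present and CCAPIO-absent cases in unit tests without a live cluster.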
6. Update Helm chart generation (cmd/install/install_helm.go)
The Helm rendering path calls hyperShiftOperatorManifests() with a nil client (no cluster access). Add Helm values for CCAPIO integration options so that Helm-based installations can also be configured for CRD adoption.
7. Update CRD labeling (cmd/install/assets/assets.go)
The capiResources map applies cluster.x-k8s.io/v1beta1 version labels to CRDs. This labeling scheme may need to be updated or extended to coordinate with CCAPIO's version tracking mechanism (storage version declarations, transport ConfigMap annotations).
Scenarios
- Fresh install on a CAPI-enabled management cluster – HyperShift detects CCAPIO, signals adoption, dry-run validates all CRDs, then applies them
- Upgrade on a CAPI-enabled management cluster – HyperShift dry-run validates upgraded CRDs against existing compatibility requirements before applying any
- Install on a non-CAPI management cluster – HyperShift installs CRDs as today with no CCAPIO interaction
- Incompatible CRDs – Dry-run phase catches all incompatibilities across all CRDs; HyperShift reports them and aborts without modifying any CRDs on the cluster
Acceptance Criteria
- hypershift install succeeds on a management cluster with CCAPIO and existing CAPI CRDs
- hypershift install correctly signals CRD adoption via CCAPIO's UnmanagedAPIs
- When CRDs are incompatible, dry-run catches all failures and installation aborts without modifying any CRDs, leaving the cluster in a consistent state
- Incompatibility errors are clear and actionable, identifying which CRDs failed and why
- Existing CAPI controllers on the management cluster continue to function after HyperShift adopts CRD management
- hypershift install on management clusters without CCAPIO is unaffected
- CI tests cover both CCAPIO-present and CCAPIO-absent scenarios
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents
Dependencies (internal and external)
- Cluster CAPI Operator CRD compatibility checker enhancement (PR 1845)
- CRDCompatibilityRequirement CRD and admission webhooks deployed by CCAPIO
- CCAPIO's ValidatingAdmissionWebhook must set sideEffects: None (or NoneOnDryRun) for dry-run validation to work
- CNTRLPLANE-1706 (related epic under OCPSTRAT-2955) – coordinate scope to avoid duplication
Key Files
| File | Changes |
|---|---|
| cmd/install/install.go | setupCRDs(): CCAPIO detection, adoption signaling; apply(): dry-run pre-flight phase and compatibility error handling |
| cmd/install/assets/assets.go | CRD labeling updates for CCAPIO coordination |
| cmd/install/install_helm.go | New Helm values for CCAPIO integration |
| support/capabilities/management_cluster_capabilities.go | CCAPIO capability detection |
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>