Story
Resolution: Unresolved
As a cluster consumer, I want the HostedControlPlane to not be marked Available until all aggregated APIServices are ready, so that I can rely on the Available condition meaning all OpenShift APIs are functional.
Problem
In resources.go of the HCCO (hosted cluster config operator), the APIServices of all three groups are reconciled before their backing Services and Endpoints:
- OpenShift API Server (lines 508-525): APIServices → Service → Endpoints
- OAuth API Server (lines 527-545): APIServices → Service → Endpoints
- OLM PackageServer (lines 1936-1972 in reconcileOLM): APIService → Service → Endpoints
This ordering creates a race condition: the Kubernetes API aggregator may pick up an APIService before its backend Service and Endpoints exist, causing transient 503 errors on aggregated API groups (e.g., apps.openshift.io, route.openshift.io, packages.operators.coreos.com).
Additionally, the HostedControlPlaneAvailable condition does not gate on these APIServices being ready. The HCP can be marked Available while aggregated APIs are still unreachable.
Affected APIServices
| Group | APIService Name | Source |
|---|---|---|
| apps.openshift.io | v1.apps.openshift.io | OpenShift API Server |
| authorization.openshift.io | v1.authorization.openshift.io | OpenShift API Server |
| build.openshift.io | v1.build.openshift.io | OpenShift API Server |
| image.openshift.io | v1.image.openshift.io | OpenShift API Server |
| quota.openshift.io | v1.quota.openshift.io | OpenShift API Server |
| route.openshift.io | v1.route.openshift.io | OpenShift API Server |
| security.openshift.io | v1.security.openshift.io | OpenShift API Server |
| template.openshift.io | v1.template.openshift.io | OpenShift API Server |
| project.openshift.io | v1.project.openshift.io | OpenShift API Server |
| oauth.openshift.io | v1.oauth.openshift.io | OAuth API Server (conditional) |
| user.openshift.io | v1.user.openshift.io | OAuth API Server (conditional) |
| packages.operators.coreos.com | v1.packages.operators.coreos.com | OLM PackageServer |
Acceptance Criteria
- Test that the reconciliation ordering in resources.go creates the backing Service and Endpoints before the APIService for all three groups (OpenShift API Server, OAuth API Server, OLM PackageServer)
- Test that a new AggregatedAPIServicesAvailable condition is set on the HostedControlPlane by the HCCO after reconciling all APIServices
- Test that the condition is True with reason AsExpected when all expected APIServices have Available=True in their status
- Test that the condition is False with reason AggregatedAPIServicesNotAvailable when any expected APIService is missing or not Available, and the message lists the unavailable APIServices
- Test that the condition is False with reason ReconcileError when the guest cluster client fails to retrieve an APIService (non-NotFound error)
- Verify that when OAuth is disabled, only the 9 OpenShift API Server APIServices and the PackageServer APIService are checked (10 in total; v1.oauth.openshift.io and v1.user.openshift.io are skipped)
- Verify that when OAuth is enabled, all 12 APIServices are checked
- Test that the HostedControlPlaneAvailable condition cannot transition to True until AggregatedAPIServicesAvailable exists and is True (condition absence blocks availability)
- Test that once the HCP is already Available, a transient AggregatedAPIServicesAvailable=False does not regress availability (latch behavior via !alreadyAvailable guard)
- Test that the AggregatedAPIServicesAvailable condition message is propagated to the HostedControlPlaneAvailable condition message when it blocks availability
Implementation Notes
1. Fix reconciliation ordering (HCCO resources.go)
For each of the three APIService groups, reorder from APIService → Service → Endpoints to the following sequence:
- Service (guest cluster)
- Endpoints (requires CP service ClusterIP)
- APIService
2. New condition type and reason (API package)
Add to api/hypershift/v1beta1/hostedcluster_conditions.go:
- AggregatedAPIServicesAvailable ConditionType = "AggregatedAPIServicesAvailable"
- AggregatedAPIServicesNotAvailableReason = "AggregatedAPIServicesNotAvailable"
This condition is an HCP implementation detail and is not bubbled up to the HostedCluster level or ExpectedHCConditions.
3. Condition function (HCCO resources.go)
Add reconcileAggregatedAPIServicesAvailableCondition following the reconcileControlPlaneDataPlaneConnectivityConditions pattern:
- Uses r.client (guest cluster) to Get each expected apiregistrationv1.APIService
- Checks that each APIService's status contains a condition of type apiregistrationv1.Available with status apiregistrationv1.ConditionTrue
- Patches HCP status via r.cpClient.Status().Patch()
- Call site: after reconcileOLM (~line 674) so all three APIService groups have been reconciled
4. Availability gate (HCP controller)
In the availability switch in hostedcontrolplane_controller.go:
- Look up AggregatedAPIServicesAvailable condition from HCP status
- Add case: !alreadyAvailable && (apiServicesCondition == nil || apiServicesCondition.Status == metav1.ConditionFalse)
- Place after the componentsNotAvailableMsg case and before default
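The gate predicate from the case expression above can be isolated as a small function to make the latch behavior explicit. This is a sketch under assumptions: the `condition` struct stands in for metav1.Condition, and `apiServicesBlockAvailability` is a hypothetical name; in the controller the expression would sit directly in the switch.

```go
package main

import "fmt"

// condition is a simplified stand-in for metav1.Condition.
type condition struct {
	Status  string // "True", "False", or "Unknown"
	Message string
}

// apiServicesBlockAvailability mirrors the proposed switch case: the gate
// applies only while the HCP has not yet become Available, and it blocks when
// the condition is absent or explicitly False.
func apiServicesBlockAvailability(alreadyAvailable bool, c *condition) bool {
	return !alreadyAvailable && (c == nil || c.Status == "False")
}

func main() {
	notReady := &condition{Status: "False", Message: "APIServices not available: v1.route.openshift.io"}
	fmt.Println(apiServicesBlockAvailability(false, nil))      // condition absent: blocks
	fmt.Println(apiServicesBlockAvailability(false, notReady)) // False before first Available: blocks
	fmt.Println(apiServicesBlockAvailability(true, notReady))  // already Available: latch, does not block
}
```

Because the check compares against False only, a condition with status Unknown would not block availability under this expression.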
5. Unit tests
- resources_test.go: Table-driven test for the new condition function covering all scenarios (all available, some missing, some unavailable, OAuth on/off, client errors)
- hostedcontrolplane_controller_test.go: Test cases for the availability gate switch behavior (condition missing, False, True, latch)
Files to Modify
- api/hypershift/v1beta1/hostedcluster_conditions.go
- control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
- control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
- control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources_test.go
- control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.go