Story
Resolution: Unresolved
Normal
As a platform engineer operating ARO HyperShift clusters, I want HostedControlPlane validation errors to report accurate reasons and clear messages instead of the generic InsufficientClusterCapabilities, so that triage is faster and more actionable.
Context
- An Azure HostedCluster created with scripts reported the following condition:
  - lastTransitionTime: 2025-08-25T13:27:00Z
  - message: "failed to create azure creds to verify resource group locations: failed to read credential file /mnt/certs/: read /mnt/certs/: is a directory"
  - reason: InsufficientClusterCapabilities
  - type: ValidHostedControlPlaneConfiguration
- The reason appears to originate from control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go where condition.Reason is set to InsufficientClusterCapabilities for failures from validateConfigAndClusterCapabilities.
- References:
  - Current controller line (approximate): https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L3405
  - Historical use: https://github.com/openshift/hypershift/blob/861379bc985b0adbe9cdfdd3b2814ae60892af81/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  - ARO overrides note: https://github.com/openshift/hypershift/pull/6685/files
Problem
- Configuration errors (e.g., credential path issues) are being surfaced with reason InsufficientClusterCapabilities, which is misleading and hampers debugging.
Proposed Direction
- Segregate the reasons returned by validateConfigAndClusterCapabilities, either by:
  - returning a reason value from the validator, or
  - splitting it into distinct validator functions that each convey a specific reason.
- Default to InvalidConfigurationReason for configuration errors instead of InsufficientClusterCapabilities.
- Improve error messages for credential and path issues so they are explicit and actionable.
- Update the condition-setting logic accordingly and add unit tests for each failure path.
Acceptance Criteria
- Given an Azure HCP with a missing or invalid credential mount/path, when the ValidHostedControlPlaneConfiguration condition is set, it should use InvalidConfigurationReason (or a more precise configuration-related reason) instead of InsufficientClusterCapabilities.
- Given each of the three failure paths currently in validateConfigAndClusterCapabilities, when one of them fails, the condition should carry a specific, accurate reason that distinguishes configuration errors from cluster capability issues.
- When the credential path /mnt/certs/ is a directory or otherwise unreadable, the controller should emit a condition message that clearly indicates a misconfiguration and recommends next steps (e.g., check file vs. directory, path correctness, mount contents).
- Unit tests cover the updated reason mapping for each failure path and validate messages using the "When ... it should ..." description style.
- No change to success path behavior; all existing e2e/integration tests continue to pass.
Notes
- Aligns with feedback from @Bryan and @alberto.lamela: segregate reasons; possibly let the function return the reason or split into separate functions.
- This issue is motivated by ARO/Azure but the fix should be platform-agnostic within the controller.