Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.17.z
Component/s: HyperShift
Labels: None
Activity Type: None
Blocked: False
Blocked Reason: None
Story Points: None
Severity: Important
Regression: Yes
Target Backport Versions: None
Target Version: None
Release Blocker: None
Sprint: None
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
PX Impact Score:
Release Note Status: None
Release Note Type: None
Release Note Text: None
Escape Reason: None
Escape Impact: None
Corrective Measures: None
SDLC stage when should've been found: None

Description of problem:

    The olm-collect-profiles CronJob pods fail to reach the Kubernetes API server, with "dial tcp 172.30.0.1:443: i/o timeout" errors. The component's NeedsManagementKASAccess() method returns true [1], but the generated CronJob pod template is missing the corresponding "hypershift.openshift.io/need-management-kas-access: true" label.

Version-Release number of selected component (if applicable):

How reproducible:

This affects all hosted control planes on the management cluster. It is not clear why some hosted control planes have it and some do not.

Steps to Reproduce:

1. Observe the olm-collect-profiles CronJob in a HyperShift hosted control plane namespace.

2. Wait for a scheduled job execution, or trigger one manually:
   kubectl create job --from=cronjob/olm-collect-profiles test -n <namespace>

3. Check the pod logs and observe: "failed to get server groups: Get \"https://172.30.0.1:443/api\": dial tcp 172.30.0.1:443: i/o timeout"

Actual results:

    olm-collect-profiles pods fail with API connection timeout after ~30 seconds.

Expected results:

    The olm-collect-profiles CronJob pods should successfully connect to the Kubernetes API server and complete.

Additional info:

    The collect_profiles component.go [1] correctly declares NeedsManagementKASAccess() = true, but the HyperShift control-plane-operator does not propagate this to the pod template. Working pods from before October 28, 2025 had the label "hypershift.openshift.io/need-management-kas-access: true", but this is missing from pods created after that date.


  COMPARISON:
  - Working pod (Nov 20, 04:52): Has label "hypershift.openshift.io/need-management-kas-access: true"
  - Failing pod (Nov 21, 05:05): Missing this label
  - Both use same CronJob template
  - Same node, same namespace, different outcomes
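
  The comparison above can be repeated across a whole namespace. The following is a minimal sketch, not part of HyperShift: it assumes the input has the shape of `kubectl get pods -n <namespace> -o json` output (a dict with an "items" list), and the pod names in the sample are hypothetical.

```python
LABEL = "hypershift.openshift.io/need-management-kas-access"


def pods_missing_label(pod_list, label=LABEL, expected="true"):
    """Return the names of pods whose metadata.labels lacks label=expected."""
    missing = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        labels = meta.get("labels") or {}  # labels may be absent or null
        if labels.get(label) != expected:
            missing.append(meta.get("name", "<unnamed>"))
    return missing


# Hypothetical sample mirroring the working and failing pods described above.
sample = {
    "items": [
        {"metadata": {"name": "working-pod", "labels": {LABEL: "true"}}},
        {"metadata": {"name": "failing-pod", "labels": {"job-name": "test"}}},
    ]
}

print(pods_missing_label(sample))  # prints ['failing-pod']
```

  To use it against a live cluster, load the JSON produced by kubectl with json.load() and pass the result to pods_missing_label().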


  WORKAROUND:
  Manually add the label when creating jobs:
  kubectl create job --from=cronjob/olm-collect-profiles test -n <namespace>
  kubectl patch job test -n <namespace> --type=json \
    -p='[{"op":"add","path":"/spec/template/metadata/labels/hypershift.openshift.io~1need-management-kas-access","value":"true"}]'
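
  The "~1" in the patch path is JSON Pointer escaping (RFC 6901) for the "/" inside the label key. As a hedged illustration of how that path is formed, the sketch below rebuilds the same patch document; the helper names escape_pointer_token and label_patch are made up for this example.

```python
import json


def escape_pointer_token(token):
    """Escape one JSON Pointer reference token per RFC 6901."""
    # "~" must be escaped first so the "~" in "~1" is not escaped again.
    return token.replace("~", "~0").replace("/", "~1")


def label_patch(key, value):
    """Build a JSON Patch that adds a label to a Job's pod template."""
    path = "/spec/template/metadata/labels/" + escape_pointer_token(key)
    patch = [{"op": "add", "path": path, "value": value}]
    return json.dumps(patch, separators=(",", ":"))


print(label_patch("hypershift.openshift.io/need-management-kas-access", "true"))
```

  The printed string matches the -p argument used in the workaround, so the same helper could generate patches for other label keys that contain a slash.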


  REGRESSION:
  Last successful scheduled run: October 28, 2025
  Last HyperShift update to CronJob: October 28, 2025 17:21:45Z (per managedFields)



  REFERENCES:
  [1] https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/v2/olm/collect_profiles/component.go - NeedsManagementKASAccess() method
  [2] https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/v2/assets/olm-collect-profiles/cronjob.yaml - CronJob template
  [3] https://github.com/openshift/hypershift/blob/879fcc8d3e451cbb695a5d6b9828a8478926ac6c/support/controlplane-component/defaults.go#L212-L215 - setDefaultOptions logic

Assignee: Unassigned
Reporter: Alice Hubenko
Need Info From: None
Contributors: None
QA Contact: Yu Li
Doc Contact: None
Votes: 1
Watchers: 4
Created: 2025/11/21 11:04 PM
Updated: 2025/11/25 6:24 PM
