OpenShift Bugs / OCPBUGS-65894

olm-collect-profiles CronJob fails with API timeout - missing hypershift.openshift.io/need-management-kas-access label

    • Bug
    • Resolution: Unresolved
    • Normal
    • 4.17.z
    • HyperShift
    • False
    • Important
    • Yes

      Description of problem:

          The olm-collect-profiles CronJob pods cannot reach the Kubernetes API server and fail with "dial tcp 172.30.0.1:443: i/o timeout". The component's
        NeedsManagementKASAccess() method returns true [1], but the generated CronJob pod template is missing the hypershift.openshift.io/need-management-kas-access label.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      This affects ALL hosted control planes on the management cluster. It is not clear why some hosted control planes have the label and some do not.

      Steps to Reproduce:

      1. Observe the olm-collect-profiles CronJob in a HyperShift hosted control plane namespace.
      
      2. Wait for a scheduled job execution, or trigger one manually:
         kubectl create job --from=cronjob/olm-collect-profiles test -n <namespace>
      
      3. Check the pod logs and observe: "failed to get server groups: Get \"https://172.30.0.1:443/api\": dial tcp 172.30.0.1:443: i/o timeout"

      Actual results:

          olm-collect-profiles pods fail with API connection timeout after ~30 seconds.

      Expected results:

          The olm-collect-profiles CronJob pods should successfully connect to the Kubernetes API server and complete.

      Additional info:

          The collect_profiles component.go [1] correctly declares NeedsManagementKASAccess() = true, but the HyperShift control-plane-operator does not propagate this to the pod template. Working pods from before October 28, 2025 had the label "hypershift.openshift.io/need-management-kas-access: true", but this is missing from pods created after that date.
      
      
        COMPARISON:
        - Working pod (Nov 20, 04:52): Has label "hypershift.openshift.io/need-management-kas-access: true"
        - Failing pod (Nov 21, 05:05): Missing this label
        - Both use same CronJob template
        - Same node, same namespace, different outcomes
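
        The comparison above can be repeated across every pod in a namespace by filtering the output of `kubectl get pods --show-labels` for pods that lack the label. The following is a minimal sketch and not part of the original investigation: the function name `pods_missing_kas_label` is hypothetical, and it assumes the default column layout, where the pod name is the first field and the comma-separated labels are the last.

```shell
#!/bin/sh
# Label that the report found missing on the failing pods.
KAS_LABEL='hypershift.openshift.io/need-management-kas-access=true'

# Hypothetical helper: reads `kubectl get pods --no-headers --show-labels`
# output on stdin and prints the names of pods that lack the label.
pods_missing_kas_label() {
  awk -v want="$KAS_LABEL" '
    NF == 0 { next }                      # skip blank lines
    {
      n = split($NF, labels, ",")         # last column holds key=value pairs
      found = 0
      for (i = 1; i <= n; i++) if (labels[i] == want) found = 1
      if (!found) print $1                # first column is the pod name
    }'
}

# Usage (requires cluster access; <namespace> is the hosted control plane namespace):
#   kubectl get pods -n <namespace> --no-headers --show-labels | pods_missing_kas_label
```

        The helper lists every pod without the label, so the result still has to be narrowed to the olm-collect-profiles pods by name.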
      
      
        WORKAROUND:
        Manually add the label when creating jobs:
        kubectl create job --from=cronjob/olm-collect-profiles test -n <namespace>
        kubectl patch job test -n <namespace> --type=json \
          -p='[{"op":"add","path":"/spec/template/metadata/labels/hypershift.openshift.io~1need-management-kas-access","value":"true"}]'
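
        The `~1` in the patch path is JSON Pointer (RFC 6901) escaping: the label key contains a `/`, which would otherwise be read as a path separator, so `/` is written as `~1` and a literal `~` as `~0`. A minimal sketch of building the same patch from the plain label key follows; the function name `json_pointer_escape` and the variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: escape one JSON Pointer reference token (RFC 6901).
# "~" is replaced first so the "~1" produced for "/" is not escaped again.
json_pointer_escape() {
  printf '%s' "$1" | sed -e 's/~/~0/g' -e 's#/#~1#g'
}

LABEL_KEY='hypershift.openshift.io/need-management-kas-access'
PATCH_PATH="/spec/template/metadata/labels/$(json_pointer_escape "$LABEL_KEY")"
PATCH="[{\"op\":\"add\",\"path\":\"$PATCH_PATH\",\"value\":\"true\"}]"

# Print the patch document; it can be passed to `kubectl patch ... --type=json -p`.
printf '%s\n' "$PATCH"
```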
      
      
        REGRESSION:
        Last successful scheduled run: October 28, 2025
        Last HyperShift update to CronJob: October 28, 2025 17:21:45Z (per managedFields)
      
      
      
        REFERENCES:
        [1] https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/v2/olm/collect_profiles/component.go - NeedsManagementKASAccess() method
        [2] https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/v2/assets/olm-collect-profiles/cronjob.yaml - CronJob template
        [3] https://github.com/openshift/hypershift/blob/879fcc8d3e451cbb695a5d6b9828a8478926ac6c/support/controlplane-component/defaults.go#L212-L215 - setDefaultOptions logic
      
      

              Unassigned
              Alice Hubenko (ahubenko)
              Yu Li