Red Hat Advanced Cluster Security / ROX-33465

ACS Operator thinks it runs on a non-OpenShift cluster during the OCP upgrade

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • None
    • 4.9.3
    • OpenShift Operator
    • None
    • Quality / Stability / Reliability
    • False
    • False

      Description

      The operator's OpenShift autosense logic checks whether the `project.openshift.io` API group is present in the Kubernetes API. During an OCP upgrade, openshift-apiserver goes down temporarily and the API groups it serves become unavailable, so kube-apiserver cannot discover them:

      I0304 11:02:44.508440      19 handler.go:288] Adding GroupVersion project.openshift.io v1 to ResourceManager
      Error updating APIService "v1.project.openshift.io" with err: failed to download v1.project.openshift.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
      loading OpenAPI spec for "v1.project.openshift.io" failed with: failed to download v1.project.openshift.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: error trying to reach service: dial tcp 10.130.0.91:8443: i/o timeout
      I0304 11:03:15.335322      19 controller.go:109] OpenAPI AggregationController: action for item v1.project.openshift.io: Rate Limited Requeue.
      E0304 11:03:15.348305      19 controller.go:113] "Unhandled Error" err="loading OpenAPI spec for \"v1.project.openshift.io\" failed with: Error, could not get list of group versions for APIService" logger="UnhandledError"
      I0304 11:03:15.350743      19 controller.go:126] OpenAPI AggregationController: action for item v1.project.openshift.io: Rate Limited Requeue.
      I0304 11:03:44.514299      19 handler.go:288] Adding GroupVersion project.openshift.io v1 to ResourceManager
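      The autosense failure can be sketched in Go as follows. This is a minimal model, not the operator's code: `isOpenShift` and `discover` are hypothetical helpers, and discovery is reduced to a map from API group to APIService availability, assuming (as the logs above show) that a group whose aggregated backend returns 503 is not discoverable.

```go
package main

import "fmt"

// openShiftMarkerGroup is the API group the autosense logic looks for.
const openShiftMarkerGroup = "project.openshift.io"

// isOpenShift is a hypothetical stand-in for the autosense check: it reports
// whether the marker group appears in the groups returned by /apis.
func isOpenShift(discoveredGroups []string) bool {
	for _, g := range discoveredGroups {
		if g == openShiftMarkerGroup {
			return true
		}
	}
	return false
}

// discover is a hypothetical model of discovery: groups whose aggregated
// APIService is currently unavailable are left out of the result.
func discover(registered map[string]bool) []string {
	var groups []string
	for group, available := range registered {
		if available {
			groups = append(groups, group)
		}
	}
	return groups
}

func main() {
	// Normal operation: openshift-apiserver serves project.openshift.io.
	healthy := map[string]bool{"apps": true, "project.openshift.io": true}
	// During the OCP upgrade: the APIService is registered but returns 503.
	upgrading := map[string]bool{"apps": true, "project.openshift.io": false}

	fmt.Println("healthy:", isOpenShift(discover(healthy)))
	fmt.Println("upgrading:", isOpenShift(discover(upgrading)))
}
```

The check is purely "present or absent", so an unavailable group and a group that does not exist on the cluster look identical to it.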
      

      If the ACS operator installs or upgrades Central during this window, it assumes it is running on a non-OpenShift cluster, because kube-apiserver filters the unavailable `project.openshift.io` group out of the `/apis` discovery response. The operator reconciles Central correctly only on its next iteration, triggered either by a CR change or, at the latest, after one hour.

      {"level":"info","ts":"2026-03-04T11:03:20Z","logger":"controllers.Central","msg":"Release upgraded","central":{"name":"central","namespace":"rhacs-abcdef"},"name":"central","version":13}
      

      As a consequence, the SCC Role is not created (or, on upgrade, is deleted), and the Central pod fails to start because of an SCC violation:

      - lastTransitionTime: "2026-03-04T11:03:20Z"                                              
        lastUpdateTime: "2026-03-04T11:03:20Z"                                                  
        message: 'pods "central-557c774f48-" is forbidden: unable to validate against           
          any security context constraint: [provider "anyuid": Forbidden: not usable by ...
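      The effect on the SCC Role can be modeled as one decision per reconcile pass. This Go sketch is hypothetical (`reconcileSCCRole` and the action names are not the operator's API); it only encodes the behavior described above: a false "non-OpenShift" result skips the Role on install and deletes it on upgrade, and the next pass with a correct result restores it.

```go
package main

import "fmt"

// sccRoleAction is a hypothetical model of what a reconcile pass does with the
// SCC Role.
type sccRoleAction string

const (
	actionCreate sccRoleAction = "create"
	actionDelete sccRoleAction = "delete"
	actionNone   sccRoleAction = "none"
)

// reconcileSCCRole returns the action implied by the autosense result: the Role
// is only wanted on OpenShift, so a false negative removes or skips it.
func reconcileSCCRole(detectedOpenShift, roleExists bool) sccRoleAction {
	switch {
	case detectedOpenShift && !roleExists:
		return actionCreate
	case !detectedOpenShift && roleExists:
		return actionDelete
	default:
		return actionNone
	}
}

func main() {
	// Install while project.openshift.io is unavailable: no Role is created.
	fmt.Println("install during OCP upgrade:", reconcileSCCRole(false, false))
	// Upgrade of an existing Central while the API is unavailable: Role deleted.
	fmt.Println("upgrade during OCP upgrade:", reconcileSCCRole(false, true))
	// Next reconcile after openshift-apiserver recovers: Role recreated.
	fmt.Println("next reconcile:", reconcileSCCRole(true, false))
}
```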
      

      This was observed in ACSCS on OCP 4.20.14. I was able to reproduce it on an infra cluster.

      Steps to reproduce

      1. Make sure the API is present:

      kubectl api-resources --api-group=project.openshift.io
      NAME              SHORTNAMES   APIVERSION                NAMESPACED   KIND
      projectrequests                project.openshift.io/v1   false        ProjectRequest
      projects                       project.openshift.io/v1   false        Project
      

      2. Patch the openshift-apiserver deployment to scale it down:

      oc patch deployment apiserver -n openshift-apiserver --type=strategic -p '          
        spec:
          replicas: 0
          template:
            spec:
              nodeSelector:
                node-role.kubernetes.io/control-plane: ""
        '
      

      3. Check whether the API is still present:

      kubectl api-resources --api-group=project.openshift.io --v=6
      I0305 10:31:53.479973   62545 loader.go:402] Config loaded from file:  /Users/ykovalev/.kube/config
      I0305 10:31:53.481504   62545 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
      I0305 10:31:53.481513   62545 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
      I0305 10:31:53.481516   62545 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
      I0305 10:31:53.481519   62545 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
      I0305 10:31:53.817431   62545 round_trippers.go:560] GET https://api.yk-03-04-instan.ljaa.p1.openshiftapps.com:6443/api?timeout=32s 200 OK in 333 milliseconds
      I0305 10:31:53.924841   62545 round_trippers.go:560] GET https://api.yk-03-04-instan.ljaa.p1.openshiftapps.com:6443/apis?timeout=32s 200 OK in 104 milliseconds
      NAME   SHORTNAMES   APIVERSION   NAMESPACED   KIND
      

      The table is now empty, because `project.openshift.io` is no longer listed in the /apis discovery response.
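      Steps 1 and 3 can be checked mechanically by counting the rows of the `kubectl api-resources` table. A minimal Go sketch, assuming it receives only the table printed to stdout (kubectl writes the `--v=6` log lines to stderr); `countAPIResources` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countAPIResources counts resource rows in `kubectl api-resources` table
// output, skipping the NAME header and blank lines. Zero rows means the queried
// API group is not served by discovery.
func countAPIResources(output string) int {
	count := 0
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "NAME ") {
			continue
		}
		count++
	}
	return count
}

func main() {
	// Output of step 1 (API present).
	before := `NAME              SHORTNAMES   APIVERSION                NAMESPACED   KIND
projectrequests                project.openshift.io/v1   false        ProjectRequest
projects                       project.openshift.io/v1   false        Project
`
	// Output of step 3 (openshift-apiserver scaled down).
	after := "NAME   SHORTNAMES   APIVERSION   NAMESPACED   KIND\n"

	fmt.Println("before:", countAPIResources(before))
	fmt.Println("after:", countAPIResources(after))
}
```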

      4. Check the kube-apiserver logs:

      E0305 09:31:51.620689 13 controller.go:102] "Unhandled Error" err=<
      loading OpenAPI spec for "v1.route.openshift.io" failed with: failed to download v1.route.openshift.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
      , Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
      > logger="UnhandledError"
      I0305 09:31:51.621963 13 controller.go:109] OpenAPI AggregationController: action for item v1.route.openshift.io: Rate Limited Requeue.
      

              Assignee: Unassigned
              Reporter: Yury Kovalev (ykovalev@redhat.com)
              ACS Install
              Votes: 0
              Watchers: 1