Red Hat Advanced Cluster Security
ROX-33270

Replicas conflict between rhacs-operator and horizontal autoscaler

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Affects Version/s: 4.10.0
    • Component/s: Installation

      USER PROBLEM

      • After installing ACS 4.10.0-rc.3 on a fresh cluster, one replica pod each of the `scanner`, `scanner-v4-indexer`, and `scanner-v4-matcher` deployments is continually recreated:
        oc get pods -w
        NAME                                  READY   STATUS    RESTARTS      AGE
        central-86dff48df8-cmbfw              1/1     Running   0             23h
        central-db-8544b49d65-6mbg4           1/1     Running   0             23h
        config-controller-8594d69cc6-4glgr    1/1     Running   0             23h
        scanner-7fdb5bd945-5rj8m              0/1     Running   0             4s
        scanner-7fdb5bd945-8kp98              1/1     Running   0             23h
        scanner-7fdb5bd945-m4q9x              1/1     Running   0             23h
        scanner-db-59f4d5cbb9-hnfpd           1/1     Running   0             23h
        scanner-v4-db-8b45f7544-bh5jn         1/1     Running   0             23h
        scanner-v4-indexer-fc65fcdcc-8ww8w    1/1     Running   2 (23h ago)   23h
        scanner-v4-indexer-fc65fcdcc-fvjqs    1/1     Running   0             4s
        scanner-v4-indexer-fc65fcdcc-xqsq8    1/1     Running   2 (23h ago)   23h
        scanner-v4-matcher-5947898dff-kmf5c   1/1     Running   2 (23h ago)   23h
        scanner-v4-matcher-5947898dff-m5zrr   1/1     Running   2 (23h ago)   23h
        scanner-v4-matcher-5947898dff-nmwp8   1/1     Running   0             4s
        scanner-v4-matcher-5947898dff-nmwp8   1/1     Terminating   0             9s
        scanner-v4-matcher-5947898dff-nmwp8   1/1     Terminating   0             9s
        scanner-v4-indexer-fc65fcdcc-fvjqs    1/1     Terminating   0             9s
        scanner-v4-indexer-fc65fcdcc-fvjqs    1/1     Terminating   0             9s
        scanner-7fdb5bd945-5rj8m              0/1     Terminating   0             9s
        scanner-7fdb5bd945-5rj8m              0/1     Terminating   0             9s
        scanner-v4-matcher-5947898dff-nmwp8   0/1     Completed     0             9s
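      The churn in the watch output can be summarized per workload by stripping the ReplicaSet and pod hashes from the pod names and counting termination events. A minimal sketch (the heredoc embeds a trimmed sample of the listing above; on a live cluster you would capture `oc get pods -w --no-headers` instead):

```shell
# Sketch: count pod churn per workload from watch output like the listing
# above. The heredoc is a trimmed sample; replace it with captured output
# from `oc get pods -w --no-headers` on a live cluster.
cat <<'EOF' > /tmp/pods-watch.txt
scanner-7fdb5bd945-5rj8m              0/1     Running       0   4s
scanner-v4-matcher-5947898dff-nmwp8   1/1     Terminating   0   9s
scanner-v4-indexer-fc65fcdcc-fvjqs    1/1     Terminating   0   9s
scanner-7fdb5bd945-5rj8m              0/1     Terminating   0   9s
EOF
# Strip the trailing ReplicaSet and pod hashes to group by Deployment,
# then count how many Terminating events each Deployment accumulated.
awk '$3 == "Terminating" { sub(/-[^-]+-[^-]+$/, "", $1); n[$1]++ }
     END { for (d in n) print d, n[d] }' /tmp/pods-watch.txt | sort
```

      Run over a longer capture, this makes the steady one-pod-per-Deployment recreate cycle easy to see at a glance.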

      CONDITIONS
      What conditions need to exist for a user to be affected? Is it everyone? Is it only those with a specific integration? Is it specific to someone with particular database content? etc.

      • pending

      ROOT CAUSE

      • After digging into the issue, there appears to be a conflict between the horizontal pod autoscaler (HPA) and rhacs-operator over the Deployments' `.spec.replicas` field.

      The output of:

      $ oc get deploy scanner -o yaml --show-managed-fields


      alternates between:

        - apiVersion: apps/v1
          fieldsType: FieldsV1
          fieldsV1:
            f:spec:
              f:replicas: {}
          manager: kube-controller-manager
          operation: Update
          subresource: scale


      and:

        - apiVersion: apps/v1
          fieldsType: FieldsV1
          fieldsV1:
            f:spec:
              f:replicas: {}
          manager: rhacs-operator
          operation: Update
          time: "2026-02-20T13:46:53Z"
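      The alternation can be confirmed by listing which field managers claim ownership of `f:replicas`. A minimal sketch (the heredoc is a trimmed sample of the managedFields entries above; live input would come from `oc get deploy scanner -o yaml --show-managed-fields`):

```shell
# Sketch: list the field managers that claim f:replicas. The heredoc is a
# trimmed sample of the managedFields output shown above.
cat <<'EOF' > /tmp/managed-fields.yaml
- manager: kube-controller-manager
  operation: Update
  subresource: scale
  fieldsV1:
    f:spec:
      f:replicas: {}
- manager: rhacs-operator
  operation: Update
  fieldsV1:
    f:spec:
      f:replicas: {}
EOF
# Two different managers writing the same field is the tug-of-war seen
# above: each write flips ownership and triggers the other to reconcile.
grep '^- manager:' /tmp/managed-fields.yaml | awk '{print $3}' | sort
```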


      From the events, it appears that the horizontal pod autoscaler wants 2 replicas:

      apiVersion: v1
      count: 6525
      eventTime: null
      firstTimestamp: "2026-02-19T10:17:18Z"
      involvedObject:
        apiVersion: autoscaling/v2
        kind: HorizontalPodAutoscaler
        name: scanner
        namespace: stackrox
        resourceVersion: "1348001"
        uid: 0d853ffb-e2fb-4271-a735-3ed496863fb7
      kind: Event
      lastTimestamp: "2026-02-20T13:42:32Z"
      message: 'New size: 2; reason: All metrics below target'
      metadata:
        creationTimestamp: "2026-02-19T10:17:18Z"
        name: scanner.18959e7156fc4015
        namespace: stackrox
        resourceVersion: "2689672"
        uid: 971090b2-a384-4065-af43-54622638948e
      reason: SuccessfulRescale
      reportingComponent: horizontal-pod-autoscaler
      reportingInstance: ""
      source:
        component: horizontal-pod-autoscaler
      type: Normal
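      The HPA's rescale target can be pulled straight out of the event messages. A minimal sketch (the heredoc embeds a trimmed copy of the event above; live input would come from `oc -n stackrox get events -o yaml`):

```shell
# Sketch: extract the HPA's rescale target from event messages. The
# heredoc is a trimmed copy of the SuccessfulRescale event above.
cat <<'EOF' > /tmp/hpa-event.yaml
count: 6525
message: 'New size: 2; reason: All metrics below target'
reason: SuccessfulRescale
EOF
sed -n "s/.*New size: \([0-9]*\).*/\1/p" /tmp/hpa-event.yaml
```

      An event count of 6525 over roughly a day also suggests the HPA and the operator are rewriting the replica count every few seconds.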


      I tried turning off rhacs-operator by scaling its deployment down to 0 replicas, and the problem disappeared.

      FIX
      How was the bug fixed (this is more important if a workaround was implemented rather than an actual fix)?

      • The only workaround I found is scaling rhacs-operator down to 0 replicas.
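      A sketch of the workaround follows. The operator namespace and deployment name below are assumptions (verify them first with `oc get deploy -A | grep rhacs-operator`); the command is only echoed here as a dry run rather than executed:

```shell
# Workaround sketch: stop the operator so it no longer fights the HPA over
# .spec.replicas. Namespace and deployment name are assumptions; adjust to
# your installation. Echoed as a dry run rather than executed.
cmd="oc -n rhacs-operator scale deployment rhacs-operator-controller-manager --replicas=0"
echo "$cmd"
```

      Note that this disables operator reconciliation entirely, so it is only suitable as a stopgap until the ownership conflict itself is fixed.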

        1. rhacs-operator-controller-manager-85867cbbb9-khqjp.log
          8.97 MB
          Domenico Commisso
        2. rhacs-operator-controller-manager-68c7cff8df-7gf5t-manager.log.gz
          649 kB
          Marcin Owsiany
        3. kube-apiserver.tar
          136.60 MB
          Marcin Owsiany
        4. rhacs-operator-controller-manager-68c7cff8df-7gf5t-manager (1).log.gz
          833 kB
          Marcin Owsiany
        5. kube-apiserver-1.tar
          191.10 MB
          Marcin Owsiany

              mowsiany@redhat.com Marcin Owsiany
              rhn-support-dcommiss Domenico Commisso
              ACS Install