Red Hat Advanced Cluster Security / ROX-27910

Rollback of SecuredCluster fails (e.g. when trying to apply invalid overlay)

    • Bug
    • Resolution: Unresolved
    • Major
    • OpenShift Operator
    • Quality / Stability / Reliability

      Note for customer support: Please do not add customer cases to this ticket.

      This is a genuine bug, but it is effectively harmless on its own and is never the root cause of a customer issue.

      It sometimes surfaces when triggered by a different problem and causes confusion, so we want to fix it eventually.

      If you have a case that looks related, please create a separate ticket and link it to this one. We will investigate and find the root cause of your case; once that is mitigated, the issue described in this ticket will disappear for you as well.

      Note: full operator pod logs are crucial for finding the root cause of your issue.

       

      USER PROBLEM
      What is the user experiencing as a result of the bug? Include steps to reproduce.

      • The operator gets stuck in a rollback loop with the message below (reformatted for readability).
      • Separately, it would be good if the operator could dry-run the overlay patches before proceeding; at least in some cases this would remove the need to roll back.
      rollback failed: failed to replace object: PersistentVolumeClaim "scanner-v4-db" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests and volumeAttributesClassName for bound claims
        core.PersistentVolumeClaimSpec{
            AccessModes:      {"ReadWriteOnce"},
            Selector:         nil,
            Resources:        {Requests: {s"storage": {i: {...}, s: "50Gi", Format: "BinarySI"}}},
      -     VolumeName:       "pvc-a78c78d0-a692-47d6-b01a-be7f8d6ebd4a",
      +     VolumeName:       "",
      -     StorageClassName: &"standard-csi",
      +     StorageClassName: nil,
            VolumeMode:       &"Filesystem",
            DataSource:       nil,
            ... // 2 identical fields
        }
      : original upgrade error: cannot patch "scanner-v4-matcher-config" with kind ConfigMap:  "" is invalid: patch: Invalid value: "
      {
        "apiVersion": "v1",
        "data": {
          "config.yaml": {
            "matcher": {
              "vulnerabilities_url": "https://central.stackrox.svc/api/extensions/scannerdefinitions?version=dev"
            }
          }
        },
        "kind": "ConfigMap",
        "metadata": {
          "name": "scanner-v4-matcher-config",
          "namespace": "rhacs-operator",
          "uid": "e51db325-bf69-400f-82b3-e65594781642",
          "resourceVersion": "46129",
          "creationTimestamp": "2025-01-31T20:17:04Z",
          "labels": {
            "app.kubernetes.io/component": "scanner-v4",
            "app.kubernetes.io/instance": "stackrox-central-services",
            "app.kubernetes.io/managed-by": "Helm",
            "app.kubernetes.io/name": "stackrox",
            "app.kubernetes.io/part-of": "stackrox-central-services",
            "app.kubernetes.io/version": "4.6.1",
            "app.stackrox.io/managed-by": "operator",
            "helm.sh/chart": "stackrox-central-services-400.6.1"
          },
          "annotations": {
            "email": "support@stackrox.com",
            "meta.helm.sh/release-name": "stackrox-central-services",
            "meta.helm.sh/release-namespace": "rhacs-operator",
            "owner": "stackrox"
          },
          "ownerReferences": [
            {
              "apiVersion": "platform.stackrox.io/v1alpha1",
              "kind": "Central",
              "name": "stackrox-central-services",
              "uid": "058120da-b182-458d-aea4-08144a124b79",
              "controller": true,
              "blockOwnerDeletion": true
            }
          ],
          "managedFields": [
            {
              "manager": "rhacs-operator",
              "operation": "Update",
              "apiVersion": "v1",
              "time": "2025-01-31T20:31:38Z",
              "fieldsType": "FieldsV1",
              "fieldsV1": {
                "f:data": {
                  ".": {},
                  "f:config.yaml": {}
                },
                "f:metadata": {
                  "f:annotations": {
                    ".": {},
                    "f:email": {},
                    "f:meta.helm.sh/release-name": {},
                    "f:meta.helm.sh/release-namespace": {},
                    "f:owner": {}
                  },
                  "f:labels": {
                    ".": {},
                    "f:app.kubernetes.io/component": {},
                    "f:app.kubernetes.io/instance": {},
                    "f:app.kubernetes.io/managed-by": {},
                    "f:app.kubernetes.io/name": {},
                    "f:app.kubernetes.io/part-of": {},
                    "f:app.kubernetes.io/version": {},
                    "f:app.stackrox.io/managed-by": {},
                    "f:helm.sh/chart": {}
                  },
                  "f:ownerReferences": {
                    ".": {},
                    "k:{\"uid\":\"058120da-b182-458d-aea4-08144a124b79\"}": {}
                  }
                }
              }
            }
          ]
        }
      }": json: cannot unmarshal object into Go struct field ConfigMap.data of type string 
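The final part of the message (`cannot unmarshal object into Go struct field ConfigMap.data of type string`) can be reproduced in isolation. The following is a minimal sketch using only the Go standard library. The `configMap` struct, the helper names, and the example URL are hypothetical stand-ins for illustration; the real ConfigMap type and the operator's decoding path are not shown in this ticket. Like the real type, the stand-in declares `data` as a map from string to string, so a nested JSON object under `config.yaml` is rejected.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"reflect"
)

// configMap is a hypothetical, trimmed-down stand-in for a ConfigMap:
// every value under "data" must be a plain string.
type configMap struct {
	Data map[string]string `json:"data"`
}

// decodeConfigMap decodes raw JSON into configMap and returns any decode error.
func decodeConfigMap(raw string) error {
	var cm configMap
	return json.Unmarshal([]byte(raw), &cm)
}

// isObjectIntoString reports whether err is a JSON type error caused by
// finding an object where a Go string was expected.
func isObjectIntoString(err error) bool {
	var typeErr *json.UnmarshalTypeError
	return errors.As(err, &typeErr) &&
		typeErr.Value == "object" &&
		typeErr.Type.Kind() == reflect.String
}

func main() {
	// Shape produced by the invalid overlay: config.yaml is a nested object.
	bad := `{"data": {"config.yaml": {"matcher": {"vulnerabilities_url": "https://example.invalid"}}}}`
	// Accepted shape: config.yaml is one string that happens to contain YAML.
	good := `{"data": {"config.yaml": "matcher:\n  vulnerabilities_url: https://example.invalid\n"}}`

	fmt.Println("bad: ", decodeConfigMap(bad))
	fmt.Println("good:", decodeConfigMap(good))
}
```

Under this assumption, an overlay that patches `config.yaml` has to supply its value as a single string rather than as structured data.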

      CONDITIONS
      What conditions need to exist for a user to be affected? Is it everyone? Is it only those with a specific integration? Is it specific to someone with particular database content? etc.

      • Discovered when trying to change a ConfigMap.
      • From the message, it seems this happens when an invalid overlay (one that treats a ConfigMap's data value as an object instead of a string) fails to apply. During the subsequent rollback, the operator then tries to apply an invalid update to the scanner-v4-db PVC by removing volumeName and storageClassName, which the API server rejects because the spec of a bound claim is immutable.
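The rule quoted in the rollback error (the spec of a bound claim is immutable except `resources.requests` and `volumeAttributesClassName`) can be illustrated with a small check. This is only a sketch under stated assumptions: `pvcSpec` and `immutableChanges` are hypothetical names, the struct models only a few of the fields visible in the diff above, and the real API server validation is more involved. It does not describe how the operator builds the rollback object, which this ticket leaves open.

```go
package main

import (
	"fmt"
	"reflect"
)

// pvcSpec is a hypothetical, trimmed-down model of a PersistentVolumeClaim spec.
type pvcSpec struct {
	AccessModes               []string
	StorageRequest            string  // resources.requests["storage"]; may change
	VolumeAttributesClassName *string // may change
	VolumeName                string
	StorageClassName          *string
	VolumeMode                *string
}

// immutableChanges lists the modelled fields that differ between the current
// and desired spec and that, per the error message, may not change on a bound claim.
func immutableChanges(current, desired pvcSpec) []string {
	var changed []string
	if !reflect.DeepEqual(current.AccessModes, desired.AccessModes) {
		changed = append(changed, "AccessModes")
	}
	if current.VolumeName != desired.VolumeName {
		changed = append(changed, "VolumeName")
	}
	if !reflect.DeepEqual(current.StorageClassName, desired.StorageClassName) {
		changed = append(changed, "StorageClassName")
	}
	if !reflect.DeepEqual(current.VolumeMode, desired.VolumeMode) {
		changed = append(changed, "VolumeMode")
	}
	return changed
}

func main() {
	storageClass, mode := "standard-csi", "Filesystem"
	// Live object: the cluster filled in VolumeName and StorageClassName.
	live := pvcSpec{
		AccessModes:      []string{"ReadWriteOnce"},
		StorageRequest:   "50Gi",
		VolumeName:       "pvc-a78c78d0-a692-47d6-b01a-be7f8d6ebd4a",
		StorageClassName: &storageClass,
		VolumeMode:       &mode,
	}
	// Object sent during rollback: the two cluster-populated fields are empty.
	rollback := pvcSpec{
		AccessModes:    []string{"ReadWriteOnce"},
		StorageRequest: "50Gi",
		VolumeMode:     &mode,
	}
	fmt.Println(immutableChanges(live, rollback))
}
```

In this model, the two fields flagged for the rollback object are the same two that appear with `-`/`+` markers in the diff in the error message.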

      ROOT CAUSE
      What is the root cause of the bug?

      FIX
      How was the bug fixed (this is more important if a workaround was implemented rather than an actual fix)?

      • pending
      • Please make sure to also see the closely related bug which has a bigger impact.

              Unassigned
              Marcin Owsiany (mowsiany@redhat.com)
              ACS Install