Description of problem:
On a fresh ACM 2.9.0 hub cluster with 2 managed clusters, we create 2 Policies, each containing a ConfigurationPolicy in its policy-templates that uses hub templates to propagate a secret to the set of managed clusters. For one of the two Policies, the secret template fails to expand on the hub.
As a result, the policy is NonCompliant on the managed cluster, because the hub templates stay unexpanded (see https://github.com/open-cluster-management-io/config-policy-controller/blob/a90147565fba40c750764b263709e63c9c8f0b44/controllers/configurationpolicy_controller.go#L964 ).
The expansion failure seems to stem from the following error in the grc-policy-propagator controller on the hub. It also looks like the Policy resource is not retried afterwards:
2023-11-22T01:19:26.408Z error policy-propagator propagator/propagation.go:298 Failed to get/generate the policy encryption key {"policyName": "825ed4d67a69abcf4162519266f4579ea10019d", "policyNamespace": "openshift-operators", "cluster": "cluster2", "error": "failed to create the Secret cluster2/policy-encryption-key: secrets \"policy-encryption-key\" already exists"}
open-cluster-management.io/governance-policy-propagator/controllers/propagator.(*ReplicatedPolicyReconciler).processTemplates
    /remote-source/app/controllers/propagator/propagation.go:298
open-cluster-management.io/governance-policy-propagator/controllers/propagator.(*ReplicatedPolicyReconciler).Reconcile
    /remote-source/app/controllers/propagator/replicatedpolicy_controller.go:243
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:314
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:226
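The "already exists" error on a get/generate path is consistent with two concurrent reconciles both finding the key missing and both trying to create it. This is only a hypothesis drawn from the log message, not a confirmed root cause. The sketch below is not the propagator's code: the in-memory `store`, `errAlreadyExists`, and `getOrCreate` are hypothetical stand-ins that illustrate how a get-or-create step can tolerate losing such a race by re-reading the object instead of failing.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errAlreadyExists is a hypothetical stand-in for the API server's
// "already exists" error seen in the log above.
var errAlreadyExists = errors.New("already exists")

// store is a hypothetical in-memory stand-in for the API server.
type store struct {
	mu   sync.Mutex
	data map[string]string
}

func newStore() *store { return &store{data: map[string]string{}} }

func (s *store) get(name string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[name]
	return v, ok
}

// create fails with errAlreadyExists if the name is already taken.
func (s *store) create(name, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.data[name]; ok {
		return errAlreadyExists
	}
	s.data[name] = value
	return nil
}

// getOrCreate returns the stored value, creating it from candidate if absent.
// If another caller creates the object between our get and create, the
// "already exists" error is treated as a lost race and the value is re-read.
func getOrCreate(s *store, name, candidate string) (string, error) {
	if v, ok := s.get(name); ok {
		return v, nil
	}
	err := s.create(name, candidate)
	if err == nil {
		return candidate, nil
	}
	if errors.Is(err, errAlreadyExists) {
		if v, ok := s.get(name); ok {
			return v, nil
		}
	}
	return "", err
}

func main() {
	s := newStore()
	results := make([]string, 2)
	var wg sync.WaitGroup
	// Two concurrent "reconciles" for the same cluster key.
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			v, err := getOrCreate(s, "cluster2/policy-encryption-key", fmt.Sprintf("key-%d", i))
			if err != nil {
				panic(err)
			}
			results[i] = v
		}(i)
	}
	wg.Wait()
	fmt.Println("both callers see the same key:", results[0] == results[1])
}
```

Whichever goroutine loses the race ends up with the winner's key rather than an error, which is the behaviour the failing Policy appears to be missing here.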
YAMLs:
The failing set:
================
The secret:
$ oc get secret -n openshift-operators 825ed4d67a69abcf4162519266f4579ea10019d -o yaml
apiVersion: v1
data:
  AWS_ACCESS_KEY_ID: bWhPWHBzNkpXblpXRUNSRmk0VlE=
  AWS_SECRET_ACCESS_KEY: T2wyYXB3TVNZMmlqb205ZzhHT3h2SmFFYVZXR05HRzBoVUhrb2R0Lw== notsecret
kind: Secret
metadata:
  creationTimestamp: "2023-11-22T01:16:59Z"
  name: 825ed4d67a69abcf4162519266f4579ea10019d
  namespace: openshift-operators
The policy in the secret namespace:
$ oc get policy -n openshift-operators 825ed4d67a69abcf4162519266f4579ea10019d -o yaml
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  annotations:
    policy.open-cluster-management.io/trigger-update: "149585"
  creationTimestamp: "2023-11-22T01:19:25Z"
  generation: 1
  name: 825ed4d67a69abcf4162519266f4579ea10019d
  namespace: openshift-operators
  resourceVersion: "149799"
  uid: dd2310b2-6a7f-42b1-a821-1eb8ad4a9599
spec:
  disabled: false
  policy-templates:
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        creationTimestamp: null
        name: cfg-policy-825ed4d67a69abcf4162519266f4579ea10019d
      spec:
        evaluationInterval: {}
        namespaceSelector: {}
        object-templates:
        - complianceType: Musthave
          objectDefinition:
            apiVersion: v1
            data:
              AWS_ACCESS_KEY_ID: '{{hub fromSecret "openshift-operators" "825ed4d67a69abcf4162519266f4579ea10019d" "AWS_ACCESS_KEY_ID" hub}}'
              AWS_SECRET_ACCESS_KEY: '{{hub fromSecret "openshift-operators" "825ed4d67a69abcf4162519266f4579ea10019d" "AWS_SECRET_ACCESS_KEY" hub}}'
            kind: Secret
            metadata:
              creationTimestamp: null
              name: 825ed4d67a69abcf4162519266f4579ea10019d
              namespace: openshift-dr-system
        remediationAction: Enforce
        severity: high
      status: {}
  remediationAction: Enforce
status:
  compliant: NonCompliant
  placement:
  - placementBinding: plbinding-825ed4d67a69abcf4162519266f4579ea10019d
    placementRule: plrule-825ed4d67a69abcf4162519266f4579ea10019d
  status:
  - clustername: cluster1
    clusternamespace: cluster1
    compliant: NonCompliant
  - clustername: cluster2
    clusternamespace: cluster2
    compliant: NonCompliant
Policy in the managed cluster namespace (just cluster1 for terseness):
$ oc get policy -n cluster1 openshift-operators.825ed4d67a69abcf4162519266f4579ea10019d -o yaml
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
  creationTimestamp: "2023-11-22T01:19:26Z"
  generation: 1
  labels:
    policy.open-cluster-management.io/cluster-name: cluster1
    policy.open-cluster-management.io/cluster-namespace: cluster1
    policy.open-cluster-management.io/root-policy: openshift-operators.825ed4d67a69abcf4162519266f4579ea10019d
  name: openshift-operators.825ed4d67a69abcf4162519266f4579ea10019d
  namespace: cluster1
  resourceVersion: "149764"
  uid: 8af94b12-e922-4c05-b039-602fbd4ed577
spec:
  disabled: false
  policy-templates:
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        creationTimestamp: null
        name: cfg-policy-825ed4d67a69abcf4162519266f4579ea10019d
      spec:
        evaluationInterval: {}
        namespaceSelector: {}
        object-templates:
        - complianceType: Musthave
          objectDefinition:
            apiVersion: v1
            data:
              AWS_ACCESS_KEY_ID: '{{hub fromSecret "openshift-operators" "825ed4d67a69abcf4162519266f4579ea10019d" "AWS_ACCESS_KEY_ID" hub}}'
              AWS_SECRET_ACCESS_KEY: '{{hub fromSecret "openshift-operators" "825ed4d67a69abcf4162519266f4579ea10019d" "AWS_SECRET_ACCESS_KEY" hub}}'
            kind: Secret
            metadata:
              creationTimestamp: null
              name: 825ed4d67a69abcf4162519266f4579ea10019d
              namespace: openshift-dr-system
        remediationAction: Enforce
        severity: high
      status: {}
  remediationAction: Enforce
status:
  compliant: NonCompliant
  details:
  - compliant: NonCompliant
    history:
    - eventName: openshift-operators.825ed4d67a69abcf4162519266f4579ea10019d.1799cd371f3a559a
      lastTimestamp: "2023-11-22T01:19:30Z"
      message: NonCompliant; violation - Error occurred while processing hub-templates, check the policy events for more details.
    templateMeta:
      creationTimestamp: null
      name: cfg-policy-825ed4d67a69abcf4162519266f4579ea10019d
Working set:
============
The secret:
$ oc get secret -n openshift-operators a6bd33b0ff333553dce289cbbbb56976a789b8f -o yaml
apiVersion: v1
data:
  AWS_ACCESS_KEY_ID: c0FhZTVZSm1jUDU2Yjh6bWxONnM=
  AWS_SECRET_ACCESS_KEY: Q2ticXE3b0I0c1N1MnltZ2JGaGdlZjMvbjYvZ21GeGMvUmJSM1BDKw== notsecret
kind: Secret
metadata:
  creationTimestamp: "2023-11-22T01:18:59Z"
  name: a6bd33b0ff333553dce289cbbbb56976a789b8f
  namespace: openshift-operators
The policy in the secret namespace:
$ oc get policy -n openshift-operators a6bd33b0ff333553dce289cbbbb56976a789b8f -o yaml
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  annotations:
    policy.open-cluster-management.io/trigger-update: "149601"
  creationTimestamp: "2023-11-22T01:19:25Z"
  generation: 1
  name: a6bd33b0ff333553dce289cbbbb56976a789b8f
  namespace: openshift-operators
  resourceVersion: "149810"
  uid: 34170ba3-268f-4389-9f7a-04e037ceb2c1
spec:
  disabled: false
  policy-templates:
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        creationTimestamp: null
        name: cfg-policy-a6bd33b0ff333553dce289cbbbb56976a789b8f
      spec:
        evaluationInterval: {}
        namespaceSelector: {}
        object-templates:
        - complianceType: Musthave
          objectDefinition:
            apiVersion: v1
            data:
              AWS_ACCESS_KEY_ID: '{{hub fromSecret "openshift-operators" "a6bd33b0ff333553dce289cbbbb56976a789b8f" "AWS_ACCESS_KEY_ID" hub}}'
              AWS_SECRET_ACCESS_KEY: '{{hub fromSecret "openshift-operators" "a6bd33b0ff333553dce289cbbbb56976a789b8f" "AWS_SECRET_ACCESS_KEY" hub}}'
            kind: Secret
            metadata:
              creationTimestamp: null
              name: a6bd33b0ff333553dce289cbbbb56976a789b8f
              namespace: openshift-dr-system
        remediationAction: Enforce
        severity: high
      status: {}
  remediationAction: Enforce
status:
  compliant: Compliant
  placement:
  - placementBinding: plbinding-a6bd33b0ff333553dce289cbbbb56976a789b8f
    placementRule: plrule-a6bd33b0ff333553dce289cbbbb56976a789b8f
  status:
  - clustername: cluster1
    clusternamespace: cluster1
    compliant: Compliant
  - clustername: cluster2
    clusternamespace: cluster2
    compliant: Compliant
Policy in the managed cluster namespace (just cluster1 for terseness):
$ oc get policy -n cluster1 openshift-operators.a6bd33b0ff333553dce289cbbbb56976a789b8f -o yaml
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
    policy.open-cluster-management.io/encryption-iv: JR9xl/ayfchlJ2nfnU5mbQ==
  creationTimestamp: "2023-11-22T01:19:26Z"
  generation: 1
  labels:
    policy.open-cluster-management.io/cluster-name: cluster1
    policy.open-cluster-management.io/cluster-namespace: cluster1
    policy.open-cluster-management.io/root-policy: openshift-operators.a6bd33b0ff333553dce289cbbbb56976a789b8f
  name: openshift-operators.a6bd33b0ff333553dce289cbbbb56976a789b8f
  namespace: cluster1
  resourceVersion: "149918"
  uid: b5d3fd5f-389d-4d64-93ea-ab2495b8e307
spec:
  disabled: false
  policy-templates:
  - objectDefinition:
      apiVersion: policy.open-cluster-management.io/v1
      kind: ConfigurationPolicy
      metadata:
        annotations:
          policy.open-cluster-management.io/encryption-iv: JR9xl/ayfchlJ2nfnU5mbQ==
        creationTimestamp: null
        name: cfg-policy-a6bd33b0ff333553dce289cbbbb56976a789b8f
      spec:
        evaluationInterval: {}
        namespaceSelector: {}
        object-templates:
        - complianceType: Musthave
          objectDefinition:
            apiVersion: v1
            data:
              AWS_ACCESS_KEY_ID: $ocm_encrypted:IH4L++ZUB24DrP/OjPCJUp+CQdAbq3qk2L3HgVFlJX0=
              AWS_SECRET_ACCESS_KEY: $ocm_encrypted:aWgrZXjWYR9iOLGBjyEvNIZnA8Of3+0xvoGvKIhpgpn7+UK8iWPW+WR6XjF8l8h+o3xG+Whl7TuA+TpHe1hxzw==
            kind: Secret
            metadata:
              creationTimestamp: null
              name: a6bd33b0ff333553dce289cbbbb56976a789b8f
              namespace: openshift-dr-system
        remediationAction: Enforce
        severity: high
      status: {}
  remediationAction: Enforce
status:
  compliant: Compliant
  details:
  - compliant: Compliant
    history:
    - eventName: openshift-operators.a6bd33b0ff333553dce289cbbbb56976a789b8f.1799cd393c947321
      lastTimestamp: "2023-11-22T01:19:40Z"
      message: Compliant; notification - secrets [a6bd33b0ff333553dce289cbbbb56976a789b8f] found as specified in namespace openshift-dr-system
    - eventName: openshift-operators.a6bd33b0ff333553dce289cbbbb56976a789b8f.1799cd3721fdc9c0
      lastTimestamp: "2023-11-22T01:19:30Z"
      message: Compliant; notification - secrets [a6bd33b0ff333553dce289cbbbb56976a789b8f] was created successfully in namespace openshift-dr-system
    - eventName: openshift-operators.a6bd33b0ff333553dce289cbbbb56976a789b8f.1799cd372124500d
      lastTimestamp: "2023-11-22T01:19:30Z"
      message: NonCompliant; violation - secrets [a6bd33b0ff333553dce289cbbbb56976a789b8f] not found in namespace openshift-dr-system
    templateMeta:
      creationTimestamp: null
      name: cfg-policy-a6bd33b0ff333553dce289cbbbb56976a789b8f
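The visible difference between the two replicated policies is in the data values: the failing one still carries the raw `{{hub ... hub}}` strings, while the working one carries `$ocm_encrypted:` values (plus the encryption-iv annotation). As a triage aid, a minimal sketch of that distinction follows. The function name `classifyValue` and its three labels are hypothetical, and the check is plain string matching based only on the outputs above, not on how the controllers detect this state.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyValue labels one data value from a replicated policy.
// Hypothetical helper: the labels are for illustration only.
func classifyValue(v string) string {
	switch {
	case strings.Contains(v, "{{hub"):
		// Hub template was copied as-is; expansion did not happen.
		return "unexpanded"
	case strings.HasPrefix(v, "$ocm_encrypted:"):
		// Hub template was expanded and the result encrypted.
		return "encrypted"
	default:
		return "plain"
	}
}

func main() {
	// Values copied from the failing and working cluster1 policies above.
	failing := `{{hub fromSecret "openshift-operators" "825ed4d67a69abcf4162519266f4579ea10019d" "AWS_ACCESS_KEY_ID" hub}}`
	working := "$ocm_encrypted:IH4L++ZUB24DrP/OjPCJUp+CQdAbq3qk2L3HgVFlJX0="

	fmt.Println("failing policy value:", classifyValue(failing))
	fmt.Println("working policy value:", classifyValue(working))
}
```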
NOTES:
- The secrets that are referenced in the hub template exist before the Policy is created
- The PlacementRule and PlacementBinding resources exist
- Re-creating the failed Policy later (under a new name, and amending the PlacementBinding to include the new Policy) works as desired
- Seems to occur only for the initial Policies that are created in tandem, not for Policies created later
- Attached logs from the grc-policy-propagator instance
Version-Release number of selected component (if applicable):
ACM 2.9.0 + OCP 4.14.0 + ODF 4.14.0
How reproducible:
Reproduced twice, on 2 separate setups that use the recently released ACM 2.9.0 version.
Additional info:
- This prevents the initial setup of the ODF-DR solution, as the secrets are not transferred to the managed clusters
- This combination has been in use and tested for a couple of releases; in other words, the behaviour seems to be new in 2.9.0