This has been observed for quite some time. It is more likely to occur in a patching/recreation scenario: the CR was created previously, but day-2 operations such as patching do not trigger the reconciliation loop.
For example, when updating fields in the SpireServer CR spec such as resources, affinity, and other common configs, the operator sometimes does not propagate the updated values to the underlying StatefulSet as expected. The update does not trigger a reconciliation, so the changes in the CR spec are ignored.
STEP: Patching SpireServer object with resource specifications @ 07/28/25 15:19:54.923
STEP: Waiting for SPIRE Server StatefulSet rolling update to start @ 07/28/25 15:19:54.942
statefulset 'zero-trust-workload-identity-manager/spire-server' no rolling update in progress (current=spire-server-744fc6bfbd, update=spire-server-744fc6bfbd)
statefulset 'zero-trust-workload-identity-manager/spire-server' no rolling update in progress (current=spire-server-744fc6bfbd, update=spire-server-744fc6bfbd)
... [timeout after 2 mins]
In the operator pod log, there isn't anything suspicious: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_zero-trust-workload-identity-manager/30/pull-ci-openshift-zero-trust-workload-identity-manager-main-e2e-operator/1949849381674946560/artifacts/e2e-operator/gather-extra/artifacts/pods/zero-trust-workload-identity-manager_zero-trust-workload-identity-manager-controller-manager-ccbxbv6_manager.log
Additionally, if I manually delete the operator controller-manager pod to force a restart, everything seemingly goes back to normal and reconciliation starts.
oc delete po -n zero-trust-workload-identity-manager zero-trust-workload-identity-manager-controller-manager-5crjdcz
# or
oc rollout restart deployment/zero-trust-workload-identity-manager-controller-manager -n zero-trust-workload-identity-manager
Expected behavior
Updating the operand CR (e.g., changing spec.resources) should trigger reconciliation and cause the operator to update the corresponding StatefulSet/DaemonSet with the new values, without manually restarting the operator.
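A minimal way to verify propagation after a reconcile (assuming the CR is named cluster, as in the reproduction steps below): the resources in the CR spec and in the StatefulSet pod template should match.

oc get spireserver cluster -o jsonpath='{.spec.resources}'
oc get statefulset spire-server -n zero-trust-workload-identity-manager -o=jsonpath='{.spec.template.spec.containers[*].resources}'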
Steps to reproduce
- Apply a SpireServer/SpireAgent CR with a nil resources spec.
export APP_DOMAIN=apps.$(oc get dns cluster -o jsonpath='{ .spec.baseDomain }')
export CLUSTER_NAME=test01
oc apply -f - <<EOF
apiVersion: operator.openshift.io/v1alpha1
kind: SpireServer
metadata:
  name: cluster
spec:
  trustDomain: $APP_DOMAIN
  clusterName: $CLUSTER_NAME
  caSubject:
    commonName: $APP_DOMAIN
    country: "US"
    organization: "RH"
  persistence:
    type: pvc
    size: "2Gi"
    accessMode: ReadWriteOncePod
  datastore:
    databaseType: sqlite3
    connectionString: "/run/spire/data/datastore.sqlite3"
    maxOpenConns: 100
    maxIdleConns: 2
    connMaxLifetime: 3600
EOF

oc get po -l app.kubernetes.io/name=server -n zero-trust-workload-identity-manager
NAME             READY   STATUS    RESTARTS      AGE
spire-server-0   2/2     Running   1 (108s ago)  111s
- Update the CR's spec.resources (or any other non-required config).
oc patch SpireServer cluster --type=merge -p="
spec:
  resources:
    limits:
      cpu: 200m
      memory: 64Mi
    requests:
      cpu: 50m
      memory: 32Mi
"
# Vice versa: if you already have spec.resources set, you can set it to nil and the issue can still occur.
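For the nil case, a sketch of an equivalent merge patch that clears the field (illustrative; any edit that removes spec.resources works):

oc patch SpireServer cluster --type=merge -p '{"spec":{"resources":null}}'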
- Observe that the StatefulSet remains unchanged.
oc get statefulset spire-server -n zero-trust-workload-identity-manager -o=jsonpath="{.spec.template.spec.containers[*].resources}"
oc get pod -l app.kubernetes.io/name=server -n zero-trust-workload-identity-manager -o yaml
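It can also help to confirm that the API server recorded the update even though the operator ignored it; the CR's metadata.generation should have incremented after the patch (a quick check, not part of the original report):

oc get spireserver cluster -o jsonpath='{.metadata.generation}'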
- Check the operator logs – reconciliation is not triggered.
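One way to check the live logs (the grep pattern is just a guess at the reconcile message wording, adjust as needed):

oc logs -n zero-trust-workload-identity-manager deployment/zero-trust-workload-identity-manager-controller-manager | grep -i reconcil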
More context: the first time I noticed this was when testing CR recreation (https://redhat-internal.slack.com/archives/C08TX9H7CCF/p1749219928264279?thread_ts=1749103562.892369&cid=C08TX9H7CCF); the second time was while debugging the e2e test for CR common configs, which patches the CR. It might be related to an operator controller-manager cache issue.
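If it is a cache/event-filtering problem, one possible way to narrow it down (a hypothetical diagnostic, not something I have run as part of this report) is to touch a metadata field right after the spec patch: if an annotation-only change wakes the controller while the spec change did not, the watch is alive and the missed reconciles are more likely due to event filtering/predicates than to a stale or dead watch.

oc annotate spireserver cluster debug.example.com/touch="$(date +%s)" --overwrite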