Bug
Resolution: Unresolved
Normal
4.20.z
Description of problem:
There is a synchronization failure between ConfigMaps on the management cluster and the HostedCluster worker nodes when managing the additionalTrustBundle. It fails in two distinct ways:
1. In-place updates: modifying the data of a ConfigMap that the HostedCluster already references does not trigger a reconciliation or a MachineConfig rollout.
2. Reference updates with stale data: when the additionalTrustBundle reference is switched to a different ConfigMap that was recently modified, the system applies the original (apparently cached) data of that ConfigMap instead of the updated values.
Version-Release number of selected component (if applicable):
HCP 4.20
MCE 2.10
How reproducible:
100%
Steps to Reproduce:
Initial Setup (Baseline)
To establish the environment, three ConfigMaps were created and verified to work correctly on their initial assignment:
- ca1 (data: myCA1.pem)
- ca2 (data: myCA2.pem)
- ca3 (data: myCA3.pem)
Baseline verification: hostedcluster.spec.additionalTrustBundle was updated with each ConfigMap name in turn. In these initial runs, the worker nodes correctly received the bundle at /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt.

Scenario 1: In-place Update Failure
1. Ensure the HostedCluster is currently using ca3 in spec.additionalTrustBundle.
2. Update the data in ca3 to contain myCA1.pem:
   oc set data cm/ca3 --from-file ca-bundle.crt=myCA1.pem
3. Observed Result: no machine-config-daemon pods are created or restarted on the worker nodes. The CA bundle on the nodes remains the old myCA3.pem.
4. Expected Result: the HyperShift operator should watch the referenced ConfigMap and, when its data changes, trigger a rollout that updates the trust bundle on the workers.

Scenario 2: Stale Data on Reference Change
1. Update the data of ca1 (not currently referenced in the HostedCluster spec) from myCA1.pem to myCA2.pem:
   oc set data cm/ca1 --from-file ca-bundle.crt=myCA2.pem
2. Patch the HostedCluster CR to switch additionalTrustBundle to ca1.
3. Observed Result: the machine-config-daemon pods roll out (because the HostedCluster spec changed). However, the CA bundle applied to the nodes is the original myCA1.pem, not the updated myCA2.pem.
4. Expected Result: the system should fetch the latest version of the ConfigMap from the API server before generating the MachineConfig, so that the worker nodes receive myCA2.pem.
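Both expected results amount to deriving the rollout trigger from the current contents of the referenced ConfigMap rather than from its name. The sketch below models that idea under stated assumptions: `bundle_fingerprint` and `needs_rollout` are hypothetical helpers, not HyperShift code, and the string values stand in for the PEM files.

```python
from __future__ import annotations

import hashlib

# Data key used by the referenced ConfigMap (see the `oc set data` commands above).
BUNDLE_KEY = "ca-bundle.crt"


def bundle_fingerprint(configmap_data: dict[str, str]) -> str:
    """Hypothetical helper: hash the trust-bundle contents; the ConfigMap name plays no part."""
    bundle = configmap_data.get(BUNDLE_KEY, "")
    return hashlib.sha256(bundle.encode("utf-8")).hexdigest()


def needs_rollout(last_applied: str | None, configmap_data: dict[str, str]) -> bool:
    """Hypothetical check: roll out when nothing was applied yet or the live bundle differs."""
    return last_applied != bundle_fingerprint(configmap_data)


# Scenario 1: ca3 is edited in place; the name is unchanged but the contents are not.
applied = bundle_fingerprint({BUNDLE_KEY: "myCA3.pem contents"})
print(needs_rollout(applied, {BUNDLE_KEY: "myCA1.pem contents"}))  # True
print(needs_rollout(applied, {BUNDLE_KEY: "myCA3.pem contents"}))  # False
```

In Scenario 2 the spec change already forces a rollout, so a content fingerprint only helps there if it is computed from a fresh read of ca1 rather than a cached copy.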
Actual results:
For each scenario, the observed result is listed under Steps to Reproduce above.
Expected results:
For each scenario, the expected result is listed under Steps to Reproduce above.
Additional info:
The HyperShift operator appears to lack a watch on the ConfigMaps referenced by additionalTrustBundle. In addition, the Machine Config Server within the HCP appears to cache ConfigMap contents and does not invalidate that cache when the underlying resource changes. This prevents automated CA rotation: users are forced to create an entirely new ConfigMap with a unique name for every certificate update to work around the caching and sync issues.
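If ConfigMap contents are cached as suspected, one way to avoid serving stale data is to key the cache on the object's resourceVersion, which the Kubernetes API server changes on every write, in addition to its namespace and name. The model below is a minimal sketch of that idea only; `ConfigMapRef`, `BundleCache`, the `fetch` callable, the `clusters` namespace, and the resourceVersion values are all hypothetical and do not reflect the actual Machine Config Server code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ConfigMapRef:
    """Identifies one revision of a ConfigMap (hypothetical model type)."""
    namespace: str
    name: str
    resource_version: str


class BundleCache:
    """Caches ConfigMap data per revision, so an edited ConfigMap is re-read instead of served stale."""

    def __init__(self, fetch: Callable[[ConfigMapRef], dict[str, str]]) -> None:
        self._fetch = fetch  # stands in for a live read from the API server
        self._entries: dict[tuple[str, str, str], dict[str, str]] = {}

    def get(self, ref: ConfigMapRef) -> dict[str, str]:
        key = (ref.namespace, ref.name, ref.resource_version)
        if key not in self._entries:
            # Unseen resourceVersion: the object was created or changed, so fetch it.
            # Old revisions are never evicted here; a real cache would prune them.
            self._entries[key] = self._fetch(ref)
        return self._entries[key]


# Replay Scenario 2 against a dict standing in for the API server.
api = {("clusters", "ca1"): {"ca-bundle.crt": "myCA1.pem contents"}}
cache = BundleCache(lambda ref: dict(api[(ref.namespace, ref.name)]))
cache.get(ConfigMapRef("clusters", "ca1", "100"))  # original revision is cached
api[("clusters", "ca1")] = {"ca-bundle.crt": "myCA2.pem contents"}  # oc set data; resourceVersion changes
print(cache.get(ConfigMapRef("clusters", "ca1", "101"))["ca-bundle.crt"])  # myCA2.pem contents
```

A cache keyed on namespace and name alone would return the myCA1.pem contents on the second lookup, which matches the observed result in Scenario 2.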