-
Bug
-
Resolution: Not a Bug
-
Major
-
None
-
4.16, 4.18
-
Quality / Stability / Reliability
-
False
-
Rejected
-
Description of problem:
In a cpou upgrade scenario (from 4.16 to 4.18) with a paused infra mcp, the mco becomes degraded because it expects the controller version in the infra mc to be the 4.17 version.
Version-Release number of selected component (if applicable):
4.16 to 4.18 cpou upgrade
How reproducible:
Every time
Steps to Reproduce:
1. Install a 4.16 cluster (in my test it is Azure with IPSEC)
2. Install infra machinesets with 3 infra nodes, move some infra components to infra nodes like monitoring/ingress/registry
3. Do cpou upgrade from 4.16 to 4.18
3.1 Pause the worker and infra mcp
3.2 Start the upgrade to 4.17
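Step 3.1 can be sketched as follows. This is only an illustration, not the commands used in the failed job: it assumes the pools are paused by patching `spec.paused` with `oc patch`, and the helper name `pause_commands` is hypothetical. The sketch only builds and prints the command lines; it does not run them.

```python
import json
import shlex


def pause_commands(pools, paused=True):
    """Build one `oc patch` argv list per MachineConfigPool name.

    Hypothetical helper: it assumes pausing is done via the pool's
    spec.paused field and does not contact a cluster.
    """
    # Compact JSON merge patch, e.g. {"spec":{"paused":true}}
    patch = json.dumps({"spec": {"paused": paused}}, separators=(",", ":"))
    return [
        ["oc", "patch", f"mcp/{pool}", "--type", "merge", "--patch", patch]
        for pool in pools
    ]


# Print shell-quoted commands for the two pools paused in this report.
for argv in pause_commands(["worker", "infra"]):
    print(shlex.join(argv))
```

Calling `pause_commands(["worker", "infra"], paused=False)` would produce the matching unpause commands.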
Actual results:
The upgrade to 4.17 failed because the mco is degraded.
Expected results:
The master nodes are upgraded to 4.17. The worker and infra nodes should stay with 4.16 because they are paused.
Additional info:
In a test without an infra mcp, the cpou upgrade works well. OTA functional QE has a test case with a custom mcp that has the worker label, and the cpou upgrade works well there too. In my failed test, the infra mcp does not have the worker label; it only has an infra label.
Failed test job
oc adm upgrade status showed that one operator is degraded
= Control Plane =
Assessment: Stalled
Target Version: 4.17.0-0.nightly-2024-11-21-052346 (from 4.16.23)
Completion: 97% (32 operators updated, 1 updating, 0 waiting)
Duration: 4h4m (Est. Time Remaining: N/A; estimate duration was 1h35m)
Operator Status: 32 Healthy, 1 Available but degraded
The degraded operator is mco. It is stuck because it expects the 4.17 version of the infra mc. However, the infra mcp is paused and thus will not upgrade to 4.17.
= Update Health =
Message: Cluster Operator machine-config is degraded (RequiredPoolsFailed)
  Since: 58m9s
  Level: Error
  Impact: API Availability
  Reference: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDegraded.md
  Resources:
    clusteroperators.config.openshift.io: machine-config
  Description: Unable to apply 4.17.0-0.nightly-2024-11-21-052346: error during syncRequiredMachineConfigPools: [context deadline exceeded, MachineConfigPool infra has not progressed to latest configuration: controller version mismatch for rendered-infra-6c171d9d397c09f3d4b0b81d46df2c05 expected 39e1cd3c3b04229c48988be1fb7f99b95856aff3 has 4bb3364914c4dbcdfcc08b0914f402cdd38f014f: <unknown>, retrying]

Message: Cluster Version version is failing to proceed with the update (ClusterOperatorDegraded)
  Since: 3m58s
  Level: Warning
  Impact: Update Stalled
  Reference: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDegraded.md
  Resources:
    clusterversions.config.openshift.io: version
  Description: Cluster operator machine-config is degraded

Message: Outdated nodes in a paused pool 'infra' will not be updated
  Since: -
  Level: Warning
  Impact: Update Stalled
  Reference: https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-operator-issues.html#troubleshooting-disabling-autoreboot-mco_troubleshooting-operator-issues
  Resources:
    machineconfigpools.machineconfiguration.openshift.io: infra
  Description: Pool is paused, which stops all changes to the nodes in the pool, including updates. The nodes will not be updated until the pool is unpaused by the administrator.

Message: Outdated nodes in a paused pool 'worker' will not be updated
  Since: -
  Level: Warning
  Impact: Update Stalled
  Reference: https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-operator-issues.html#troubleshooting-disabling-autoreboot-mco_troubleshooting-operator-issues
  Resources:
    machineconfigpools.machineconfiguration.openshift.io: worker
  Description: Pool is paused, which stops all changes to the nodes in the pool, including updates. The nodes will not be updated until the pool is unpaused by the administrator.
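The key detail in the first message is the controller version mismatch. A minimal sketch for pulling the three identifiers out of that text is shown below; the regular expression and the function name `parse_mismatch` are assumptions written against the message wording quoted in this report, not an MCO API.

```python
import re

# Pattern written against the error text quoted in this report (an assumption,
# not a stable format guaranteed by the machine-config operator).
MISMATCH_RE = re.compile(
    r"controller version mismatch for (?P<config>\S+) "
    r"expected (?P<expected>[0-9a-f]+) has (?P<actual>[0-9a-f]+)"
)


def parse_mismatch(message):
    """Return the rendered config name and both controller hashes, or None."""
    match = MISMATCH_RE.search(message)
    return match.groupdict() if match else None


message = (
    "MachineConfigPool infra has not progressed to latest configuration: "
    "controller version mismatch for rendered-infra-6c171d9d397c09f3d4b0b81d46df2c05 "
    "expected 39e1cd3c3b04229c48988be1fb7f99b95856aff3 "
    "has 4bb3364914c4dbcdfcc08b0914f402cdd38f014f: <unknown>, retrying"
)

info = parse_mismatch(message)
print(info["config"], "expected", info["expected"], "has", info["actual"])
```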
The infra mc
rendered-infra-37d5ea50ae2274a6829c836c74ef0ca7   39e1cd3c3b04229c48988be1fb7f99b95856aff3   3.4.0   3h15m
rendered-infra-6c171d9d397c09f3d4b0b81d46df2c05   4bb3364914c4dbcdfcc08b0914f402cdd38f014f   3.4.0   5h2m
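To relate this listing to the error above, the rows can be grouped by the controller hash in the second column. The sketch below assumes the column order shown here (name, controller hash, Ignition version, age); the function name `configs_by_controller` is hypothetical.

```python
# The two infra rows from this report, in the column order shown above.
LISTING = """\
rendered-infra-37d5ea50ae2274a6829c836c74ef0ca7 39e1cd3c3b04229c48988be1fb7f99b95856aff3 3.4.0 3h15m
rendered-infra-6c171d9d397c09f3d4b0b81d46df2c05 4bb3364914c4dbcdfcc08b0914f402cdd38f014f 3.4.0 5h2m
"""


def configs_by_controller(listing):
    """Map each controller hash to the rendered configs it generated."""
    grouped = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, controller = fields[0], fields[1]
        grouped.setdefault(controller, []).append(name)
    return grouped


grouped = configs_by_controller(LISTING)
expected = "39e1cd3c3b04229c48988be1fb7f99b95856aff3"  # hash the mco expects
stale = "rendered-infra-6c171d9d397c09f3d4b0b81d46df2c05"  # config named in the error

# The config named in the error was not generated by the expected controller.
print(stale in grouped.get(expected, []))
```

With the data from this report, the config named in the error belongs to the older controller hash, while the newer rendered-infra config carries the expected one, which is consistent with the paused pool not moving to the newer config.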
A possible workaround for customers to do the cpou upgrade with an infra mcp:
I did a test with only the worker mcp paused and the infra mcp NOT paused. The infra mcp was upgraded together with the master mcp, and the cpou upgrade job finally succeeded.
- is blocked by
-
MCO-1459 Impact statement request for OCPBUGS-45045 cpou upgrade with infra mcp paused failed as mco expects a newer version of infra mc
-
- Closed
-