Bug
Resolution: Duplicate
Undefined
4.16
Quality / Stability / Reliability
False
Important
Description of problem
Although the cluster-network-operator prevents a 4.16 cluster that uses the openshift-sdn CNI plugin from being upgraded, it allows the upgrade to 4.17 to proceed while the CNI plugin migration is still in progress.
Version-Release number of selected component (if applicable)
Any 4.16.z.
How reproducible
Always
Steps to Reproduce
- We start with a 4.16 cluster using openshift-sdn.
- We trigger the 4.17 upgrade.
- The 4.17 upgrade is held ("frozen") by the Cluster Version Operator because the Cluster Network Operator still reports Upgradeable=False. (At this point the upgrade could still be cancelled, but for this reproduction we leave it held instead of cancelling it.)
- We start the migration process. So far, so good.
- At some point during the migration, spec.networkType must be set to "OVNKubernetes" on the network.config/cluster object, which in turn sets spec.defaultNetwork.type="OVNKubernetes" on the network.operator/cluster object. In the offline migration procedure, for example, this is step 10 of the documentation (as numbered at the time of reporting).
- Because the only check CNO performs to block the upgrade is whether spec.defaultNetwork.type=="OpenShiftSDN" on the network.operator/cluster object, CNO stops reporting itself as non-upgradeable at this point, and the upgrade to 4.17 starts even though the migration is still in progress.
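The gating behavior described in the steps above can be modeled with a small Go sketch. The types `NetworkSpec`, `DefaultNetwork`, and `Migration`, and the function `currentUpgradeBlocked`, are hypothetical simplifications, not the real CNO API; they only illustrate why a check on spec.defaultNetwork.type alone stops blocking the upgrade as soon as the type is switched, even though a migration is still recorded in spec.migration.

```go
package main

import "fmt"

// Hypothetical, simplified model of the network.operator/cluster fields
// relevant to this bug; the real CNO API types are much larger.
type DefaultNetwork struct {
	Type string // spec.defaultNetwork.type, e.g. "OpenShiftSDN" or "OVNKubernetes"
}

type Migration struct {
	NetworkType string // target network type of the in-progress migration
}

type NetworkSpec struct {
	DefaultNetwork DefaultNetwork
	Migration      *Migration // spec.migration; nil when no migration is in progress
}

// currentUpgradeBlocked mirrors the check described in this report:
// only spec.defaultNetwork.type is inspected, spec.migration is ignored.
func currentUpgradeBlocked(spec NetworkSpec) bool {
	return spec.DefaultNetwork.Type == "OpenShiftSDN"
}

func main() {
	// Hypothetical snapshots of the object during an SDN -> OVN migration.
	steps := []struct {
		name string
		spec NetworkSpec
	}{
		{"before migration", NetworkSpec{DefaultNetwork{"OpenShiftSDN"}, nil}},
		{"migration started", NetworkSpec{DefaultNetwork{"OpenShiftSDN"}, &Migration{"OVNKubernetes"}}},
		{"networkType switched", NetworkSpec{DefaultNetwork{"OVNKubernetes"}, &Migration{"OVNKubernetes"}}},
	}
	for _, s := range steps {
		fmt.Printf("%-22s upgradeBlocked=%v\n", s.name, currentUpgradeBlocked(s.spec))
	}
}
```

Running it reports `upgradeBlocked=false` for the last snapshot, even though `Migration` is still non-nil there, which is the window in which the held 4.17 upgrade is released.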
Actual results
The cluster upgrades to 4.17 while the migration is still unfinished. The cluster can even reach the point where CNO is upgraded to 4.17 while the migration is still in progress.
At this point, the cluster can fully break and become very difficult or even impossible to recover.
Expected results
The Cluster Network Operator should keep reporting itself as non-upgradeable while any migration is in progress (for example, by checking whether spec.migration is non-null on the network.operator/cluster object, which indicates an ongoing migration).
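A minimal Go sketch of the expected behavior, under simplifying assumptions: `NetworkSpec`, `Migration`, `proposedUpgradeBlocked`, and the reason strings are hypothetical and not taken from the CNO code base; only the fields spec.defaultNetwork.type and spec.migration come from this report. The sketch keeps the existing OpenShiftSDN check and adds a check on spec.migration, so the cluster stays non-upgradeable after the network type has been switched but before the migration is finished.

```go
package main

import "fmt"

// Hypothetical, simplified model of network.operator/cluster; not the real CNO API types.
type Migration struct {
	NetworkType string // target network type of the migration
}

type NetworkSpec struct {
	DefaultNetworkType string     // spec.defaultNetwork.type
	Migration          *Migration // spec.migration; nil when no migration is in progress
}

// proposedUpgradeBlocked blocks the upgrade while spec.migration is set,
// and still blocks it while the cluster uses OpenShiftSDN.
// The returned reason strings are illustrative only.
func proposedUpgradeBlocked(spec NetworkSpec) (bool, string) {
	if spec.Migration != nil {
		return true, "a network migration is in progress (spec.migration is set)"
	}
	if spec.DefaultNetworkType == "OpenShiftSDN" {
		return true, "the cluster still uses OpenShiftSDN"
	}
	return false, ""
}

func main() {
	snapshots := []struct {
		name string
		spec NetworkSpec
	}{
		{"networkType switched, migration ongoing", NetworkSpec{"OVNKubernetes", &Migration{"OVNKubernetes"}}},
		{"migration finished, spec.migration cleared", NetworkSpec{"OVNKubernetes", nil}},
	}
	for _, s := range snapshots {
		blocked, reason := proposedUpgradeBlocked(s.spec)
		fmt.Printf("%s: blocked=%v %s\n", s.name, blocked, reason)
	}
}
```

The migration check comes first so that the reported reason names the ongoing migration rather than the network type. The sketch does not model how CNO publishes the Upgradeable condition to the Cluster Version Operator.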
Additional info
This is reproducible both with live migration and offline migration, as the cluster becomes upgradeable as soon as the OVNKubernetes network type is set (either manually during offline migration or automatically during live migration).
duplicates
OCPBUGS-57354 OCP4.16.z: It is possible to accidentally trigger an upgrade to 4.17 during a migration from SDN --> OVN leading to soft-lock state
POST