- Bug
- Resolution: Obsolete
- Blocker
- None
- None
- None
- 5
- False
- False
- Release Note Not Required
- CORENET Sprint 273, CORENET Sprint 274
- Critical
- 10
Description of problem:
- An attempted upgrade from 4.16 to 4.17 is blocked by a handler in the Cluster Network Operator: https://github.com/openshift/cluster-network-operator/blob/2dc3099a8689a5df9797fe9c14257d7b06886741/pkg/controller/statusmanager/status_manager.go#L403C3-L422C4
- However, the blocking condition depends only on whether `Spec.DefaultNetwork.Type` is set to `OpenShiftSDN`.
- If a customer requests an upgrade and we block the rollout, and they then (without clearing the pending upgrade request) migrate to OVN using the limited live (or offline) migration method, the `Spec.DefaultNetwork.Type` value changes to `OVNKubernetes` mid-migration.
- This change to the spec removes the safeguard that was blocking the upgrade, and the cluster begins upgrading to 4.17.
- The cluster upgrades every component except the Network Operator, because the restarts, machine-config rollout, and network teardown take longer than the upgrade tasks do.
- This leads to a state in which OVNKube is up and defined, but so is OpenShift SDN, and the new 4.17 build of the Network Operator cannot complete the migration tasks because the APIs are removed (the cluster is soft-locked).
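The gap described above can be illustrated with a minimal Go sketch. It is not the operator's code: `networkSpec` and `upgradeBlockedByType` are hypothetical names, and the struct models only the one field this report discusses.

```go
package main

import "fmt"

// networkSpec is a hypothetical, simplified stand-in for the operator's
// network spec; only the field discussed in this report is modeled.
type networkSpec struct {
	DefaultNetworkType string // mirrors Spec.DefaultNetwork.Type
}

// upgradeBlockedByType models a gate that looks only at the network type.
func upgradeBlockedByType(spec networkSpec) bool {
	return spec.DefaultNetworkType == "OpenShiftSDN"
}

func main() {
	spec := networkSpec{DefaultNetworkType: "OpenShiftSDN"}
	fmt.Println("before migration, blocked:", upgradeBlockedByType(spec))

	// The migration rewrites the type while SDN is still being torn down,
	// so a type-only gate stops blocking at this point.
	spec.DefaultNetworkType = "OVNKubernetes"
	fmt.Println("mid-migration, blocked:", upgradeBlockedByType(spec))
}
```

In this model the gate opens as soon as the type field changes, which is the moment the pending upgrade request can proceed.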
Version-Release number of selected component (if applicable):
4.16 --> 4.17
How reproducible:
- I haven't reproduced this in the lab, but from reading the code I expect it to reproduce very easily:
- Deploy cluster on 4.16 using SDN
- Request upgrade to 4.17
- Observe the upgrade being denied because the blocker code detects `OpenShiftSDN` as Spec.DefaultNetwork.Type
- Proceed with limited live migration
- Observe spec.DefaultNetwork.Type change to OVNKubernetes
- Observe cluster upgrade begin even though we're not fully migrated yet.
Actual results:
- Cluster degraded
Expected results:
- Cluster should not be allowed to upgrade until OVNKube is in place and OpenShift SDN is fully torn down (block on migration status as well).
Additional info:
- I think this should be fairly easy to fix: add a spec check to the blocker that confirms the cluster is not in a migration state before allowing the upgrade, specifically on this version, so that the migration has FINISHED before we can move up.
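A rough Go sketch of the suggested check follows. Everything in it is an assumption made for illustration: `clusterNetworkState`, `MigrationInProgress`, and `upgradeBlocked` are hypothetical names, and how the operator would actually determine that a migration is still running is left open.

```go
package main

import "fmt"

// clusterNetworkState is a hypothetical, simplified view of the cluster.
// The field names are illustrative and are not taken from the operator's API.
type clusterNetworkState struct {
	DefaultNetworkType  string // mirrors Spec.DefaultNetwork.Type
	MigrationInProgress bool   // assumed signal: migration started but not finished
}

// upgradeBlocked reports whether the upgrade should be held, with a reason.
func upgradeBlocked(s clusterNetworkState) (bool, string) {
	if s.DefaultNetworkType == "OpenShiftSDN" {
		return true, "cluster still uses OpenShiftSDN"
	}
	if s.MigrationInProgress {
		return true, "migration to OVNKubernetes has not finished"
	}
	return false, ""
}

func main() {
	states := []clusterNetworkState{
		{DefaultNetworkType: "OpenShiftSDN"},
		{DefaultNetworkType: "OVNKubernetes", MigrationInProgress: true},
		{DefaultNetworkType: "OVNKubernetes"},
	}
	for _, s := range states {
		blocked, reason := upgradeBlocked(s)
		fmt.Printf("%+v -> blocked=%t %s\n", s, blocked, reason)
	}
}
```

The second state is the case this bug is about: the type already reads `OVNKubernetes`, but the gate stays closed until the migration is reported complete.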