OCPBUGS-46367

Built-in transit switch subnet "100.88.0.0/16" overlaps service subnet "100.88.0.0/16" even though InternalTransitSwitchSubnet is configured


      Description of problem:

      This bug is very similar to https://issues.redhat.com/browse/OCPBUGS-39209 but with the internalTransitSwitchSubnet instead of the internalJoinSubnet as observed there. 

      An attempt to migrate from OpenShiftSDN to OVNKubernetes fails with the error below once the Limited Live Migration is started.

      + exec /usr/bin/ovnkube --init-ovnkube-controller ip-10-0-2-22.us-east-2.compute.internal --init-node ip-10-0-2-22.us-east-2.compute.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 --inactivity-probe=180000 --gateway-mode shared --gateway-interface br-ex --metrics-bind-address 127.0.0.1:29103 --ovn-metrics-bind-address 127.0.0.1:29105 --metrics-enable-pprof --metrics-enable-config-duration --export-ovs-metrics --disable-snat-multiple-gws --enable-multi-network --enable-admin-network-policy --enable-multicast --zone ip-10-0-2-22.us-east-2.compute.internal --enable-interconnect --acl-logging-rate-limit 20 --disable-forwarding --bootstrap-kubeconfig=/var/lib/kubelet/kubeconfig --cert-dir=/etc/ovn/ovnkube-node-certs --cert-duration=24h
      I1211 20:19:09.046885   69994 config.go:2193] Parsed config file /run/ovnkube-config/ovnkube.conf
      I1211 20:19:09.046963   69994 config.go:2194] Parsed config: {Default:{MTU:8901 RoutableMTU:0 ConntrackZone:64000 HostMasqConntrackZone:0 OVNMasqConntrackZone:0 HostNodePortConntrackZone:0 ReassemblyConntrackZone:0 EncapType:geneve EncapIP: EncapPort:6081 InactivityProbe:100000 OpenFlowProbe:180 OfctrlWaitBeforeClear:0 MonitorAll:true OVSDBTxnTimeout:1m40s LFlowCacheEnable:true LFlowCacheLimit:0 LFlowCacheLimitKb:1048576 RawClusterSubnets:100.84.0.0/14/24 ClusterSubnets:[] EnableUDPAggregation:true Zone:global} Logging:{File: CNIFile: LibovsdbFile:/var/log/ovnkube/libovsdb.log Level:4 LogFileMaxSize:100 LogFileMaxBackups:5 LogFileMaxAge:0 ACLLoggingRateLimit:20} Monitoring:{RawNetFlowTargets: RawSFlowTargets: RawIPFIXTargets: NetFlowTargets:[] SFlowTargets:[] IPFIXTargets:[]} IPFIX:{Sampling:400 CacheActiveTimeout:60 CacheMaxFlows:0} CNI:{ConfDir:/etc/cni/net.d Plugin:ovn-k8s-cni-overlay} OVNKubernetesFeature:{EnableAdminNetworkPolicy:true EnableEgressIP:true EgressIPReachabiltyTotalTimeout:1 EnableEgressFirewall:true EnableEgressQoS:true EnableEgressService:true EgressIPNodeHealthCheckPort:9107 EnableMultiNetwork:true EnableMultiNetworkPolicy:false EnableStatelessNetPol:false EnableInterconnect:false EnableMultiExternalGateway:true EnablePersistentIPs:false EnableDNSNameResolver:false EnableServiceTemplateSupport:false} Kubernetes:{BootstrapKubeconfig: CertDir: CertDuration:10m0s Kubeconfig: CACert: CAData:[] APIServer:https://api-int.4.14.38-202412110935.sandbox2564.opentlc.com:6443 Token: TokenFile: CompatServiceCIDR: RawServiceCIDRs:100.88.0.0/16 ServiceCIDRs:[] OVNConfigNamespace:openshift-ovn-kubernetes OVNEmptyLbEvents:false PodIP: RawNoHostSubnetNodes:migration.network.openshift.io/plugin= NoHostSubnetNodes:<nil> HostNetworkNamespace:openshift-host-network PlatformType:AWS HealthzBindAddress:0.0.0.0:10256 CompatMetricsBindAddress: CompatOVNMetricsBindAddress: CompatMetricsEnablePprof:false DNSServiceNamespace:openshift-dns DNSServiceName:dns-default} 
Metrics:{BindAddress: OVNMetricsBindAddress: ExportOVSMetrics:false EnablePprof:false NodeServerPrivKey: NodeServerCert: EnableConfigDuration:false EnableScaleMetrics:false} OvnNorth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} OvnSouth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} Gateway:{Mode:shared Interface: EgressGWInterface: NextHop: VLANID:0 NodeportEnable:true DisableSNATMultipleGWs:false V4JoinSubnet:100.64.0.0/16 V6JoinSubnet:fd98::/64 V4MasqueradeSubnet:169.254.169.0/29 V6MasqueradeSubnet:fd69::/125 MasqueradeIPs:{V4OVNMasqueradeIP:169.254.169.1 V6OVNMasqueradeIP:fd69::1 V4HostMasqueradeIP:169.254.169.2 V6HostMasqueradeIP:fd69::2 V4HostETPLocalMasqueradeIP:169.254.169.3 V6HostETPLocalMasqueradeIP:fd69::3 V4DummyNextHopMasqueradeIP:169.254.169.4 V6DummyNextHopMasqueradeIP:fd69::4 V4OVNServiceHairpinMasqueradeIP:169.254.169.5 V6OVNServiceHairpinMasqueradeIP:fd69::5} DisablePacketMTUCheck:false RouterSubnet: SingleNode:false DisableForwarding:false AllowNoUplink:false} MasterHA:{ElectionLeaseDuration:137 ElectionRenewDeadline:107 ElectionRetryPeriod:26} ClusterMgrHA:{ElectionLeaseDuration:137 ElectionRenewDeadline:107 ElectionRetryPeriod:26} HybridOverlay:{Enabled:true RawClusterSubnets: ClusterSubnets:[] VXLANPort:4789} OvnKubeNode:{Mode:full DPResourceDeviceIdsMap:map[] MgmtPortNetdev: MgmtPortDPResourceName:} ClusterManager:{V4TransitSwitchSubnet:100.88.0.0/16 V6TransitSwitchSubnet:fd97::/64}}
      F1211 20:19:09.047517   69994 ovnkube.go:136] illegal network configuration: transit switch subnet "100.88.0.0/16" overlaps service subnet "100.88.0.0/16" 
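The long "Parsed config" line is hard to read by eye. The sketch below is a hypothetical triage helper, not part of ovnkube or any OpenShift tooling. It pulls the subnet-related fields out of such a line, assuming the `Name:value` layout seen in this log; the function name `effective_subnets` and the shortened `sample` string are illustrative.

```python
import re

def effective_subnets(parsed_config_line: str) -> dict:
    """Extract subnet-related fields from an ovnkube 'Parsed config' log line.

    Assumes the space-separated 'Name:value' layout seen in this report.
    Only the first occurrence of each field name is returned.
    """
    fields = (
        "RawClusterSubnets",
        "RawServiceCIDRs",
        "V4JoinSubnet",
        "V4TransitSwitchSubnet",
    )
    found = {}
    for name in fields:
        match = re.search(rf"\b{name}:(\S*)", parsed_config_line)
        if match:
            # Drop closing braces that may trail the last field of a struct.
            found[name] = match.group(1).rstrip("}")
    return found

# Shortened excerpt of the log line above (values copied from this report).
sample = (
    "RawClusterSubnets:100.84.0.0/14/24 ClusterSubnets:[] "
    "RawServiceCIDRs:100.88.0.0/16 ServiceCIDRs:[] "
    "V4JoinSubnet:100.64.0.0/16 V6JoinSubnet:fd98::/64 "
    "ClusterManager:{V4TransitSwitchSubnet:100.88.0.0/16 "
    "V6TransitSwitchSubnet:fd97::/64}}"
)
subnets = effective_subnets(sample)
print(subnets)
```

For this excerpt, `RawServiceCIDRs` and `V4TransitSwitchSubnet` both come back as 100.88.0.0/16, which is the pair named in the fatal message.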

      The OpenShift Container Platform 4 cluster was installed with the configuration below. Its serviceNetwork (100.88.0.0/16) therefore conflicts with the built-in transit switch subnet of OVNKubernetes.

      $ oc get cm -n kube-system cluster-config-v1 -o yaml
      apiVersion: v1
      data:
        install-config: |
          additionalTrustBundlePolicy: Proxyonly
          apiVersion: v1
          baseDomain: sandbox2564.opentlc.com
          compute:
          - architecture: amd64
            hyperthreading: Enabled
            name: worker
            platform:
              aws:
                metadataService: {}
                rootVolume:
                  iops: 0
                  size: 0
                  type: ""
                type: m6i.2xlarge
            replicas: 3
          controlPlane:
            architecture: amd64
            hyperthreading: Enabled
            name: master
            platform: {}
            replicas: 3
          metadata:
            creationTimestamp: null
            name: 4.14.38-202412110935
          networking:
            clusterNetwork:
            - cidr: 100.84.0.0/14
              hostPrefix: 24
            machineNetwork:
            - cidr: 10.0.0.0/16
            networkType: OpenShiftSDN
            serviceNetwork:
            - 100.88.0.0/16
          platform:
            aws:
              region: us-east-2
          publish: External
          pullSecret: ""
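The overlap can be reproduced offline with Python's standard `ipaddress` module. This is a minimal sketch of the kind of check involved, not the actual ovnkube validation code. The `in_use` values are copied from the install-config and the parsed config in this report; `find_overlaps` is a hypothetical helper name, and 100.99.0.0/16 is the replacement value used later in this report.

```python
import ipaddress

def find_overlaps(candidate: str, others: dict) -> list:
    """Return the names of the networks in `others` that overlap `candidate`."""
    net = ipaddress.ip_network(candidate)
    return [
        name
        for name, cidr in others.items()
        if net.overlaps(ipaddress.ip_network(cidr))
    ]

# Networks already in use on the cluster, as reported in this issue.
in_use = {
    "clusterNetwork": "100.84.0.0/14",
    "machineNetwork": "10.0.0.0/16",
    "serviceNetwork": "100.88.0.0/16",
    "V4JoinSubnet": "100.64.0.0/16",
}

print(find_overlaps("100.88.0.0/16", in_use))  # ['serviceNetwork']
print(find_overlaps("100.99.0.0/16", in_use))  # []
```

Under these assumptions the built-in transit switch subnet collides only with the serviceNetwork, while 100.99.0.0/16 is free of all listed networks.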
      

      Following the procedure, the step below was executed, but the problem is still reported.

      oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalTransitSwitchSubnet": "100.99.0.0/16"}}}}}' 

      Checking the operator configuration confirms that the change was applied:

      $ oc get network.operator cluster -o yaml
      apiVersion: operator.openshift.io/v1
      kind: Network
      metadata:
        creationTimestamp: "2024-12-11T14:46:17Z"
        generation: 2041
        name: cluster
        resourceVersion: "918564"
        uid: a07818f7-63de-49dd-99ef-1c739a966e2a
      spec:
        clusterNetwork:
        - cidr: 100.84.0.0/14
          hostPrefix: 24
        defaultNetwork:
          openshiftSDNConfig:
            enableUnidling: true
            mode: NetworkPolicy
            mtu: 8951
            vxlanPort: 4789
          ovnKubernetesConfig:
            egressIPConfig: {}
            gatewayConfig:
              ipv4: {}
              ipv6: {}
              routingViaHost: false
            genevePort: 6081
            ipsecConfig:
              mode: Disabled
            ipv4:
              internalTransitSwitchSubnet: 100.99.0.0/16
            mtu: 8901
            policyAuditConfig:
              destination: "null"
              maxFileSize: 50
              maxLogFiles: 5
              rateLimit: 20
              syslogFacility: local0
          type: OVNKubernetes
        deployKubeProxy: false
        disableMultiNetwork: false
        disableNetworkDiagnostics: false
        kubeProxyConfig:
          bindAddress: 0.0.0.0
        logLevel: Normal
        managementState: Managed
        migration:
          mode: Live
          networkType: OVNKubernetes
        observedConfig: null
        operatorLogLevel: Normal
        serviceNetwork:
        - 100.88.0.0/16
        unsupportedConfigOverrides: null
        useMultiNetworkPolicy: false
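The symptom is that the operator spec holds one transit switch subnet while ovnkube parses another. The following sketch makes that comparison explicit. It assumes the `spec` section has already been loaded into a Python dict (for example from the YAML above) and that the effective value was read from the ovnkube log; `transit_subnet_mismatch` is a hypothetical helper, not an existing tool.

```python
from typing import Optional

def transit_subnet_mismatch(operator_spec: dict, effective: str) -> Optional[str]:
    """Describe the difference between the configured and the effective
    IPv4 transit switch subnet, or return None if there is nothing to report."""
    configured = (
        operator_spec.get("defaultNetwork", {})
        .get("ovnKubernetesConfig", {})
        .get("ipv4", {})
        .get("internalTransitSwitchSubnet")
    )
    if configured is not None and configured != effective:
        return f"configured {configured} but ovnkube parsed {effective}"
    return None

# Relevant part of the Network operator spec shown above.
spec = {
    "defaultNetwork": {
        "ovnKubernetesConfig": {
            "ipv4": {"internalTransitSwitchSubnet": "100.99.0.0/16"},
        },
        "type": "OVNKubernetes",
    },
    "serviceNetwork": ["100.88.0.0/16"],
}

# V4TransitSwitchSubnet as printed in the ovnkube "Parsed config" line.
print(transit_subnet_mismatch(spec, "100.88.0.0/16"))
```

With the values from this report the helper returns a mismatch message; if the parsed value matched the configured one, it would return None.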
      

      After that, the Limited Live Migration is triggered with the command below; it then suddenly stops because of the error shown above.

      oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}' 

      Version-Release number of selected component (if applicable):
      OpenShift Container Platform 4.16.25

      How reproducible:
      Tested only once with live migration, but the same failure was also seen in both of 2 attempts with offline migration.

      Steps to Reproduce:
      1. Install OpenShift Container Platform 4.14.38 with OpenShiftSDN, the configuration shown above and then update to OpenShift Container Platform 4.16.25.
      2. Change internalTransitSwitchSubnet to prevent the conflict between the serviceNetwork and the transit switch subnet of OVNKubernetes (oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalTransitSwitchSubnet": "100.99.0.0/16"}}}}}')
      3. Initiate the Limited Live Migration running oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
      4. Check the logs of ovnkube-node using oc logs ovnkube-node-XXXXX -c ovnkube-controller

      Actual results:

      The ovnkube-controller container prints the same startup command and parsed configuration shown in the description above, then exits with:
      F1211 20:19:09.047517   69994 ovnkube.go:136] illegal network configuration: transit switch subnet "100.88.0.0/16" overlaps service subnet "100.88.0.0/16" 

      Expected results:
      The OVNKubernetes Limited Live Migration recognizes the change applied to internalTransitSwitchSubnet and does not report any CIDR/subnet overlap.

      Additional info:
      N/A

      Affected Platforms:
      OpenShift Container Platform 4.16.25 on AWS

              bbennett@redhat.com Ben Bennett
              rh-ee-dcoronel David Coronel
              Anurag Saxena Anurag Saxena