OpenShift Bugs / OCPBUGS-49912

During rollback from OVNKubernetes to OpenShiftSDN, changing the network type in Network.config.openshift.io causes CIDR conflict errors

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Affects Version/s: 4.12.z

       

      Description of problem:

      During rollback from OVNKubernetes to OpenShiftSDN, after changing the network type in Network.config.openshift.io, the SDN components fail with CIDR conflict errors.
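      To make the conflict easier to see, it helps to compare the clusterNetwork CIDR the cluster is configured with against the subnets actually present on each node. The commands below are a generic diagnostic sketch, not output from this report; <node-name> is a placeholder.

      # Configured cluster network in the cluster config and in the operator config
      oc get Network.config.openshift.io cluster -o jsonpath='{.spec.clusterNetwork}'
      oc get Network.operator.openshift.io cluster -o yaml | grep -A 4 clusterNetwork
      # Per-node subnet allocations made by openshift-sdn; missing entries mean allocation failed
      oc get hostsubnets
      # Any leftover OVN-Kubernetes node-subnets annotation on a node
      oc get node <node-name> -o yaml | grep node-subnets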

      Logs:

      misalunk@misalunk-mac ansible-sdn-to-ovn-migration % oc get co
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.12.71   False       False         True       39m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.misalunk-migration37.devcluster.openshift.com/healthz": EOF
      baremetal                                  4.12.71   True        False         False      72m     
      cloud-controller-manager                   4.12.71   True        False         False      75m     
      cloud-credential                           4.12.71   True        False         False      76m     
      cluster-autoscaler                         4.12.71   True        False         False      72m     
      config-operator                            4.12.71   True        False         False      73m     
      console                                    4.12.71   False       False         False      39m     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.misalunk-migration37.devcluster.openshift.com): Get "https://console-openshift-console.apps.misalunk-migration37.devcluster.openshift.com": EOF
      control-plane-machine-set                  4.12.71   True        False         False      71m     
      csi-snapshot-controller                    4.12.71   True        False         False      72m     
      dns                                        4.12.71   True        False         False      72m     
      etcd                                       4.12.71   True        False         False      71m     
      image-registry                             4.12.71   True        False         False      65m     
      ingress                                    4.12.71   True        False         True       64m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
      insights                                   4.12.71   True        False         False      66m     
      kube-apiserver                             4.12.71   True        False         False      60m     
      kube-controller-manager                    4.12.71   True        False         False      69m     
      kube-scheduler                             4.12.71   True        False         False      69m     
      kube-storage-version-migrator              4.12.71   True        False         False      73m     
      machine-api                                4.12.71   True        False         False      66m     
      machine-approver                           4.12.71   True        False         False      72m     
      machine-config                             4.12.71   True        False         True       64m     Failed to resync 4.12.71 because: Required MachineConfigPool 'master' is paused and can not sync until it is unpaused
      marketplace                                4.12.71   True        False         False      72m     
      monitoring                                 4.12.71   True        False         False      64m     
      network                                    4.12.71   True        True          True       75m     DaemonSet "/openshift-sdn/sdn" rollout is not making progress - last change 2025-02-05T00:42:50Z
      node-tuning                                4.12.71   True        False         False      72m     
      openshift-apiserver                        4.12.71   True        False         False      60m     
      openshift-controller-manager               4.12.71   True        False         False      68m     
      openshift-samples                          4.12.71   True        False         False      65m     
      operator-lifecycle-manager                 4.12.71   True        False         False      72m     
      operator-lifecycle-manager-catalog         4.12.71   True        False         False      72m     
      operator-lifecycle-manager-packageserver   4.12.71   True        False         False      66m     
      service-ca                                 4.12.71   True        False         False      73m     
      storage                                    4.12.71   True        False         False      72m     
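      The machine-config message above ("Required MachineConfigPool 'master' is paused") is expected while the migration/rollback keeps the pools paused, and it should clear once they are unpaused at the end of the procedure. A generic way to check (these commands are for illustration, not steps from the rollback document):

      # Show whether the pools are still paused
      oc get machineconfigpools
      oc get mcp master -o jsonpath='{.spec.paused}'
      # Unpausing is normally the last step of the documented procedure, for example:
      oc patch mcp master --type merge --patch '{"spec":{"paused":false}}'
      oc patch mcp worker --type merge --patch '{"spec":{"paused":false}}'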
      
      
      
      
      misalunk@misalunk-mac ansible-sdn-to-ovn-migration % oc get pods -n openshift-sdn
      NAME                   READY   STATUS             RESTARTS         AGE
      sdn-controller-gtc5q   1/2     CrashLoopBackOff   12 (59s ago)     39m
      sdn-controller-qcsmq   2/2     Running            8 (17m ago)      39m
      sdn-controller-wrkmn   2/2     Running            12 (3m39s ago)   39m
      sdn-hsck5              1/2     Running            9 (6m37s ago)    39m
      sdn-l9pwp              1/2     Error              9 (7m6s ago)     39m
      sdn-lflwp              1/2     Running            9 (6m50s ago)    39m
      sdn-qz2fg              1/2     Running            9 (6m54s ago)    39m
      sdn-s76c6              1/2     Running            9 (6m52s ago)    39m
      sdn-xxjz4              1/2     Running            9 (6m53s ago)    39m
      misalunk@misalunk-mac ansible-sdn-to-ovn-migration % oc logs sdn-controller-gtc5q -n openshift-sdn
      Defaulted container "sdn-controller" out of: sdn-controller, kube-rbac-proxy
      I0205 01:14:15.286796       1 server.go:27] Starting HTTP metrics server
      I0205 01:14:15.286891       1 leaderelection.go:248] attempting to acquire leader lease openshift-sdn/openshift-network-controller...
      I0205 01:21:45.043815       1 leaderelection.go:258] successfully acquired lease openshift-sdn/openshift-network-controller
      I0205 01:21:45.043914       1 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-sdn", Name:"openshift-network-controller", UID:"8f066780-17f1-41c1-9cf0-f902f68e3f9c", APIVersion:"v1", ResourceVersion:"49665", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-10-0-130-36 became leader
      I0205 01:21:45.043935       1 event.go:285] Event(v1.ObjectReference{Kind:"Lease", Namespace:"openshift-sdn", Name:"openshift-network-controller", UID:"c35cdcff-6ad5-454b-b97f-a5b765813da5", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"49666", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-10-0-130-36 became leader
      I0205 01:21:45.044229       1 master.go:56] Initializing SDN master
      F0205 01:21:45.049989       1 network_controller.go:54] Error starting OpenShift Network Controller: cluster IP: 10.128.0.0 conflicts with host network: 10.129.0.0/23
      misalunk@misalunk-mac ansible-sdn-to-ovn-migration % 
      misalunk@misalunk-mac ansible-sdn-to-ovn-migration % 
      
      
      misalunk@misalunk-mac ansible-sdn-to-ovn-migration % oc logs sdn-l9pwp  -n openshift-sdn
      Defaulted container "sdn" out of: sdn, kube-rbac-proxy
      I0205 01:20:47.315954   79409 cmd.go:128] Reading proxy configuration from /config/kube-proxy-config.yaml
      I0205 01:20:47.316570   79409 feature_gate.go:245] feature gates: &{map[]}
      I0205 01:20:47.316608   79409 cmd.go:232] Watching config file /config/kube-proxy-config.yaml for changes
      I0205 01:20:47.316635   79409 cmd.go:232] Watching config file /config/..2025_02_05_00_42_50.793092302/kube-proxy-config.yaml for changes
      E0205 01:20:47.340084   79409 node.go:220] Local networks conflict with SDN; this will eventually cause problems: cluster IP: 10.128.0.0 conflicts with host network: 10.130.0.0/23
      I0205 01:20:47.340146   79409 node.go:153] Initializing SDN node "ip-10-0-161-148.ec2.internal" (10.0.161.148) of type "redhat/openshift-ovs-networkpolicy"
      I0205 01:20:47.340342   79409 cmd.go:174] Starting node networking (4.12.0-202412170201.p0.g9706f96.assembly.stream.el8-9706f96)
      I0205 01:20:47.340352   79409 node.go:315] Starting openshift-sdn network plugin
      W0205 01:20:47.345039   79409 subnets.go:156] Could not find an allocated subnet for node: ip-10-0-161-148.ec2.internal, Waiting...
      W0205 01:20:48.348359   79409 subnets.go:156] Could not find an allocated subnet for node: ip-10-0-161-148.ec2.internal, Waiting...
      W0205 01:20:49.851506   79409 subnets.go:156] Could not find an allocated subnet for node: ip-10-0-161-148.ec2.internal, Waiting...
      W0205 01:20:52.105557   79409 subnets.go:156] Could not find an allocated subnet for node: ip-10-0-161-148.ec2.internal, Waiting...
      W0205 01:20:55.485830   79409 subnets.go:156] Could not find an allocated subnet for node: ip-10-0-161-148.ec2.internal, Waiting...
      W0205 01:21:00.556531   79409 subnets.go:156] Could not find an allocated subnet for node: ip-10-0-161-148.ec2.internal, Waiting...
      W0205 01:21:08.155902   79409 subnets.go:156] Could not find an allocated subnet for node: ip-10-0-161-148.ec2.internal, Waiting...
      W0205 01:21:19.553740   79409 subnets.go:156] Could not find an allocated subnet for node: ip-10-0-161-148.ec2.internal, Waiting...
      W0205 01:21:36.653848   79409 subnets.go:156] Could not find an allocated subnet for node: ip-10-0-161-148.ec2.internal, Waiting...
      W0205 01:22:02.292709   79409 subnets.go:156] Could not find an allocated subnet for node: ip-10-0-161-148.ec2.internal, Waiting...
      W0205 01:22:40.744522   79409 subnets.go:156] Could not find an allocated subnet for node: ip-10-0-161-148.ec2.internal, Waiting...
      F0205 01:22:40.744544   79409 cmd.go:118] Failed to start sdn: failed to get subnet for this host: ip-10-0-161-148.ec2.internal, error: timed out waiting for the condition
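      The two log excerpts are consistent with each other: the sdn-controller (the SDN master) exits on the CIDR conflict before it can allocate any HostSubnets, so the per-node sdn pods never find a subnet for their node and time out. A quick way to confirm the missing allocations (generic command, not from this report):

      # With the controller crash-looping, this list is expected to be empty or incomplete
      oc get hostsubnets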
      

       

       

      Version-Release number of selected component (if applicable): 4.12

      How reproducible: Always

      Steps to Reproduce:

      1. Run all 6 steps of the rollback procedure mentioned in the document (see the hedged command sketch below).

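      For readers without access to that document, the rollback it describes is driven by patching the cluster network configuration. The following is only a hedged sketch of the kind of commands involved; the exact ordering, waits and node reboots come from the official procedure, not from this report.

      # Pause the MachineConfigPools before changing the network type
      oc patch mcp master --type merge --patch '{"spec":{"paused":true}}'
      oc patch mcp worker --type merge --patch '{"spec":{"paused":true}}'
      # Signal the migration back to OpenShiftSDN, then switch the default network type
      oc patch Network.operator.openshift.io cluster --type merge --patch '{"spec":{"migration":{"networkType":"OpenShiftSDN"}}}'
      oc patch Network.config.openshift.io cluster --type merge --patch '{"spec":{"networkType":"OpenShiftSDN"}}'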

      Actual results:

      The sdn-controller pod crash-loops with "Error starting OpenShift Network Controller: cluster IP: 10.128.0.0 conflicts with host network", the sdn pods never receive a host subnet, and the network, ingress, console, authentication and machine-config operators report a degraded state.

      Expected results:

      The rollback to OpenShiftSDN completes and the openshift-sdn pods start without CIDR conflict errors.

      Additional info:

      Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

      Affected Platforms:

      Is it an

      1. internal CI failure
      2. customer issue / SD
      3. internal Red Hat testing failure

      If it is an internal Red Hat testing failure:

      • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (especially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

      If it is a CI failure:

      • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
      • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
      • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
      • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
      • If it's a connectivity issue,
      • What is the srcNode, srcIP and srcNamespace and srcPodName?
      • What is the dstNode, dstIP and dstNamespace and dstPodName?
      • What is the traffic path? (examples: pod2pod, pod2external, pod2svc, pod2Node, etc.)

      If it is a customer / SD issue:

      • Provide enough information in the bug description that Engineering doesn't need to read the entire case history.
      • Don't presume that Engineering has access to Salesforce.
      • Do presume that Engineering will access attachments through supportshell.
      • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
      • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
        • If the issue is in a customer namespace then provide a namespace inspect.
        • If it is a connectivity issue:
          • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
          • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
          • What is the traffic path? (examples: pod2pod, pod2external, pod2svc, pod2Node, etc.)
          • Please provide the UTC timestamp of the networking outage window from the must-gather
          • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
        • If it is not a connectivity issue:
          • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when the problem happened, if any.
      • When showing the results from commands, include the entire command in the output.  
      • For OCPBUGS in which the issue has been identified, label with "sbr-triaged"
      • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with "sbr-untriaged"
      • Do not set the priority; that is owned by Engineering and will be set when the bug is evaluated
      • Note: bugs that do not meet these minimum standards will be closed with label "SDN-Jira-template"
      • For guidance on using this template please see
        OCPBUGS Template Training for Networking components

              rhn-support-misalunk Miheer Salunke
              Anurag Saxena