OpenShift Bugs / OCPBUGS-12431

ovnkube-node pod fails due to late assignment of the IPv6 address to the primary interface in a dual-stack environment


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Critical
    • None
    • 4.12.z
    • Important
    • No
    • Rejected
    • False
    • None
    • Customer Escalated

      Description of problem:

      The upgrade is stuck on the network cluster operator. The network cluster operator is failing because a couple of ovnkube-node pods are failing with the following error in the ovnkube-node container:
      
      ~~~
      2023-04-21T15:07:31.178062995Z F0421 15:07:31.178047  181667 ovnkube.go:133] error waiting for node readiness: failed to set the node masquerade route to OVN: could not find node IPv6 address to configure OVN masquerade route, addresses: [{Type:InternalIP Address:<IPv4 IP>} {Type:Hostname Address:<hostname>}]
      
      ~~~
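The log line shows ovnkube-node looking for an IPv6 address among the Node's status addresses and finding only an IPv4 InternalIP and a Hostname. A minimal Python sketch of that kind of check follows. It is not the ovnkube implementation; the helper name find_internal_ip and the placeholder addresses (from the documentation ranges 192.0.2.0/24 and 2001:db8::/32) are hypothetical.

```python
import ipaddress


def find_internal_ip(addresses, version):
    """Return the first InternalIP of the given IP version (4 or 6), or None.

    `addresses` mimics a Node's status.addresses list. This is a hypothetical
    helper for illustration, not ovnkube's actual code.
    """
    for entry in addresses:
        if entry.get("type") != "InternalIP":
            continue
        try:
            ip = ipaddress.ip_address(entry.get("address", ""))
        except ValueError:
            continue  # not a literal IP address
        if ip.version == version:
            return str(ip)
    return None


# Same shape as the addresses in the log above (placeholder values).
failing_node = [
    {"type": "InternalIP", "address": "192.0.2.10"},
    {"type": "Hostname", "address": "worker-0"},
]
# What a healthy dual-stack node would additionally report.
healthy_node = failing_node + [{"type": "InternalIP", "address": "2001:db8::10"}]

for name, addrs in (("failing", failing_node), ("healthy", healthy_node)):
    v6 = find_internal_ip(addrs, 6)
    if v6 is None:
        print(f"{name}: could not find node IPv6 address")
    else:
        print(f"{name}: IPv6 InternalIP {v6}")
```

In this model, the fix has to come from the node status itself: until an IPv6 InternalIP is present, the check keeps failing no matter how often ovnkube-node restarts.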
      
      This issue can occur when assignment of the IPv6 address to the primary interface is delayed for some reason. As a result, the IPv6 address is not written to 20-nodenet.conf, which is created by the nodeip-configuration service, and in turn the IPv6 address does not appear in the status of the Node custom resource. I have followed the workarounds below, which are mentioned in bug [1].
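One way to see whether the primary interface has a usable IPv6 address yet is to read /proc/net/if_inet6, where each line lists the address in hex, the interface index, prefix length, scope, flags and device name. The sketch below is a hypothetical helper, not what the nodeip-configuration service actually runs: it treats only global-scope addresses that are neither tentative (still in duplicate address detection) nor DAD-failed as usable, and polls with a timeout. The interface name bond0 comes from the action plan in this report; the function names and sample data are illustrative.

```python
import ipaddress
import time

# Flag and scope values as printed in /proc/net/if_inet6 (see <linux/if_addr.h>).
IFA_F_DADFAILED = 0x08   # duplicate address detection failed
IFA_F_TENTATIVE = 0x40   # address still undergoing duplicate address detection
SCOPE_GLOBAL = 0x00      # global scope; link-local addresses use 0x20


def global_ipv6_addresses(if_inet6_text, ifname):
    """Return usable global IPv6 addresses on `ifname` from /proc/net/if_inet6 text."""
    found = []
    for line in if_inet6_text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[5] != ifname:
            continue
        addr_hex, _ifindex, _prefixlen, scope, flags, _dev = fields
        if int(scope, 16) != SCOPE_GLOBAL:
            continue  # skip link-local, host and site scoped addresses
        if int(flags, 16) & (IFA_F_TENTATIVE | IFA_F_DADFAILED):
            continue  # not usable (yet)
        found.append(str(ipaddress.IPv6Address(bytes.fromhex(addr_hex))))
    return found


def wait_for_global_ipv6(ifname, timeout=300.0, interval=5.0,
                         path="/proc/net/if_inet6"):
    """Poll `path` until `ifname` has a usable global IPv6 address or time runs out.

    Returns the addresses found, or an empty list if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        with open(path) as f:
            addrs = global_ipv6_addresses(f.read(), ifname)
        if addrs or time.monotonic() >= deadline:
            return addrs
        time.sleep(interval)


# Hypothetical /proc/net/if_inet6 content with placeholder addresses.
SAMPLE = (
    "20010db8000000000000000000000010 05 40 00 80    bond0\n"  # global, permanent
    "fe800000000000000000000000000010 05 40 20 80    bond0\n"  # link-local
    "20010db8000000000000000000000020 05 40 00 c0    bond0\n"  # global, tentative
)
print(global_ipv6_addresses(SAMPLE, "bond0"))
```

A check like this distinguishes "no global IPv6 address yet" from "address present but still tentative", both of which would look like a late assignment to anything that reads the interface too early.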
      
      - Changed NM_ONLINE_TIMEOUT of the NetworkManager-wait-online service to 300 seconds so that the IPv6 address is assigned to the primary interface before the nodeip-configuration service starts. The NetworkManager-wait-online service still fails after 5 minutes.
      - Created an environment file for kubelet to reference, with both the IPv4 and IPv6 addresses hard-coded. After rebooting the node, the IPv6 address still does not appear in the status section of the Node custom resource.
      
      Next action plan:
      
      - Add 'may-fail=false' under the IPv6 settings in the nmconnection file of bond0. This could not be applied yet because the machine config operator is in a Degraded state for a different reason, and the nmconnection file is managed by the machine config operator.
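For reference, 'may-fail=false' in the [ipv6] section of a keyfile corresponds to the NetworkManager property ipv6.may-fail; when it is false, IPv6 configuration must succeed for the connection activation to be considered successful. The Python sketch below only illustrates the resulting keyfile edit on a sample bond0 profile. The sample content and the helper name are hypothetical, and because the file here is managed by the machine config operator, the real change would have to go through it rather than an on-node edit. Note that configparser drops comments when it rewrites a file.

```python
import configparser
import io


def set_ipv6_may_fail_false(keyfile_text):
    """Return NetworkManager keyfile text with may-fail=false in [ipv6].

    Hypothetical helper for illustration; configparser does not preserve
    comments, so this is not a faithful in-place editor.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep key names exactly as written
    cfg.read_string(keyfile_text)
    if not cfg.has_section("ipv6"):
        cfg.add_section("ipv6")
    cfg.set("ipv6", "may-fail", "false")
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical, trimmed bond0 profile.
SAMPLE = """\
[connection]
id=bond0
type=bond
interface-name=bond0

[ipv4]
method=auto

[ipv6]
method=auto
"""

print(set_ipv6_may_fail_false(SAMPLE))
```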
      
      I am not sure why the nodeip-configuration service does not write the IPv6 address to the environment file 20-nodenet.conf, even after the 300-second delay introduced through 'NM_ONLINE_TIMEOUT' in the NetworkManager-wait-online service. I am also not sure why the IPv6 address does not appear in the status section even after both the IPv4 and IPv6 addresses were specified in 98-nodenet-override.conf in the /etc/systemd/system/kubelet.service.d/ directory.
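systemd parses drop-in files in lexicographic order of their file names, so for a variable set by Environment= in both files, the value in 98-nodenet-override.conf should win over the one in 20-nodenet.conf. The sketch below approximates that merge for [Service] Environment= lines only; it ignores EnvironmentFile=, other drop-in directories and systemd's exact quoting rules. The variable names KUBELET_NODE_IP and KUBELET_NODE_IPS and the sample values are assumptions for illustration. Even when the override wins, whether both addresses reach the Node status still depends on how the kubelet command line consumes the variable.

```python
import shlex


def effective_environment(dropins):
    """Approximate systemd's merge of [Service] Environment= lines.

    `dropins` maps a drop-in file name to its text. Files are processed in
    lexical order of their names; a later assignment of the same variable
    overrides an earlier one, and an empty "Environment=" resets the list.
    Simplified: shlex quoting stands in for systemd's own quoting rules.
    """
    env = {}
    for name in sorted(dropins):
        section = None
        for raw in dropins[name].splitlines():
            line = raw.strip()
            if not line or line.startswith(("#", ";")):
                continue
            if line.startswith("[") and line.endswith("]"):
                section = line[1:-1]
                continue
            if section != "Service" or not line.startswith("Environment="):
                continue
            value = line[len("Environment="):]
            if not value:
                env.clear()  # empty assignment resets earlier variables
                continue
            for assignment in shlex.split(value):
                key, _, val = assignment.partition("=")
                env[key] = val
    return env


# Hypothetical contents; KUBELET_NODE_IP/KUBELET_NODE_IPS are assumed names.
DROPINS = {
    "20-nodenet.conf": (
        "[Service]\n"
        'Environment="KUBELET_NODE_IP=192.0.2.10" "KUBELET_NODE_IPS=192.0.2.10"\n'
    ),
    "98-nodenet-override.conf": (
        "[Service]\n"
        'Environment="KUBELET_NODE_IP=192.0.2.10" '
        '"KUBELET_NODE_IPS=192.0.2.10,2001:db8::10"\n'
    ),
}

print(effective_environment(DROPINS))
```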
      
      [1] - https://issues.redhat.com/browse/OCPBUGS-6009 
      

      Version-Release number of selected component (if applicable):

      OpenShift 4.12.4
      

      How reproducible:

      Not sure
      

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      The upgrade is stuck because of this failure.
      

      Expected results:

      The upgrade should proceed.
      

      Additional info:

      Will be added in the comments section.
      

            bnemec@redhat.com Benjamin Nemec
            rhn-support-arghosh Arnab Ghosh
            Zhanqi Zhao
            Arnab Ghosh
            Votes: 0
            Watchers: 17
