RHEL-71960

Libreswan fails to establish IKE SA during reboot scenario

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • rhel-9.4
    • libreswan
    • None
    • No
    • Important
    • rhel-sst-security-crypto
    • ssg_security
    • None
    • False
    • None
    • None
    • None
    • None
    • x86_64
    • None

      What were you trying to do that didn't work?

      Two nodes are configured with IPsec connections in transport mode on UDP port 6081, and traffic flows correctly. When both nodes are rebooted at almost the same time, they come back up together. Each node starts the pluto service, parses /etc/ipsec.d/openshift.conf, and tries to re-establish the IPsec connections, but this fails for more than two minutes, until a script runs ipsec auto --start <conn-name>. It looks like libreswan needs a fix for IKE SA establishment when both peers initiate at the same time. This causes a major problem for OCP cluster node reboot scenarios.

      What is the impact of this issue to you?

      OCP east-west traffic is broken for a period after the reboot, which produces major disruptive events and breaks IPsec CI in OpenShift.

      Please provide the package NVR for which the bug is seen:

       

      sh-5.1# cat /etc/os-release 
      NAME="Red Hat Enterprise Linux CoreOS"
      ID="rhcos"
      ID_LIKE="rhel fedora"
      VERSION="416.94.202412112347-0"
      VERSION_ID="4.16"
      VARIANT="CoreOS"
      VARIANT_ID=coreos
      PLATFORM_ID="platform:el9"
      PRETTY_NAME="Red Hat Enterprise Linux CoreOS 416.94.202412112347-0"
      ANSI_COLOR="0;31"
      CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos::coreos"
      HOME_URL="https://www.redhat.com/"
      DOCUMENTATION_URL="https://docs.okd.io/latest/welcome/index.html"
      BUG_REPORT_URL="https://access.redhat.com/labs/rhir/"
      REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
      REDHAT_BUGZILLA_PRODUCT_VERSION="4.16"
      REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
      REDHAT_SUPPORT_PRODUCT_VERSION="4.16"
      OPENSHIFT_VERSION="4.16"
      RHEL_VERSION=9.4
      OSTREE_VERSION="416.94.202412112347-0"
      sh-5.1# rpm -q libreswan
      libreswan-4.6-3.el9_0.3.x86_64
      

      How reproducible is this bug?:

      Always

      Steps to reproduce

      1. Bring up two nodes and configure the IPsec connections as shown in the attached log.
      2. Ensure IPsec connectivity works between the nodes.
      3. Reboot both nodes at the same time.
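
For step 3, a minimal sketch of how the two reboots could be issued in parallel so that both nodes go down and come back at nearly the same moment. This is not taken from the report: the use of ssh as the remote runner, the RUN variable, and the reboot_together helper are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: reboot two nodes as close to simultaneously as possible.
# RUN is the remote-exec command. Using ssh here is an assumption;
# override RUN if the nodes are reached some other way.
RUN=${RUN:-ssh}

# reboot_together NODE_A NODE_B (hypothetical helper)
# Issues both reboot commands in the background, then waits for both
# remote sessions to return.
reboot_together() {
    for node in "$1" "$2"; do
        # Backgrounded so the second reboot does not wait for the first.
        # The session usually drops during reboot; its status is ignored.
        $RUN "$node" sudo systemctl reboot &
    done
    wait
    return 0
}
```

Usage would be something like reboot_together node-a node-b, where node-a and node-b are placeholders for the two peers.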

      Expected results

      The IPsec connections should be established as soon as the pluto service is running on both peers.

      Actual results

      The connections are not established until the ovs-monitor-ipsec script runs ipsec auto --start <conn-name>: https://github.com/openvswitch/ovs/blob/main/ipsec/ovs-monitor-ipsec.in
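
The recovery described above can be approximated by hand with a small retry loop. The sketch below is not the ovs-monitor-ipsec logic. It assumes that an established connection shows up in the output of ipsec trafficstatus with its name in double quotes, and the helper names conn_is_up and ensure_conn are hypothetical.

```shell
#!/bin/sh
# Sketch only: start a connection repeatedly until an IPsec SA is reported.
# IPSEC can be overridden (for example in tests); default is the real CLI.
IPSEC=${IPSEC:-ipsec}

# conn_is_up NAME (hypothetical helper)
# Succeeds if `ipsec trafficstatus` mentions NAME.
# Assumption: established connections are listed with the name in quotes.
conn_is_up() {
    $IPSEC trafficstatus 2>/dev/null | grep -qF "\"$1\""
}

# ensure_conn NAME [ATTEMPTS] [DELAY_SECONDS] (hypothetical helper)
# Returns 0 once the connection is reported, non-zero if it never is.
ensure_conn() {
    name=$1
    attempts=${2:-5}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if conn_is_up "$name"; then
            return 0
        fi
        # Same command the report says unblocks the connection.
        $IPSEC auto --start "$name" >/dev/null 2>&1
        sleep "$delay"
        i=$((i + 1))
    done
    # Final check after the last start attempt.
    conn_is_up "$name"
}
```

For example, ensure_conn <conn-name> 12 10 would retry for roughly two minutes, which matches the window in which the failure is seen.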

       

      Logs:

      Attached.

              dueno@redhat.com Daiki Ueno
              pepalani@redhat.com Periyasamy Palanisamy
              Daiki Ueno
              Ondrej Moris