RHEL-5293

IPsec process periodically crashing with an interval of ~8 hours

    • None
    • None
    • 1
    • rhel-sst-security-crypto
    • ssg_security
    • 0.2
    • False
    • None
    • Crypto24Q1
    • None
    • None
    • Known Issue
    • Cause (the user action or circumstances that trigger the bug): A connection is added with the {{ipsec whack}} command, and the remote endpoint does not respond within a certain amount of time.
      Consequence (what the user experience is when the bug occurs): The pluto daemon hits an assertion failure.
      Workaround (if available): Specify the {{--dpdaction=hold}} option, or use {{ipsec addconn}} instead of {{ipsec whack}}.
      Result (mandatory if the workaround does not solve the problem completely):

      Note:
      This is caused by {{ipsec whack}} and {{ipsec addconn}} using different defaults for the dpdaction option. The defaults were later synchronized upstream, and the option itself was subsequently removed (i.e., dpdaction=hold is always in effect).
    • Proposed
    • None
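
The workaround in the Known Issue note above can be sketched as follows. This is only an illustration, not Submariner code: {{build_whack_args}} is a hypothetical helper, the argument layout is copied from the "Executing whack with args" log lines in the description, and the position of {{--dpdaction=hold}} within the argument list is an assumption.

```python
from typing import List


def build_whack_args(name: str, left_ip: str, left_subnet: str,
                     right_ip: str, right_subnet: str,
                     ike_port: int = 4500) -> List[str]:
    """Return an `ipsec whack` argument list shaped like the logged one.

    Hypothetical helper for illustration only. The sole difference from
    the logged arguments is the explicit --dpdaction=hold option.
    """
    port = str(ike_port)
    return [
        "--psk", "--encrypt",
        "--name", name,
        # Workaround from the Known Issue note: do not rely on the
        # default dpdaction of `ipsec whack` (placement is an assumption).
        "--dpdaction=hold",
        # Local (left) end
        "--id", left_ip, "--host", left_ip,
        "--client", left_subnet, "--ikeport", port,
        "--to",
        # Remote (right) end
        "--id", right_ip, "--host", right_ip,
        "--client", right_subnet, "--ikeport", port,
    ]


if __name__ == "__main__":
    args = build_whack_args(
        name="submariner-cable-m2tstocs-10-56-104-242-0-0",
        left_ip="10.56.103.213", left_subnet="10.201.0.0/16",
        right_ip="10.56.104.242", right_subnet="10.205.0.0/16",
    )
    # Print the command instead of executing it.
    print(" ".join(["ipsec", "whack", *args]))
```

The script only prints the command line, so it can be reviewed before anything is run against a live pluto instance.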

      Description of problem:

      Submariner provides inter-cluster connectivity between OpenShift clusters and uses Libreswan/IPsec to set up secure tunnels between the Gateway nodes.
      One node in each OCP cluster is designated as the Gateway node, and the submariner-gateway pod (which runs on that node with hostNetworking enabled) configures the necessary IPsec connections on the underlying node.

      An OCP customer reported the following issue.
      After installing Submariner and joining the clusters, the connections are established successfully. However, the submariner-gateway pod restarts roughly every 8 hours, which disrupts the datapath of their production applications. The pod logs just before the restart show that pluto exited with a core dump.

      Logs from the submariner-gateway pod:

      002 listening for IKE messages
      002 forgetting secrets
      002 loading secrets from "/etc/ipsec.secrets"
      002 loading secrets from "/etc/ipsec.d/submariner.secrets"
      2023-05-31T01:44:29.430Z INF ..reswan/libreswan.go:344 libreswan Creating connection(s) for {"metadata":{"name":"m2tstocs-submariner-cable-m2tstocs-10-56-104-242","namespace":"submariner-operator","uid":"445e9250-5dc5-4c28-8a44-9c04addc3730","resourceVersion":"847415991","generation":1,"creationTimestamp":"2023-05-25T02:56:24Z","labels":{"submariner-io/clusterID":"m2tstocs"},"managedFields":[{"manager":"submariner-gateway","operation":"Update","apiVersion":"submariner.io/v1","time":"2023-05-25T02:56:24Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:labels":{".":{},"f:submariner-io/clusterID":{}}},"f:spec":{".":{},"f:backend":{},"f:backend_config":{".":{},"f:natt-discovery-port":{},"f:preferred-server":{},"f:public-ip":{},"f:udp-port":{}},"f:cable_name":{},"f:cluster_id":{},"f:healthCheckIP":{},"f:hostname":{},"f:nat_enabled":{},"f:private_ip":{},"f:public_ip":{},"f:subnets":{}}}}]},"spec":{"cluster_id":"m2tstocs","cable_name":"submariner-cable-m2tstocs-10-56-104-242","healthCheckIP":"10.204.16.1","hostname":"m2tstocs-v7xmz-worker-4cpu-sz62w","subnets":["10.205.0.0/16","10.204.0.0/16"],"private_ip":"10.56.104.242","public_ip":"1.2.3.4","nat_enabled":false,"backend":"libreswan","backend_config":{"natt-discovery-port":"4490","preferred-server":"false","public-ip":"ipv4:1.2.3.4","udp-port":"4500"}}} in bi-directional mode
      2023-05-31T01:44:29.430Z INF ..reswan/libreswan.go:406 libreswan Executing whack with args: [--psk --encrypt --name submariner-cable-m2tstocs-10-56-104-242-0-0 --id 10.56.103.213 --host 10.56.103.213 --client 10.201.0.0/16 --ikeport 4500 --to --id 10.56.104.242 --host 10.56.104.242 --client 10.205.0.0/16 --ikeport 4500]
      002 "submariner-cable-m2tstocs-10-56-104-242-0-0": added IKEv2 connection
      181 "submariner-cable-m2tstocs-10-56-104-242-0-0" #1: initiating IKEv2 connection
      2023-05-31T01:44:29.473Z INF ..reswan/libreswan.go:406 libreswan Executing whack with args: [--psk --encrypt --name submariner-cable-m2tstocs-10-56-104-242-0-1 --id 10.56.103.213 --host 10.56.103.213 --client 10.201.0.0/16 --ikeport 4500 --to --id 10.56.104.242 --host 10.56.104.242 --client 10.204.0.0/16 --ikeport 4500]
      002 "submariner-cable-m2tstocs-10-56-104-242-0-1": added IKEv2 connection
      2023-05-31T01:44:29.518Z INF ..reswan/libreswan.go:406 libreswan Executing whack with args: [--psk --encrypt --name submariner-cable-m2tstocs-10-56-104-242-1-0 --id 10.56.103.213 --host 10.56.103.213 --client 10.200.0.0/16 --ikeport 4500 --to --id 10.56.104.242 --host 10.56.104.242 --client 10.205.0.0/16 --ikeport 4500]
      002 "submariner-cable-m2tstocs-10-56-104-242-1-0": added IKEv2 connection
      002 "submariner-cable-m2tstocs-10-56-104-242-1-0" #4: initiating Child SA using IKE SA #1
      188 "submariner-cable-m2tstocs-10-56-104-242-1-0" #4: sent CREATE_CHILD_SA request for new IPsec SA
      004 "submariner-cable-m2tstocs-10-56-104-242-1-0" #4: established Child SA; IPsec tunnel [10.200.0.0-10.200.255.255:0-65535 0] -> [10.205.0.0-10.205.255.255:0-65535 0] {ESP=>0x021876d8 <0xfbcd41f4 xfrm=AES_GCM_16_256-NONE NATOA=none NATD=none DPD=passive}
      2023-05-31T01:44:29.558Z INF ..reswan/libreswan.go:406 libreswan Executing whack with args: [--psk --encrypt --name submariner-cable-m2tstocs-10-56-104-242-1-1 --id 10.56.103.213 --host 10.56.103.213 --client 10.200.0.0/16 --ikeport 4500 --to --id 10.56.104.242 --host 10.56.104.242 --client 10.204.0.0/16 --ikeport 4500]
      002 "submariner-cable-m2tstocs-10-56-104-242-1-1": added IKEv2 connection
      002 "submariner-cable-m2tstocs-10-56-104-242-1-1" #5: initiating Child SA using IKE SA #1
      188 "submariner-cable-m2tstocs-10-56-104-242-1-1" #5: sent CREATE_CHILD_SA request for new IPsec SA
      004 "submariner-cable-m2tstocs-10-56-104-242-1-1" #5: established Child SA; IPsec tunnel [10.200.0.0-10.200.255.255:0-65535 0] -> [10.204.0.0-10.204.255.255:0-65535 0] {ESP=>0x807a203a <0xc401752c xfrm=AES_GCM_16_256-NONE NATOA=none NATD=none DPD=passive}
      2023-05-31T01:44:29.578Z INF ..gine/cableengine.go:202 CableEngine Successfully installed Endpoint cable "submariner-cable-m2tstocs-10-56-104-242" with remote IP 10.56.104.242
      ...
      <SNIP>
      ...
      2023-05-31T09:28:21.865Z FTL ..al/pkg/log/logger.go:67 libreswan Pluto exited: signal: aborted (core dumped)
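
To relate the log excerpt to the reported ~8 hour interval, the time between the connection setup and the pluto abort can be computed from the two timestamps above. The following is a minimal sketch; the helper names are hypothetical, and it assumes the gateway log timestamps are UTC with millisecond precision, as they appear in the excerpt.

```python
from datetime import datetime, timedelta

# Timestamp layout as it appears in the gateway pod log excerpt.
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Values copied from the log excerpt above.
CONNECTIONS_CREATED = "2023-05-31T01:44:29.430Z"
PLUTO_ABORTED = "2023-05-31T09:28:21.865Z"


def parse_log_timestamp(ts: str) -> datetime:
    """Parse a timestamp such as 2023-05-31T01:44:29.430Z."""
    return datetime.strptime(ts, LOG_TS_FORMAT)


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two log timestamps."""
    return parse_log_timestamp(end) - parse_log_timestamp(start)


if __name__ == "__main__":
    delta = elapsed(CONNECTIONS_CREATED, PLUTO_ABORTED)
    print(delta)  # 7:43:52.435000
    print(round(delta.total_seconds() / 3600, 2), "hours")
```

For this excerpt the gap is about 7 hours 44 minutes, which is consistent with the "~8 hours" in the issue title.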

      Version-Release number of selected component (if applicable):
      OCP 4.12.x
      RHEL 8.6
      OS details: Red Hat Enterprise Linux CoreOS 412.86.202304131008-0 (Ootpa) 4.18.0-372.51.1.el8_6.x86_64 amd64
      Libreswan version: pluto_version=4.5, pluto_vendorid=OE-Libreswan-4.5

      How reproducible:

      Steps to Reproduce:
      1.
      2.
      3.

      Actual results:

      Expected results:

      Additional info:

              dueno@redhat.com Daiki Ueno
              sgaddam@redhat.com Gaddam Sridhar
              Daiki Ueno Daiki Ueno
              SSG Security QE SSG Security QE