Bug
Resolution: Unresolved
Major
rhel-8.6.0
Low
rhel-sst-security-crypto
ssg_security
Description of problem:
Submariner uses Libreswan/IPsec to set up secure tunnels between the Gateway nodes.
One node in each OpenShift cluster is designated as the Gateway node, and the submariner-gateway pod, which runs on that node with hostNetworking enabled, configures the necessary IPsec connections on the underlying node.
The Submariner GW pod runs Pluto [1] and then uses "whack" (see [2] for an example) to set up tunnels to the remote endpoints.
When the GW pod is terminated we do not want the datapath to be disrupted, so we do not clean up any xfrm policies/states from the kernel.
The terminated GW pod is restarted by Kubernetes according to the k8s restartPolicy [3], which means the GW restart can take up to 5 minutes in some cases.
Since the restarted pod is not aware of the previous connection status, it runs Pluto again and issues the same "whack" commands to establish the connection to the remote endpoint.
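The "up to 5 minutes" figure follows from the kubelet restart back-off described in [3]: the delay starts at 10 seconds, doubles on each restart, and is capped at 300 seconds. A minimal sketch of that schedule is below; the function name restart_backoff_seconds is made up for illustration, and the actual delay seen on a cluster can differ because the kubelet resets the timer after a container has run for a while without problems.

```shell
# Sketch only: approximate kubelet restart back-off for the Nth restart.
# Assumes the documented defaults: 10s initial delay, doubling, 300s cap.
# restart_backoff_seconds is a hypothetical helper, not part of Submariner.
restart_backoff_seconds() (
    attempt=$1   # 1-based restart attempt
    delay=10
    cap=300
    i=1
    # Double the delay once per earlier attempt, stopping at the cap.
    while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
        delay=$((delay * 2))
        i=$((i + 1))
    done
    if [ "$delay" -gt "$cap" ]; then
        delay=$cap
    fi
    echo "$delay"
)
```

For example, calling restart_backoff_seconds for attempts 1 through 7 walks through the schedule and shows where the cap is reached.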
So, assuming the following use case:
T0 - The GW pod starts running and configures the IPsec tunnels using Pluto + whack; the tunnels are up and everything works.
T1 - The GW pod is terminated for some reason.
T2 - The GW pod is restarted by Kubernetes and reconfigures the IPsec tunnels using Pluto + whack.
Traffic between the clusters is down for the whole T1 to T2 period.
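One way to check whether the kernel state created at T0 is still present between T1 and T2 is to count the IPsec SAs between the two gateway addresses in the output of "ip xfrm state". The sketch below assumes the usual iproute2 output, where each SA starts with a "src <addr> dst <addr>" line; count_xfrm_states is a hypothetical helper name, and the addresses in the usage note are the ones from the log in [2].

```shell
# Sketch only: count kernel IPsec SAs between two endpoints, in either
# direction. Reads `ip xfrm state` output on stdin.
# count_xfrm_states is a hypothetical helper, not part of Submariner.
count_xfrm_states() (
    ip_a=$1
    ip_b=$2
    awk -v a="$ip_a" -v b="$ip_b" '
        # Each SA begins with a line of the form: src <addr> dst <addr>
        $1 == "src" && $3 == "dst" &&
        (($2 == a && $4 == b) || ($2 == b && $4 == a)) { n++ }
        END { print n + 0 }
    '
)
```

Run on the gateway node (root is required to read xfrm state):
ip xfrm state | count_xfrm_states 10.56.104.242 10.56.103.213
A non-zero count while the GW pod is down would suggest the SAs themselves survive and the outage has another cause; this is a diagnostic aid, not a conclusion about the bug.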
Please find attached the GW pod logs on cluster2 (with IPsec debug logging enabled) from the time the GW pod on cluster1 was restarted.
[1]
set -e
# These are the ExecStartPre lines from the systemd service definition
/usr/libexec/ipsec/addconn --config /etc/ipsec.conf --checkconfig
/usr/libexec/ipsec/_stackmanager start
/usr/sbin/ipsec --checknss
# Start the daemon itself with any additional arguments passed in
exec /usr/libexec/ipsec/pluto --leak-detective --config /etc/ipsec.conf --nofork
[2]
2023-06-12T12:56:34.663Z INF ..reswan/libreswan.go:419 libreswan Executing whack with args: [--psk --encrypt --name submariner-cable-local-cluster-10-56-103-213-0-0 --id 10.56.104.242 --host 10.56.104.242 --client 10.205.0.0/16 --ikeport 4500 --to --id 10.56.103.213 --host 10.56.103.213 --client 10.201.0.0/16 --ikeport 4500 --dpdaction=hold]
002 "submariner-cable-local-cluster-10-56-103-213-0-0": added IKEv2 connection
[3]
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
Version-Release number of selected component (if applicable):
Expected results:
Additional info: