From: Aaron Conole
To: Adrián Moreno
Cc: Ilya Maximets, Eelco Chaudron, Mike Pattrick, Flavio Leitner, Florian Westphal
Subject: Re: (internal) [RFC] openvswitch: Add sockmap primitives
In-Reply-To: (Aaron Conole's message of "Wed, 29 Jan 2025 17:16:01 -0500")
Date: Mon, 10 Feb 2025 12:56:31 -0500

-ENOTCOMPLETE

Latest version here. As usual, I feel an update every week might be spammy, so again, if you want to be trimmed from the distribution list, just let me know.

I've pulled in the changes suggested by Mike. I still need to pull in Adrián's suggestions for the action lists, but I think they're probably a good way to move forward. The code is still a bit messy at the moment (including a big ol' honkin' #if 0 block) because I only did a quick bit of testing.
Results (guess which set used the actions from the patch below):

[core@localhost ~]$ sudo ip netns exec left ./git/iperf3/src/iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 172.31.110.2, port 50794
[  5] local 172.31.110.1 port 5201 connected to 172.31.110.2 port 50806
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  9.57 GBytes  82.1 Gbits/sec
[  5]   1.00-2.00   sec  9.49 GBytes  81.6 Gbits/sec
[  5]   2.00-3.00   sec  9.71 GBytes  83.4 Gbits/sec
[  5]   3.00-4.00   sec  9.75 GBytes  83.8 Gbits/sec
[  5]   4.00-5.00   sec  10.0 GBytes  86.3 Gbits/sec
[  5]   5.00-6.00   sec  9.95 GBytes  85.4 Gbits/sec
[  5]   6.00-7.00   sec  9.97 GBytes  85.7 Gbits/sec
[  5]   7.00-8.00   sec  10.0 GBytes  86.1 Gbits/sec
[  5]   8.00-9.00   sec  9.84 GBytes  84.5 Gbits/sec
[  5]   9.00-10.00  sec  9.95 GBytes  85.5 Gbits/sec
[  5]  10.00-10.00  sec   512 KBytes  11.0 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  98.3 GBytes  84.4 Gbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
^Ciperf3: interrupt - the server has terminated

[core@localhost ~]$ sudo ip netns exec left ./git/iperf3/src/iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 172.31.110.2, port 55942
[  5] local 172.31.110.1 port 5201 connected to 172.31.110.2 port 55956
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  8.40 GBytes  72.0 Gbits/sec
[  5]   1.00-2.00   sec  8.66 GBytes  74.4 Gbits/sec
[  5]   2.00-3.00   sec  8.52 GBytes  73.2 Gbits/sec
[  5]   3.00-4.00   sec  8.48 GBytes  72.9 Gbits/sec
[  5]   4.00-5.00   sec  8.60 GBytes  73.9 Gbits/sec
[  5]   5.00-6.00   sec  8.46 GBytes  72.7 Gbits/sec
[  5]   6.00-7.00   sec  8.46 GBytes  72.7 Gbits/sec
[  5]   7.00-8.00   sec  8.45 GBytes  72.6 Gbits/sec
[  5]   8.00-9.00   sec  8.39 GBytes  72.1 Gbits/sec
[  5]   9.00-10.00  sec  8.40 GBytes  72.2 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  84.8 GBytes  72.9 Gbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------

So, given that it performs "better" than the normal IP stack forwarding case (84.4 vs. 72.9 Gbits/sec at the receiver, roughly 16% more, because we skip the routing and netfilter calls), it should show consistent performance even in the ct() actions case (but that requires quite a bit more work to get right).

The section where we don't actually use the input socket pointer needs to be revisited, because that is probably the *best* endpoint to use.

If you need the scripts I use to set up the topology, the flows, etc., I can forward them on again.

---
 include/uapi/linux/openvswitch.h              |  13 +-
 net/openvswitch/actions.c                     | 322 ++++++++++++++++++
 net/openvswitch/datapath.c                    |  11 +
 net/openvswitch/datapath.h                    |  50 +++
 net/openvswitch/flow_netlink.c                |  17 +-
 net/openvswitch/vport.c                       |   2 +
 .../selftests/net/openvswitch/ovs-dpctl.py    |  40 +++
 7 files changed, 453 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 3a701bd1f..96988ab97 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -1033,7 +1033,18 @@ enum ovs_action_attr {
 	OVS_ACTION_ATTR_DEC_TTL,      /* Nested OVS_DEC_TTL_ATTR_*. */
 	OVS_ACTION_ATTR_DROP,         /* u32 error code. */
 	OVS_ACTION_ATTR_PSAMPLE,      /* Nested OVS_PSAMPLE_ATTR_*. */
-
+	OVS_ACTION_ATTR_SOCK_TRY,     /* Attempt to find a socket in the map.
+				       * If an appropriate socket is found,
+				       * then the packet is forwarded and the
+				       * pipeline ends. Otherwise, jump to
+				       * u32 recirc id. */
+	OVS_ACTION_ATTR_MD_SOCK_TUPLE, /* Sets the socket map criteria to use
+					* the key's 5-tuple details. */
+	OVS_ACTION_ATTR_ADD_SOCK,     /* Looks at the port specified by u32,
+				       * and if possible tries to find a socket. If
+				       * found, take a reference to the socket, and
+				       * populate the map with the last loaded sock
+				       * tuple as a key, and the socket as value */
 	__OVS_ACTION_ATTR_MAX,        /* Nothing past this will be accepted
 				       * from userspace. */
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 16e260014..c1dfe36b3 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -24,6 +25,9 @@
 #include
 #include
 #include
+#include
+#include
+#include
 
 #if IS_ENABLED(CONFIG_PSAMPLE)
 #include
@@ -1356,6 +1360,303 @@ static void execute_psample(struct datapath *dp, struct sk_buff *skb,
 {}
 #endif
 
+static int ovs_cmp_sock_md(struct ovs_skb_sk_map_data *key,
+			   struct ovs_skb_sk_map_data *cmp) {
+	return (key && cmp && key->key_type == cmp->key_type &&
+		(key->key_type != OVS_SK_MAP_KEY_UNSET) &&
+		((key->key_type == OVS_SK_MAP_KEY_INPUT_SOCKET_BASED &&
+		  key->key.input_socket == cmp->key.input_socket) ||
+		 (key->key_type == OVS_SK_MAP_KEY_TUPLE_BASED &&
+		  key->key.tuple.ip.ipv4.src == cmp->key.tuple.ip.ipv4.src &&
+		  key->key.tuple.ip.ipv4.dst == cmp->key.tuple.ip.ipv4.dst &&
+		  key->key.tuple.tp.src == cmp->key.tuple.tp.src &&
+		  key->key.tuple.tp.dst == cmp->key.tuple.tp.dst &&
+		  key->key.tuple.protocol == cmp->key.tuple.protocol)));
+}
+
+static bool ovs_skbuff_validate_for_sockmap(struct sk_buff *skb, unsigned int *hlen)
+{
+	unsigned int header_len;
+	struct ethhdr *eth;
+	struct tcphdr *tcp;
+	struct iphdr *ip;
+
+	if (!skb || skb->len <= sizeof(struct ethhdr))
+		return false;
+
+	eth = eth_hdr(skb);
+	if (!eth)
+		return false;
+
+	if (eth->h_proto != htons(ETH_P_IP))
+		return false;
+
+	ip = ip_hdr(skb);
+	if (!ip || skb->len <= sizeof(struct ethhdr) + ip->ihl * 4)
+		return false;
+
+	if (ip->protocol != IPPROTO_TCP)
+		return false;
+
+	tcp = tcp_hdr(skb);
+	if (!tcp ||
+	    skb->len <= sizeof(struct ethhdr) + ip->ihl * 4 + tcp->doff * 4)
+		return false;
+
+	header_len = sizeof(struct ethhdr) + ip->ihl * 4 + tcp->doff * 4;
+	if (hlen)
+		*hlen = header_len;
+
+	return true;
+}
+
+#if 0
+/* Insert skb into rb tree, ordered by TCP_SKB_CB(skb)->seq */
+static void ovs_tcp_rbtree_insert(struct rb_root *root, struct sk_buff *skb)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct sk_buff *skb1;
+
+	while (*p) {
+		parent = *p;
+		skb1 = rb_to_skb(parent);
+		if (before(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb1)->seq))
+			p = &parent->rb_left;
+		else
+			p = &parent->rb_right;
+	}
+	rb_link_node(&skb->rbnode, parent, p);
+	rb_insert_color(&skb->rbnode, root);
+}
+#endif
+
+static int enqueue_skb_to_tcp_socket(struct sock *sk, struct sk_buff *skb)
+{
+	size_t skb_doff = skb_transport_offset(skb), oiif;
+	struct tcphdr *tcph = tcp_hdr(skb);
+	struct iphdr *iph = ip_hdr(skb);
+	struct ovs_skb_cb oskb;
+	int ret = 0;
+
+	memcpy(&oskb, OVS_CB(skb), sizeof oskb);
+
+	/* Setup for processing the TCP details */
+	skb_pull(skb, skb_doff);
+
+	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_INPKTS);
+	TCP_INC_STATS(sock_net(sk), TCP_MIB_INSEGS);
+
+	/* Initialize TCP_SKB_CB(skb)->header.h4 to null */
+	memset(&TCP_SKB_CB(skb)->header.h4, 0,
+	       sizeof(TCP_SKB_CB(skb)->header.h4));
+
+	TCP_SKB_CB(skb)->seq = ntohl(tcph->seq);
+	TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + tcph->syn +
+				    tcph->fin + skb->len - tcph->doff * 4);
+	TCP_SKB_CB(skb)->ack_seq = ntohl(tcph->ack_seq);
+	TCP_SKB_CB(skb)->tcp_flags = tcp_flag_word(tcph);
+	TCP_SKB_CB(skb)->sacked = 0; /* Clear SACK state */
+	TCP_SKB_CB(skb)->ip_dsfield = ipv4_get_dsfield(iph);
+	TCP_SKB_CB(skb)->has_rxtstamp = skb->tstamp ||
+		skb_hwtstamps(skb)->hwtstamp;
+	oiif = skb->skb_iif;
+	bh_lock_sock_nested(sk);
+	tcp_segs_in(tcp_sk(sk), skb);
+	skb->skb_iif = sk->sk_rx_dst_ifindex;
+	TCP_SKB_CB(skb)->header.h4.iif = sk->sk_rx_dst_ifindex;
+	ret = 0;
+	if (!sock_owned_by_user(sk)) {
+		ret = tcp_v4_do_rcv(sk, skb);
+	} else {
+		enum skb_drop_reason drop_reason;
+		if (tcp_add_backlog(sk, skb, &drop_reason)) {
+			ret = -EAGAIN;
+			skb->skb_iif = oiif;
+			skb_push(skb, skb_doff);
+			memcpy(OVS_CB(skb), &oskb, sizeof oskb);
+			goto tcp_add_done;
+		}
+	}
+tcp_add_done:
+	bh_unlock_sock(sk);
+	return ret;
+}
+
+static int execute_sock_try(struct datapath *dp, struct sk_buff *skb,
+			    struct sw_flow_key *key,
+			    const struct nlattr *a, bool last)
+{
+	struct dp_sk_mnode *n;
+	u32 recirc_id;
+
+	if (unlikely(!OVS_CB(skb)->sk_map_data) ||
+	    OVS_CB(skb)->sk_map_data->key_type == OVS_SK_MAP_KEY_UNSET) {
+		net_warn_ratelimited("Attempt to use ovs sk_map without a valid tuple.\n");
+		goto recirc_action;
+	}
+
+	list_for_each_entry(n, &dp->sock_list, list_node) {
+		if (ovs_cmp_sock_md(&n->key, OVS_CB(skb)->sk_map_data)) {
+			struct sock *sk = n->output_sock;
+			unsigned int pull_len = 0;
+			int ret;
+
+			if (!sk || (sk->sk_state != TCP_ESTABLISHED) ||
+			    !ovs_skbuff_validate_for_sockmap(skb, &pull_len) ||
+			    (skb->pkt_type != PACKET_HOST &&
+			     skb->pkt_type != PACKET_OTHERHOST)) {
+				goto recirc_action;
+			}
+
+			ret = enqueue_skb_to_tcp_socket(sk, skb);
+			if (ret == -EAGAIN)
+				goto recirc_action;
+
+			return ret;
+		}
+	}
+
+recirc_action:
+	recirc_id = nla_get_u32(a);
+	return clone_execute(dp, skb, key, recirc_id, NULL, 0, last, true);
+}
+
+static struct sock *get_socket(struct net *net, __be32 saddr, __be16 sport,
+			       __be32 daddr, __be16 dport, u32 idx, bool ref)
+{
+	struct sock *sk = NULL;
+	struct inet_hashinfo *hashinfo = &tcp_hashinfo;
+	spinlock_t *lock;
+	u32 hash;
+
+	hash = inet_ehashfn(net, daddr, dport, saddr, sport);
+	lock = inet_ehash_lockp(hashinfo, hash);
+
+	spin_lock_bh(lock);
+	sk = __inet_lookup_established(net, hashinfo, saddr, sport, daddr,
+				       dport, idx, 0);
+	/* take a reference to the socket while under the lock. */
+	if (sk && sk->sk_state == TCP_ESTABLISHED && ref)
+		sock_hold(sk);
+	else
+		sk = NULL;
+	spin_unlock_bh(lock);
+
+	/* At this point the caller has a valid reference to the socket. */
+	return sk && sk->sk_state == TCP_ESTABLISHED ? sk : NULL;
+}
+
+static int execute_ovs_sk_map_metadata(struct sk_buff *skb,
+				       struct sw_flow_key *key)
+{
+	struct ovs_skb_sk_map_data *skmd = OVS_CB(skb)->sk_map_data;
+
+	/* for now, only ipv4 tcp - more to follow */
+	if (!skmd || key->ip.proto != IPPROTO_TCP || skb->protocol != htons(ETH_P_IP))
+		return 0;
+
+	/* Never override the input socket mapping, as it is the
+	 * preferred key. */
+	if (skmd->key_type == OVS_SK_MAP_KEY_INPUT_SOCKET_BASED &&
+	    skmd->key.input_socket) {
+		return 0;
+	}
+
+#if 0
+	/* PoC: Just do ipv4, but this needs to expand for a real solution. */
+	if (skmd->key_type == OVS_SK_MAP_KEY_UNSET &&
+	    in_port->dev->rtnl_link_ops &&
+	    in_port->dev->rtnl_link_ops->get_link_net) {
+		struct sock *sk;
+		struct net *ns;
+		u32 ifindex;
+
+		ns = in_port->dev->rtnl_link_ops->get_link_net(in_port->dev);
+		ifindex = inet_iif(skb);
+
+		/* We swap src/dst when lookup input side. */
+		sk = get_socket(ns,
+				key->ipv4.addr.dst, key->tp.dst,
+				key->ipv4.addr.src, htons(key->tp.src),
+				ifindex, false);
+		if (sk) {
+			skmd->key_type = OVS_SK_MAP_KEY_INPUT_SOCKET_BASED;
+			skmd->key.input_socket = sk;
+			return 0;
+		}
+	}
+#endif
+
+	skmd->key_type = OVS_SK_MAP_KEY_TUPLE_BASED;
+	skmd->key.tuple.ip.ipv4.src = key->ipv4.addr.src;
+	skmd->key.tuple.ip.ipv4.dst = key->ipv4.addr.dst;
+	skmd->key.tuple.tp.src = key->tp.src;
+	skmd->key.tuple.tp.dst = key->tp.dst;
+	skmd->key.tuple.protocol = key->ip.proto;
+	return 0;
+}
+
+/* lookup a socket in the output port specified by 'port'. If found, use
+ * the metadata as a key to insert into the list.
+ */
+static int execute_ovs_add_sock(struct datapath *dp, struct sk_buff *skb,
+				struct sw_flow_key *key,
+				u32 port)
+{
+	struct ovs_skb_sk_map_data *skmd = OVS_CB(skb)->sk_map_data;
+	struct vport *vport = ovs_vport_rcu(dp, port);
+	struct sock *sock = NULL;
+
+	if (!skmd) {
+		return -EINVAL;
+	}
+
+	if (likely(vport && netif_running(vport->dev) &&
+		   netif_carrier_ok(vport->dev)) &&
+	    vport->dev->rtnl_link_ops &&
+	    vport->dev->rtnl_link_ops->get_link_net) {
+		struct net *ns;
+		u32 ifindex;
+
+		ns = vport->dev->rtnl_link_ops->get_link_net(vport->dev);
+		ifindex = inet_sdif(skb);
+		sock = get_socket(ns,
+				  key->ipv4.addr.src, key->tp.src,
+				  key->ipv4.addr.dst, htons(key->tp.dst),
+				  ifindex, true);
+	}
+
+	if (sock) {
+		struct dp_sk_mnode *node;
+		struct dp_sk_mnode *n;
+
+		list_for_each_entry(n, &dp->sock_list, list_node) {
+			if (ovs_cmp_sock_md(&n->key,
+					    OVS_CB(skb)->sk_map_data)) {
+				/* get_socket above took a ref, so must
+				 * actually close it here.
+				 */
+				sock_put(sock);
+				return 0;
+			}
+		}
+
+		node = kzalloc(sizeof(*node), GFP_ATOMIC);
+		if (!node) {
+			/* get_socket above took a ref, so must actually close
+			 * it here.
+			 */
+			sock_put(sock);
+			return -ENOMEM;
+		}
+
+		node->key = *OVS_CB(skb)->sk_map_data;
+		node->output_sock = sock;
+		list_add(&node->list_node, &dp->sock_list);
+	}
+
+	return 0;
+}
+
 /* Execute a list of actions against 'skb'. */
 static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			      struct sw_flow_key *key,
@@ -1568,8 +1869,29 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 				return 0;
 			}
 			break;
+
+		case OVS_ACTION_ATTR_SOCK_TRY: {
+			bool last = nla_is_last(a, rem);
+
+			err = execute_sock_try(dp, skb, key, a, last);
+			if (last) {
+				/* If this is the last action, the skb has
+				 * been consumed or freed.
+				 * Return immediately.
+				 */
+				return err;
+			}
+			break;
 		}
+		case OVS_ACTION_ATTR_MD_SOCK_TUPLE:
+			err = execute_ovs_sk_map_metadata(skb, key);
+			break;
+
+		case OVS_ACTION_ATTR_ADD_SOCK:
+			err = execute_ovs_add_sock(dp, skb, key, nla_get_u32(a));
+			break;
+		}
 
 		if (unlikely(err)) {
 			ovs_kfree_skb_reason(skb, OVS_DROP_ACTION_ERROR);
 			return err;
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 225f60488..e6555fe44 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -165,12 +165,19 @@ static int get_dpifindex(const struct datapath *dp)
 static void destroy_dp_rcu(struct rcu_head *rcu)
 {
 	struct datapath *dp = container_of(rcu, struct datapath, rcu);
+	struct dp_sk_mnode *n, *cur;
 
 	ovs_flow_tbl_destroy(&dp->table);
 	free_percpu(dp->stats_percpu);
 	kfree(dp->ports);
 	ovs_meters_exit(dp);
 	kfree(rcu_dereference_raw(dp->upcall_portids));
+
+	list_for_each_entry_safe(cur, n, &dp->sock_list, list_node) {
+		sock_put(cur->output_sock);
+		kfree(cur);
+	}
+
 	kfree(dp);
 }
 
@@ -598,6 +605,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	struct sw_flow_actions *sf_acts;
 	struct datapath *dp;
 	struct vport *input_vport;
+
 	u16 mru = 0;
 	u64 hash;
 	int len;
@@ -624,6 +632,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 		packet->ignore_df = 1;
 	}
 	OVS_CB(packet)->mru = mru;
+	OVS_CB(packet)->sk_map_data = NULL;
 
 	if (a[OVS_PACKET_ATTR_HASH]) {
 		hash = nla_get_u64(a[OVS_PACKET_ATTR_HASH]);
@@ -1862,6 +1871,8 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	ovs_net = net_generic(ovs_dp_get_net(dp), ovs_net_id);
 	list_add_tail_rcu(&dp->list_node, &ovs_net->dps);
 
+	INIT_LIST_HEAD(&dp->sock_list);
+
 	ovs_unlock();
 
 	ovs_notify(&dp_datapath_genl_family, reply, info);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 365b9bb7f..341f4069a 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -65,6 +65,50 @@ struct dp_nlsk_pids {
 	u32 pids[];
 };
 
+enum ovs_sk_map_key_select {
+	OVS_SK_MAP_KEY_UNSET,
+	OVS_SK_MAP_KEY_INPUT_SOCKET_BASED,
+	OVS_SK_MAP_KEY_TUPLE_BASED,
+
+	OVS_SK_MAP_KEY_MAX__
+};
+
+/**
+ * struct ovs_skb_sk_map_data - OVS SK Map lookup data
+ * @key_type: Select whether to use input_socket based map or use the 5-tuple.
+ * @key: Union of input_socket vs 5-tuple.
+ */
+struct ovs_skb_sk_map_data {
+	enum ovs_sk_map_key_select key_type;
+	union {
+		struct sock *input_socket;
+		struct {
+			union {
+				struct {
+					__be32 src;	/* IP4 source address. */
+					__be32 dst;	/* IP4 destination address. */
+				} ipv4;
+				struct {
+					struct in6_addr src;	/* IP6 source address. */
+					struct in6_addr dst;	/* IP6 destination address. */
+					__be32 label;		/* IP6 flow label. */
+				} ipv6;
+			} ip;
+			struct {
+				__be16 src;	/* TCP/UDP/SCTP src port. */
+				__be16 dst;	/* TCP/UDP/SCTP dst port. */
+			} tp;
+			u8 protocol;		/* IPPROTO_*. */
+		} tuple;
+	} key;
+};
+
+struct dp_sk_mnode {
+	struct list_head list_node;
+	struct ovs_skb_sk_map_data key;
+	struct sock *output_sock;
+};
+
 /**
  * struct datapath - datapath for flow-based packet switching
  * @rcu: RCU callback head for deferred destruction.
@@ -105,6 +149,9 @@ struct datapath {
 	struct dp_meter_table meter_tbl;
 
 	struct dp_nlsk_pids __rcu *upcall_portids;
+
+	/* Socket list */
+	struct list_head sock_list;
 };
 
 /**
@@ -117,6 +164,8 @@ struct datapath {
  * @cutlen: The number of bytes from the packet end to be removed.
  * @probability: The sampling probability that was applied to this skb; 0 means
  * no sampling has occurred; U32_MAX means 100% probability.
+ * @sk_map_data: The tuples and other information used to interact with the
+ * current datapath's skmap; only populated after a metadata load is called.
  */
 struct ovs_skb_cb {
 	struct vport		*input_vport;
@@ -124,6 +173,7 @@ struct ovs_skb_cb {
 	u16			acts_origlen;
 	u32			cutlen;
 	u32			probability;
+	struct ovs_skb_sk_map_data *sk_map_data;
 };
 
 #define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 881ddd369..ac6ecf572 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -65,6 +65,9 @@ static bool actions_may_change_flow(const struct nlattr *actions)
 		case OVS_ACTION_ATTR_USERSPACE:
 		case OVS_ACTION_ATTR_DROP:
 		case OVS_ACTION_ATTR_PSAMPLE:
+		case OVS_ACTION_ATTR_SOCK_TRY:
+		case OVS_ACTION_ATTR_MD_SOCK_TUPLE:
+		case OVS_ACTION_ATTR_ADD_SOCK:
 			break;
 
 		case OVS_ACTION_ATTR_CT:
@@ -2410,7 +2413,7 @@ static void ovs_nla_free_nested_actions(const struct nlattr *actions, int len)
 	/* Whenever new actions are added, the need to update this
 	 * function should be considered.
 	 */
-	BUILD_BUG_ON(OVS_ACTION_ATTR_MAX != 25);
+	BUILD_BUG_ON(OVS_ACTION_ATTR_MAX != 28);
 
 	if (!actions)
 		return;
@@ -3236,6 +3239,9 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			[OVS_ACTION_ATTR_DEC_TTL] = (u32)-1,
 			[OVS_ACTION_ATTR_DROP] = sizeof(u32),
 			[OVS_ACTION_ATTR_PSAMPLE] = (u32)-1,
+			[OVS_ACTION_ATTR_SOCK_TRY] = sizeof(u32),
+			[OVS_ACTION_ATTR_MD_SOCK_TUPLE] = 0,
+			[OVS_ACTION_ATTR_ADD_SOCK] = sizeof(u32),
 		};
 		const struct ovs_action_push_vlan *vlan;
 		int type = nla_type(a);
@@ -3520,6 +3526,15 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 				return err;
 			break;
 
+		case OVS_ACTION_ATTR_SOCK_TRY: fallthrough;
+		case OVS_ACTION_ATTR_MD_SOCK_TUPLE:
+			break;
+
+		case OVS_ACTION_ATTR_ADD_SOCK:
+			if (nla_get_u32(a) >= DP_MAX_PORTS)
+				return -EINVAL;
+			break;
+
 		default:
 			OVS_NLERR(log, "Unknown Action type %d", type);
 			return -EINVAL;
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 8732f6e51..9c8add78b 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -494,6 +494,7 @@ u32 ovs_vport_find_upcall_portid(const struct vport *vport,
 int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
 		      const struct ip_tunnel_info *tun_info)
 {
+	struct ovs_skb_sk_map_data skmd = {};
 	struct sw_flow_key key;
 	int error;
 
@@ -501,6 +502,7 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
 	OVS_CB(skb)->mru = 0;
 	OVS_CB(skb)->cutlen = 0;
 	OVS_CB(skb)->probability = 0;
+	OVS_CB(skb)->sk_map_data = &skmd;
 
 	if (unlikely(dev_net(skb->dev) != ovs_dp_get_net(vport->dp))) {
 		u32 mark;
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 8a0396bfa..591581745 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -392,6 +392,9 @@ class ovsactions(nla):
         ("OVS_ACTION_ATTR_DEC_TTL", "none"),
         ("OVS_ACTION_ATTR_DROP", "uint32"),
         ("OVS_ACTION_ATTR_PSAMPLE", "psample"),
+        ("OVS_ACTION_ATTR_SOCK_TRY", "uint32"),
+        ("OVS_ACTION_ATTR_MD_SOCK_TUPLE", "flag"),
+        ("OVS_ACTION_ATTR_ADD_SOCK", "uint32"),
     )
 
     class psample(nla):
@@ -639,6 +642,13 @@ class ovsactions(nla):
                 print_str += "pop_nsh"
             elif field[0] == "OVS_ACTION_ATTR_POP_MPLS":
                 print_str += "pop_mpls"
+            elif field[0] == "OVS_ACTION_ATTR_MD_SOCK_TUPLE":
+                print_str += "sock(tuple)"
+            elif field[0] == "OVS_ACTION_ATTR_SOCK_TRY":
+                print_str += "sock(try,recirc=%d)" % \
+                    int(self.get_attr(field[0]))
+            elif field[0] == "OVS_ACTION_ATTR_ADD_SOCK":
+                print_str += "sock(commit,%d)" % int(self.get_attr(field[0]))
             else:
                 datum = self.get_attr(field[0])
                 if field[0] == "OVS_ACTION_ATTR_CLONE":
@@ -878,6 +888,36 @@ class ovsactions(nla):
                 self["attrs"].append(["OVS_ACTION_ATTR_TRUNC", val])
                 parsed = True
 
+            elif parse_starts_block(actstr, "sock(", False):
+                parencount += 1
+                actstr = actstr[5:]
+                if actstr.startswith("try,"):
+                    actstr = actstr[4:]
+                    actstr, val = parse_extract_field(
+                        actstr,
+                        "recirc=",
+                        r"([0-9a-fA-Fx]+)",
+                        lambda x: int(x, 0),
+                        False)
+                    if val is not None:
+                        self["attrs"].append(["OVS_ACTION_ATTR_SOCK_TRY", val])
+                        parsed = True
+                elif actstr.startswith("tuple"):
+                    actstr = actstr[5:]
+                    parsed = True
+                    self["attrs"].append(["OVS_ACTION_ATTR_MD_SOCK_TUPLE",
+                                          True])
+                elif actstr.startswith("commit,"):
+                    actstr, val = parse_extract_field(
+                        actstr,
+                        "commit,",
+                        r"([0-9]+)",
+                        int,
+                        False)
+                    if val is not None:
+                        self["attrs"].append(["OVS_ACTION_ATTR_ADD_SOCK", val])
+                        parsed = True
+
             actstr = actstr[strspn(actstr, ", ") :]
             while parencount > 0:
                 parencount -= 1
-- 
2.43.5