Merge branch 'bpf-veth-xdp-support'
Toshiaki Makita says:
====================
This patch set introduces driver XDP for veth.
Basically this is used in conjunction with redirect action of another XDP
program.
NIC -----------> veth===veth
(XDP) (redirect) (XDP)
In this case xdp_frame can be forwarded to the peer veth without
modification, so we can expect far better performance than generic XDP.
Envisioned use-cases
--------------------
* Container managed XDP program
Container host redirects frames to containers by XDP redirect action, and
privileged containers can deploy their own XDP programs.
* XDP program cascading
Two or more XDP programs can be called for each packet by redirecting
xdp frames to veth.
* Internal interface for an XDP bridge
When using XDP redirection to create a virtual bridge, veth can be used
to create an internal interface for the bridge.
Implementation
--------------
This changeset is making use of NAPI to implement ndo_xdp_xmit and
XDP_TX/REDIRECT. This is mainly because XDP heavily relies on NAPI
context.
- patch 1: Export a function needed for veth XDP.
- patch 2-3: Basic implementation of veth XDP.
- patch 4-6: Add ndo_xdp_xmit.
- patch 7-9: Add XDP_TX and XDP_REDIRECT.
- patch 10: Performance optimization for multi-queue env.
Tests and performance numbers
-----------------------------
Tested with a simple XDP program which only redirects packets between
NIC and veth. I used i40e 25G NIC (XXV710) for the physical NIC. The
server has 20 of Xeon Silver 2.20 GHz cores.
pktgen --(wire)--> XXV710 (i40e) <--(XDP redirect)--> veth===veth (XDP)
The rightmost veth loads XDP progs and just does DROP or TX. The number
of packets is measured in the XDP progs. The leftmost pktgen sends
packets at 37.1 Mpps (almost 25G wire speed).
veth XDP action Flows Mpps
================================
DROP 1 10.6
DROP 2 21.2
DROP 100 36.0
TX 1 5.0
TX 2 10.0
TX 100 31.0
I also measured netperf TCP_STREAM but was not so great performance due
to lack of tx/rx checksum offload and TSO, etc.
netperf <--(wire)--> XXV710 (i40e) <--(XDP redirect)--> veth===veth (XDP PASS)
Direction Flows Gbps
==============================
external->veth 1 20.8
external->veth 2 23.5
external->veth 100 23.6
veth->external 1 9.0
veth->external 2 17.8
veth->external 100 22.9
Also tested doing ifup/down or load/unload a XDP program repeatedly
during processing XDP packets in order to check if enabling/disabling
NAPI is working as expected, and found no problems.
v8:
- Don't use xdp_frame pointer address to calculate skb->head, headroom,
and xdp_buff.data_hard_start.
v7:
- Introduce xdp_scrub_frame() to clear kernel pointers in xdp_frame and
use it instead of memset().
v6:
- Check skb->len only if reallocation is needed.
- Add __GFP_NOWARN to alloc_page() since it can be triggered by external
events.
- Fix sparse warning around EXPORT_SYMBOL.
v5:
- Fix broken SOBs.
v4:
- Don't adjust MTU automatically.
- Skip peer IFF_UP check on .ndo_xdp_xmit() because it is unnecessary.
Add comments to explain that.
- Use redirect_info instead of xdp_mem_info for storing no_direct flag
to avoid per packet copy cost.
v3:
- Drop skb bulk xmit patch since it makes little performance
difference. The hotspot in TCP skb xmit at this point is checksum
computation in skb_segment and packet copy on XDP_REDIRECT due to
cloned/nonlinear skb.
- Fix race on closing device.
- Add extack messages in ndo_bpf.
v2:
- Squash NAPI patch with "Add driver XDP" patch.
- Remove conversion from xdp_frame to skb when NAPI is not enabled.
- Introduce per-queue XDP ring (patch 8).
- Introduce bulk skb xmit when XDP is enabled on the peer (patch 9).
====================
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>