Merge branch 'cls-hw-offload-rtnl'
Vlad Buslov says:
====================
Refactor cls hardware offload API to support rtnl-independent drivers
Currently, all cls API hardware offloads driver callbacks require caller
to hold rtnl lock when calling them. This patch set introduces new API
that allows drivers to register callbacks that are not dependent on rtnl
lock and unlocked classifiers to offload filters without obtaining rtnl
lock first, which is intended to allow offloading tc rules in parallel.
Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are
already registered with this flag and only take rtnl lock when qdisc or
classifier requires it. Classifiers can indicate that their ops
callbacks don't require caller to hold rtnl lock by setting the
TCF_PROTO_OPS_DOIT_UNLOCKED flag. Unlocked implementation of flower
classifier is now upstreamed. However, this implementation still obtains
rtnl lock before calling hardware offloads API.
Implement following cls API changes:
- Introduce new "unlocked_driver_cb" flag to struct flow_block_offload
to allow registering and unregistering block hardware offload
callbacks that do not require caller to hold rtnl lock. Drivers that
doesn't require users of its tc offload callbacks to hold rtnl lock
sets the flag to true on block bind/unbind. Internally tcf_block is
extended with additional lockeddevcnt counter that is used to count
number of devices that require rtnl lock that block is bound to. When
this counter is zero, tc_setup_cb_*() functions execute callbacks
without obtaining rtnl lock.
- Extend cls API single hardware rule update tc_setup_cb_call() function
with tc_setup_cb_add(), tc_setup_cb_replace(), tc_setup_cb_destroy()
and tc_setup_cb_reoffload() functions. These new APIs are needed to
move management of block offload counter, filter in hardware counter
and flag from classifier implementations to cls API, which is now
responsible for managing them in concurrency-safe manner. Access to
cb_list from callback execution code is synchronized by obtaining new
'cb_lock' rw_semaphore in read mode, which allows executing callbacks
in parallel, but excludes any modifications of data from
register/unregister code. tcf_block offloads counter type is changed
to atomic integer to allow updating the counter concurrently.
- Extend classifier ops with new ops->hw_add() and ops->hw_del()
callbacks which are used to notify unlocked classifiers when filter is
successfully added or deleted to hardware without releasing cb_lock.
This is necessary to update classifier state atomically with callback
list traversal and updating of all relevant counters and allows
unlocked classifiers to synchronize with concurrent reoffload without
requiring any changes to driver callback API implementations.
New tc flow_action infrastructure is also modified to allow its user to
execute without rtnl lock protection. Function tc_setup_flow_action() is
modified to conditionally obtain rtnl lock before accessing action
state. Action data that is accessed by reference is either copied or
reference counted to prevent concurrent action overwrite from
deallocating it. New function tc_cleanup_flow_action() is introduced to
cleanup/release all such data obtained by tc_setup_flow_action().
Flower classifier (only unlocked classifier at the moment) is modified
to use new cls hardware offloads API and no longer obtains rtnl lock
before calling it.
====================
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>