/*
* Block rq-qos base io controller
*
* This works similarly to wbt, with a few exceptions:
*
* - It's bio based, so the latency covers the whole block layer in addition to
* the actual io.
* - We will throttle all IO that comes in here if we need to.
* - We use the mean latency over the 100ms window. This is because writes can
* be particularly fast, which could give us a false sense of the impact of
* other workloads on our protected workload.
* - By default there's no throttling, we set the queue_depth to UINT_MAX so
* that we can have as many outstanding bio's as we're allowed to. Only at
* throttle time do we pay attention to the actual queue depth.
*
* The hierarchy works like the cpu controller does: we track the latency at
* every configured node, and each configured node has its own independent
* queue depth. This means that we only care about our latency targets at the
* peer level. Some group at the bottom of the hierarchy isn't going to affect
* a group at the end of some other path if we're only configured at leaf level.
*
* Consider the following
*
*                      root blkg
*                      /       \
*       fast (target=5ms)     slow (target=10ms)
*        /     \                /        \
*       a       b       normal(15ms)   unloved
*
* "a" and "b" have no target, but their combined io under "fast" cannot exceed
* an average latency of 5ms. If it does then we will throttle the "slow"
* group. In the case of "normal", if it exceeds its 15ms target, we will
* throttle "unloved", but nobody else.
*
* In this example "fast", "slow", and "normal" will be the only groups actually
* accounting their io latencies. We have to walk up the hierarchy to the root
* on every submit and complete so we can do the appropriate stat recording and
* adjust our own queue depth if needed.
*
* There are 2 ways we throttle IO.
*
* 1) Queue depth throttling. As we throttle down we will adjust the maximum
* number of IO's we're allowed to have in flight. This starts at (u64)-1 down
* to 1. If the group is only ever submitting IO for itself then this is the
* only way we throttle.
*
* 2) Induced delay throttling. This is for the case that a group is generating
* IO that has to be issued by the root cg to avoid priority inversion. So think
* REQ_META or REQ_SWAP. If we are already at qd == 1, are getting a lot of
* work done for us on behalf of the root cg, and are being asked to scale
* down more, then we induce a latency at userspace return. We accumulate the
* total amount of time we need to be punished by doing
*
*     total_time += min_lat_nsec - actual_io_completion
*
* and then at throttle time will do
*
*     throttle_time = min(total_time, NSEC_PER_SEC)
*
* This induced delay will throttle back the activity that is generating the
* root cg issued io's, whether that's some metadata intensive operation or the
* group is using so much memory that it is pushing us into swap.
*
* Copyright (C) 2018 Josef Bacik
*/
static struct blkcg_policy blkcg_policy_iolatency;
struct iolatency_grp;
struct blk_iolatency {
	struct rq_qos rqos;
	struct timer_list timer;
	atomic_t enabled;
};
static inline struct blk_iolatency *BLKIOLATENCY(struct rq_qos *rqos)
{