Commits
Wu Fengguang committed 6c14ae1e92c
writeback: dirty position control bdi_position_ratio() provides a scale factor to bdi->dirty_ratelimit, so that the resulted task rate limit can drive the dirty pages back to the global/bdi setpoints. Old scheme is, | free run area | throttle area ----------------------------------------+----------------------------> thresh^ dirty pages New scheme is, ^ task rate limit | | * | * | * |[free run] * [smooth throttled] | * | * | * ..bdi->dirty_ratelimit..........* | . * | . * | . * | . * | . * +-------------------------------.-----------------------*------------> setpoint^ limit^ dirty pages The slope of the bdi control line should be 1) large enough to pull the dirty pages to setpoint reasonably fast 2) small enough to avoid big fluctuations in the resulted pos_ratio and hence task ratelimit Since the fluctuation range of the bdi dirty pages is typically observed to be within 1-second worth of data, the bdi control line's slope is selected to be a linear function of bdi write bandwidth, so that it can adapt to slow/fast storage devices well. Assume the bdi control line pos_ratio = 1.0 + k * (dirty - bdi_setpoint) where k is the negative slope. If targeting for 12.5% fluctuation range in pos_ratio when dirty pages are fluctuating in range [bdi_setpoint - write_bw/2, bdi_setpoint + write_bw/2], we get slope k = - 1 / (8 * write_bw) Let pos_ratio(x_intercept) = 0, we get the parameter used in code: x_intercept = bdi_setpoint + 8 * write_bw The global/bdi slopes are nicely complementing each other when the system has only one major bdi (indicated by bdi_thresh ~= thresh): 1) slope of global control line => scaling to the control scope size 2) slope of main bdi control line => scaling to the writeout bandwidth so that - in memory tight systems, (1) becomes strong enough to squeeze dirty pages inside the control scope - in large memory systems where the "gravity" of (1) for pulling the dirty pages to setpoint is too weak, (2) can back (1) up and drive dirty pages to bdi_setpoint ~= setpoint reasonably fast. Unfortunately in JBOD setups, the fluctuation range of bdi threshold is related to memory size due to the interferences between disks. In this case, the bdi slope will be weighted sum of write_bw and bdi_thresh. Given equations span = x_intercept - bdi_setpoint k = df/dx = - 1 / span and the extremum values span = bdi_thresh dx = bdi_thresh we get df = - dx / span = - 1.0 That means, when bdi_dirty deviates bdi_thresh up, pos_ratio and hence task ratelimit will fluctuate by -100%. peter: use 3rd order polynomial for the global control line CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>