Path walking and name lookup locking
====================================
Path resolution is the finding a dentry corresponding to a path name string, by
performing a path walk. Typically, for every open(), stat() etc., the path name
will be resolved. Paths are resolved by walking the namespace tree, starting
with the first component of the pathname (eg. root or cwd) with a known dentry,
then finding the child of that dentry, which is named the next component in the
path string. Then repeating the lookup from the child dentry and finding its
child with the next element, and so on.
Since it is a frequent operation for workloads like multiuser environments and
web servers, it is important to optimize this code.
Path walking synchronisation history:
Prior to 2.5.10, dcache_lock was acquired in d_lookup (dcache hash lookup) and
thus in every component during path look-up. Since 2.5.10 onwards, fast-walk
algorithm changed this by holding the dcache_lock at the beginning and walking
as many cached path component dentries as possible. This significantly
decreases the number of acquisition of dcache_lock. However it also increases
the lock hold time significantly and affects performance in large SMP machines.
Since 2.5.62 kernel, dcache has been using a new locking model that uses RCU to
make dcache look-up lock-free.
All the above algorithms required taking a lock and reference count on the
dentry that was looked up, so that may be used as the basis for walking the
next path element. This is inefficient and unscalable. It is inefficient
because of the locks and atomic operations required for every dentry element
slows things down. It is not scalable because many parallel applications that
are path-walk intensive tend to do path lookups starting from a common dentry
(usually, the root "/" or current working directory). So contention on these
common path elements causes lock and cacheline queueing.
Since 2.6.38, RCU is used to make a significant part of the entire path walk
(including dcache look-up) completely "store-free" (so, no locks, atomics, or
even stores into cachelines of common dentries). This is known as "rcu-walk"
A name string specifies a start (root directory, cwd, fd-relative) and a
sequence of elements (directory entry names), which together refer to a path in
the namespace. A path is represented as a (dentry, vfsmount) tuple. The name
elements are sub-strings, separated by '/'.
Name lookups will want to find a particular path that a name string refers to
(usually the final element, or parent of final element). This is done by taking
the path given by the name's starting point (which we know in advance -- eg.
current->fs->cwd or current->fs->root) as the first parent of the lookup. Then
iteratively for each subsequent name element, look up the child of the current
parent with the given name and if it is not the desired entry, make it the
parent for the next lookup.
A parent, of course, must be a directory, and we must have appropriate
permissions on the parent inode to be able to walk into it.
Turning the child into a parent for the next lookup requires more checks and
procedures. Symlinks essentially substitute the symlink name for the target
name in the name string, and require some recursive path walking. Mount points
must be followed into (thus changing the vfsmount that subsequent path elements
refer to), switching from the mount point path to the root of the particular
mounted vfsmount. These behaviours are variously modified depending on the