File management in the Linux kernel
-----------------------------------
This document describes how locking for files (struct file)
and file descriptor table (struct files) works.
Up until 2.6.12, the file descriptor table has been protected
with a lock (files->file_lock) and reference count (files->count).
->file_lock protected accesses to all the file related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with CLONE_FILES flag. Typically
this would be the case for posix threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using
reference count (->f_count).
In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements - the fd sets (open_fds and close_on_exec, the
array of file pointers, the sizes of the sets and the array
etc.). In order for the updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are in a separate structure - struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of fdtable, a new fdtable structure is allocated
and files->fdtab points to the new structure. The fdtable
structure is freed with RCU and lock-free readers either
see the old fdtable or the new fdtable making the update
appear atomic. Here are the locking rules for
1. All references to the fdtable must be done through
the files_fdtable() macro :
fdt = files_fdtable(files);
files_fdtable() uses rcu_dereference() macro which takes care of
the memory barrier requirements for lock-free dereference.
The fdtable pointer must be read within the read-side
2. Reading of the fdtable as described above must be protected
by rcu_read_lock()/rcu_read_unlock().
3. For any update to the fd table, files->file_lock must
4. To look up the file structure given an fd, a reader
must use either fcheck() or fcheck_files() APIs. These
take care of barrier requirements due to lock-free lookup.