Source
x
Started by Paul Jackson <pj@sgi.com>
====================
The robust futex ABI
--------------------
====================
:Author: Started by Paul Jackson <pj@sgi.com>
Robust_futexes provide a mechanism that is used in addition to normal
futexes, for kernel assist of cleanup of held locks on task exit.
The interesting data as to what futexes a thread is holding is kept on a
linked list in user space, where it can be updated efficiently as locks
are taken and dropped, without kernel intervention. The only additional
kernel intervention required for robust_futexes above and beyond what is
required for futexes is:
particular futex.
For robust_futexes to work, the user code (typically in a library such
as glibc linked with the application) has to manage and place the
necessary list elements exactly as the kernel expects them. If it fails
to do so, then improperly listed locks will not be cleaned up on exit,
probably causing deadlock or other such failure of the other threads
waiting on the same locks.
A thread that anticipates possibly using robust_futexes should first
issue the system call:
issue the system call::
asmlinkage long
sys_set_robust_list(struct robust_list_head __user *head, size_t len);
The pointer 'head' points to a structure in the threads address space
consisting of three words. Each word is 32 bits on 32 bit arch's, or 64
bits on 64 bit arch's, and local byte order. Each thread should have
its own thread private 'head'.
If a thread is running in 32 bit compatibility mode on a 64 native arch
robust_futex support for exit cleanup of that lock, it should have one
'lock entry' on this list, with its associated 'lock word' at the
specified 'offset'. Should a thread die while holding any such locks,
the kernel will walk this list, mark any such locks with a bit
indicating their holder died, and wakeup the next thread waiting for
that lock using the futex mechanism.
When a thread has invoked the above system call to indicate it
anticipates using robust_futexes, the kernel stores the passed in 'head'
pointer for that task. The task may retrieve that value later on by
using the system call:
using the system call::
asmlinkage long
sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr,
size_t __user *len_ptr);
It is anticipated that threads will use robust_futexes embedded in
larger, user level locking structures, one per lock. The kernel
robust_futex mechanism doesn't care what else is in that structure, so
long as the 'offset' to the 'lock word' is the same for all
robust_futexes used by that thread. The thread should link those locks
thread currently holding such a lock, if any, is marked with the threads
TID in the lower 29 bits of the 'lock word'.
When adding or removing a lock from its list of held locks, in order for
the kernel to correctly handle lock cleanup regardless of when the task
exits (perhaps it gets an unexpected signal 9 in the middle of
manipulating this list), the user code must observe the following
protocol on 'lock entry' insertion and removal:
On insertion:
1) set the 'list_op_pending' word to the address of the 'lock entry'
to be inserted,
2) acquire the futex lock,
3) add the lock entry, with its thread id (TID) in the bottom 29 bits
of the 'lock word', to the linked list starting at 'head', and
4) clear the 'list_op_pending' word.
On removal:
1) set the 'list_op_pending' word to the address of the 'lock entry'
to be removed,
2) remove the lock entry for this lock from the 'head' list,
3) release the futex lock, and
4) clear the 'lock_op_pending' word.
On exit, the kernel will consider the address stored in
'list_op_pending' and the address of each 'lock word' found by walking
the list starting at 'head'. For each such address, if the bottom 29
bits of the 'lock word' at offset 'offset' from that address equals the