Scalable NUMA-aware Blocking Synchronization Primitives

This proposal has been rejected.


One Line Summary

A scalable mutex and rwsem design for large multicore machines.


Application scalability is critical to using NUMA machines with many cores efficiently. To achieve it, practitioners apply various techniques in practice, ranging from task placement to data sharding. From the operating system's perspective, however, these techniques often do not work as expected, because the OS's subsystems interact and share data structures among themselves, creating scalability bottlenecks. Although current OSes tackle this problem with a wide range of synchronization primitives, such as spinlocks and mutexes, the widely used synchronization mechanisms are not designed to handle both under- and over-subscribed scenarios in a scalable manner. In particular, the current blocking synchronization primitives that do address both scenarios are NUMA oblivious: they suffer from cache-line contention in an under-subscribed situation and, even worse, inherently incur long scheduler intervention, which leads to sub-optimal performance in an over-subscribed situation.

In this work, we present several design choices for implementing scalable blocking synchronization primitives that address both under- and over-subscribed scenarios. These design decisions include memory-efficient NUMA-aware locks (favorable for deployment) and scheduling-aware, scalable parking and wake-up strategies. To validate our design choices, we implement two new blocking synchronization primitives, variants of the mutex and the reader-writer semaphore in the Linux kernel. Our evaluation shows that the new locks improve application performance by 1.2-1.6x, and some file system operations by as much as 4.7x, in both under- and over-subscribed scenarios. The new locks also use 1.5-10x less memory than state-of-the-art NUMA-aware locks on a 120-core machine.


scheduler, mutex, rwsem


  • Sanidhya Kashyap

    Georgia Institute of Technology


Sanidhya Kashyap is a third-year PhD student at the Georgia Institute of Technology who works on the scalability of the Linux kernel and other concurrent applications.