MD raid general discussion

60 Minute BoF session
Scheduled: Thursday, September 14, 2017 from 5:30 – 6:30pm in Platinum B

One Line Summary

In recent years there has been a lot of development activity in md raid. We need to sit down together to discuss the development road map, collaboration between the kernel and the user space tools, and how to work with the development of other subsystems. The session is open to other developers, and all constructive comments are warmly welcome.


In recent years there have been many development activities in md raid, e.g.
- Raid5 cache
This is an effort to close the raid5 write hole (data loss when a system crash happens). The original idea came from the Ext4 file system journal (which is why its earlier name was raid5-journal). The raid5 cache can now be used to improve performance for write bursts and to improve tolerance of the write hole.
- Partial parity log
This is another effort to close the raid5 write hole. In degraded state there is no way to recalculate parity, because one of the disks is missing. The raid5 partial parity log allows the parity to be recalculated. It does not protect in-flight data; it only keeps the data on the disks consistent.
- Raid1 clustering
This is an in-kernel distributed data mirroring implementation: the mirror devices of a raid1 can be on different servers far from each other, in different data centers. Users can have distributed, clustered data duplication on md raid1.
- Sysfs interface for mdadm
Currently the mdadm tool mixes the ioctl and sysfs interfaces to communicate with the kernel code, and the kernel has to serve both interfaces with different code paths. The mdadm developers now want to unify the interface into a set of sysfs files, which requires effort from both kernel space and user space. The benefit is a single code path on each side, which makes the communication between mdadm and the md kernel code clearer.
- Performance improvement for NVMe devices
Lockless I/O submission on RAID1 and DISCARD improvements on RAID0, both of which improve performance on fast NVMe SSDs.
- Device-mapper (dm-raid target) interface
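
The write hole and the partial parity idea described above can be illustrated with a small sketch. This is not the md implementation; it is a toy model assuming a stripe of three data chunks plus one XOR parity chunk, and all names in it (xor_chunks, d0_on_disk, and so on) are hypothetical. It assumes the partial parity is the XOR of the data chunks that a write does not modify, logged before the write starts.

```python
from functools import reduce


def xor_chunks(*chunks: bytes) -> bytes:
    """Bytewise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# A stripe with three data chunks and one parity chunk (toy sizes).
d0, d1, d2 = b"\x11\x11", b"\x22\x22", b"\x44\x44"
parity = xor_chunks(d0, d1, d2)

# A write to d0 is about to start. Assumption: the partial parity, i.e. the
# XOR of the chunks NOT modified by this write, is logged first.
partial_parity = xor_chunks(d1, d2)

# Crash: the new d0 reached the disk, but the parity update did not.
d0_on_disk = b"\x99\x99"
stale_parity = parity

# The disk holding d2 is now missing. Rebuilding d2 from the stale parity
# yields wrong data: this is the write hole.
wrong_d2 = xor_chunks(stale_parity, d0_on_disk, d1)

# With the logged partial parity, a consistent parity can be recalculated
# from whatever content of d0 is on disk, and d2 is rebuilt correctly.
fixed_parity = xor_chunks(partial_parity, d0_on_disk)
recovered_d2 = xor_chunks(fixed_parity, d0_on_disk, d1)

print(wrong_d2 == d2, recovered_d2 == d2)
```

Note that the sketch recovers the untouched chunk d2 only; whether d0 holds the old or the new data after the crash is not decided by the log, which matches the point that in-flight data is not protected.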

Here are the discussion topics we have in mind; feel free to add more:
- Road map of Partial Parity Log development
- Road map of clustered raid development
- Sysfs interface improvement for mdadm and MD Raid
- Mdadm test suite
- Sync up between raid1 and raid10: some features are only available in raid1, such as the new I/O barrier and write-behind (maybe more).
- dm raid interface

Proposed schedule
1) Roadmap sharing for:
– raid5 partial parity log
– md clustering
– sysfs interface improvement
– mdadm test suite
2) Problem discussion
– Kernel space and user space collaboration for sysfs interface improvement
– raid1 & raid10 sync up
– dm raid interface


md, dm, raid


  • Biography

    Worked on the SUSE HA product, with a focus on clustered raid development.

  • Coly Li

    SUSE Linux


    Software engineer, works for SUSE Linux, SUSE maintainer for md/dm/bcache and other block layer parts. Active Linux kernel community member.

  • Biography

    Shaohua Li has been a Linux kernel developer since 2003. He currently works for Facebook and is the maintainer of Software RAID. His contributions cover different Linux kernel subsystems, from x86, IOMMU and storage to memory management. He is interested in high speed storage support, optimization and adoption.

  • Jes Sorensen

    Red Hat


    Jes Sorensen is a veteran Linux kernel developer, active since 1993. He has worked extensively in and around the Linux kernel, including areas such as virtualization, high speed networking, and Linux/ia64. He has written more device drivers than he can remember, and has also worked on the system libraries (glibc) and on virtualization for high end NUMA systems. Most recently he has been working on KVM/QEMU scalability and live snapshot support. Jes has been a frequent speaker and tutorial instructor at various Open Source related conferences. Today Jes works as a Principal Software Engineer for Red Hat Inc., as part of the Red Hat Virtualization team. Prior to Red Hat, Jes worked for Silicon Graphics, Wild Open Source, Linuxcare and the European Laboratory for Particle Physics (