Linux Plumbers Conference 2020

US/Pacific
Description

August 24-28, virtually

The Linux Plumbers Conference is the premier event for developers working at all levels of the plumbing layer and beyond.  LPC 2020 will be held virtually August 24-28.  We are looking forward to seeing you online!

    • 07:00 11:00
      BOFs Session BOF1/Virtual-Room (LPC 2020)

    • 07:00 11:00
      Containers and Checkpoint/Restore MC Microconference1/Virtual-Room (LPC 2020)


      The Containers and Checkpoint/Restore MC at Linux Plumbers is the opportunity for runtime maintainers, kernel developers and others involved with containers on Linux to talk about what they are up to and agree on the next major changes to kernel and userspace.

      Common discussion topics tend to be improvements to user namespaces, opening up more kernel functionality to unprivileged users, new ways to dump and restore kernel state, Linux Security Modules, and syscall handling.

    • 07:00 11:00
      GNU Tools Track GNU Tools track/Virtual-Room (LPC 2020)


      The GNU Tools track will gather GNU tools developers to discuss current and future work, coordinate efforts, exchange reports on ongoing work, discuss development plans for the next 12 months, hold developer tutorials, and have any other related discussions.
      The track will also include a Toolchain Microconference to discuss topics that are more specific to the interaction between the Linux kernel and the toolchain.

      • 07:00
        Break 15m
    • 07:00 11:00
      LPC Refereed Track: SPONSORED BY FACEBOOK Refereed Track/Virtual-Room (LPC 2020)

      • 07:00
        A theorem for the RT scheduling latency (and a measuring tool too!) 45m

        Defining Linux as an RTOS might be risky when we are outside of the kernel community. We know how and why it works, but we have to admit that the black-box approach used by cyclictest to measure PREEMPT_RT's primary metric, the scheduling latency, might not be enough to convince other communities about the properties of the kernel-rt.

        In real-time theory, a common approach is the categorization of a system as a set of independent variables and equations that describe its integrated timing behavior. Two years ago, Daniel presented a model that could explain the relationship between the kernel events and the latency, and last year he showed a way to observe such events efficiently. Still, the final touch, the definition of the bound for the scheduling latency of PREEMPT_RT using an approach accepted by the theoretical community, was missing.

        Closing the trilogy, Daniel will present the theorem that defines the scheduling latency bound, and how it can be efficiently measured, not only as a single value but as the composition of the variables that can influence the latency. He will also present a proof-of-concept tool that measures the latency. In addition to the analysis, the tool can also be used to pinpoint the root cause of latency spikes, which is another practical problem faced by PREEMPT_RT developers and users. However, discussions about how to make the tool more developer-friendly are still needed, and that is the goal of this talk.

        The results presented in this talk were published at ECRTS 2020, a top-tier academic conference on real-time systems, with reference to the discussions held at the previous edition of Linux Plumbers.

      • 07:45
        Break 15m
      • 08:00
        Morello and the challenges of a capability-based ABI 45m

        The Morello project is an experimental branch of the Arm architecture for evaluating the deployment and impact of capability-based security. This experimental ISA extension builds on concepts from the CHERI project from Cambridge University.

        As experimentations with Morello on Linux are underway, this talk will focus on the pure-capability execution environment, where all pointers are represented as 128-bit capabilities with tight bounds and limited permissions. After a brief introduction to the Morello architecture, we will outline the main challenges to overcome for the kernel to support a pure-capability userspace. Beyond the immediate issue of adding a syscall ABI where all pointers are 128 bits wide, the kernel is expected to honour the restrictions associated with user capability pointers when it dereferences them, in order to prevent the confused deputy problem.

        These challenges can be approached in multiple ways, with different trade-offs between robustness, maintainability and invasiveness. We will attempt to cover a few of these approaches, in the hope of generating useful discussions with the community.

      • 08:45
        Break 15m
      • 09:00
        Core Scheduling: Taming Hyper-Threads to be secure 45m

        The core idea behind core scheduling is to have SMT (Simultaneous Multi-Threading) on and make sure that only trusted applications run concurrently on the hardware threads of a core. If there is no group of mutually trusting applications runnable on the core, we need to make sure that the remaining hardware threads are idle while applications run in isolation on the core. While doing so, we should also consider the performance aspects of the system. Theoretically it is impossible to reach the same level of performance as when all hardware threads are allowed to run any runnable application. But if the performance of core scheduling is worse than or the same as that without SMT, we do not gain anything from this feature other than added complexity in the scheduler. So the idea is to achieve a considerable boost in performance compared to SMT turned off for the majority of production workloads.

        This talk is a continuation of the core scheduling talk and microconference at LPC 2019. We would like to discuss the progress made in the last year and the newly identified use cases of this feature.

        Progress has been made on the performance aspects of core scheduling. A couple of patches addressing load-balancing issues with core scheduling have improved performance, and the stability issues in v5 have been addressed as well.

        One area of criticism was that the patches were not addressing all cases where untrusted tasks can run in parallel. Interrupts are one scenario where the kernel runs on a cpu in parallel with a user task on the sibling. While two user tasks running on the core could be trusted, when an interrupt arrives on one cpu, the situation changes: the kernel starts running in interrupt context, and the kernel cannot trust the user task running on the sibling cpu. A prototype fix has been developed for this case. One gap that still exists is the syscall boundary. Addressing the syscall issue would be a big performance hit, and we would like to discuss possible ways to fix it without hurting performance.

        Lastly, we would also like to discuss the APIs for exposing this feature to userland. As of now, we use CPU controller CGroups. During the last LPC, we had discussed this in the presentation, but we had not decided on any final APIs yet. ChromeOS has a prototype which uses prctl(2) to enable the core scheduling feature. We would like to discuss possible approaches suitable for all use cases to use the core scheduling feature.
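
        For reference, the cgroup-based prototype in the patch series exposed a per-group tag; tasks may share a core only when their groups carry the same tag. A sketch of its use (the group name is illustrative, and cpu.tag comes from the out-of-tree patch series, not mainline):

```
# Mark a cgroup's tasks as mutually trusted for core sharing
echo 1 > /sys/fs/cgroup/vm-group/cpu.tag
```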

      • 09:45
        Break 15m
      • 10:00
        Data-race detection in the Linux kernel 45m

        In this talk, we will discuss data-race detection in the Linux kernel. The talk starts by briefly providing background on data races, how they relate to the Linux-kernel Memory Consistency Model (LKMM), and why concurrency bugs can be so subtle and hard to diagnose (with a few examples). Following that, we will discuss past attempts at data-race detectors for the Linux kernel and why they never reached production quality to make it into the mainline Linux kernel. We argue that a key piece to the puzzle is the design of the data-race detector: it needs to be as non-intrusive as possible, simple, scalable, seamlessly evolve with the kernel, and favor false negatives over false positives. Following that, we will discuss the Kernel Concurrency Sanitizer (KCSAN) and its design and some implementation details. Our story also shows that a good baseline design only gets us so far, and most important was early community feedback and iterating. We also discuss how KCSAN goes even further, and can help detect concurrency bugs that are not data races.

        Tentative Outline:
        - Background
        -- What are data races?
        -- Concurrency bugs are subtle: some examples
        - Data-race detection in the Linux kernel
        -- Past attempts and why they never made it upstream
        -- What is a reasonable design for the kernel?
        -- The Kernel Concurrency Sanitizer (KCSAN)
        --- Design
        --- Implementation
        -- Early community feedback and iterate!
        - Beyond data races
        -- Concurrency bugs that are not data races
        -- How KCSAN can help find more bugs
        - Conclusion

        Keywords: testing, developer tools, concurrency, bug detection, data races
        References: https://lwn.net/Articles/816850/, https://lwn.net/Articles/816854/

    • 07:00 11:00
      Networking and BPF Summit Networking and BPF Summit/Virtual-Room (LPC 2020)

    • 07:00 11:00
      Real-time MC Microconference3/Virtual-Room (LPC 2020)

    • 07:00 11:00
      linux/arch/* MC Microconference2/Virtual-Room (LPC 2020)

    • 07:00 11:00
      Android MC Microconference3/Virtual-Room (LPC 2020)

    • 07:00 11:00
      BOFs Session BOF1/Virtual-Room (LPC 2020)

    • 07:00 11:00
      GNU Tools Track GNU Tools track/Virtual-Room (LPC 2020)


      The GNU Tools track will gather GNU tools developers to discuss current and future work, coordinate efforts, exchange reports on ongoing work, discuss development plans for the next 12 months, hold developer tutorials, and have any other related discussions.
      The track will also include a Toolchain Microconference to discuss topics that are more specific to the interaction between the Linux kernel and the toolchain.

    • 07:00 11:00
      Kernel Dependability & Assurance MC Microconference2/Virtual-Room (LPC 2020)

    • 07:00 11:00
      LPC Refereed Track: SPONSORED BY FACEBOOK Refereed Track/Virtual-Room (LPC 2020)

      • 07:00
        Write once, herd everywhere 45m

        With the Linux Kernel Memory Model (LKMM) introduced into the kernel, litmus tests have proven to be a powerful tool for analyzing and designing parallel code. More and more C litmus tests are being written, some of which have been merged into the Linux mainline.

        Actually, the herd tool behind LKMM has models for most mainstream architectures: litmus tests in asm code are supported. So in theory we can verify a litmus test in different versions (C and asm code), and this will help us 1) verify the correctness of LKMM and 2) test the implementation of parallel primitives on a particular architecture, by comparing the results of exploring the state spaces of the different versions of a litmus test.

        This topic will present some work to make it possible to translate between litmus tests (mostly C to asm code). The work provides an interface for architecture maintainers to provide their rules for the litmus translation. In this way, we can verify the consistency between LKMM and the implementation of parallel primitives, and this could also help new architectures provide parallel primitives consistent with LKMM.

        This topic will give an overview of the translation, and hopefully some discussion of the interface will take place during or after the topic.
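
        For readers unfamiliar with the format, a C litmus test in the style of the kernel's tools/memory-model/litmus-tests directory looks like the classic store-buffering test below; the translation work discussed here would mechanically produce the corresponding asm versions, and herd7 can check whether the "exists" state is reachable under LKMM:

```
C SB+poonceonces

{}

P0(int *x, int *y)
{
	int r0;

	WRITE_ONCE(*x, 1);
	r0 = READ_ONCE(*y);
}

P1(int *x, int *y)
{
	int r1;

	WRITE_ONCE(*y, 1);
	r1 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r1=0)
```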

      • 07:45
        Break 15m
      • 08:00
        Desktop Resource Management (GNOME) 45m

        Graphical user sessions have been plagued with various performance-related issues. Sometimes these are simply bugs, but often enough issues arise because workstations are loaded with other tasks. In this case, high memory, IO or CPU use may cause severe latency issues for graphical sessions. In the past, people have tried various ways to improve the situation, from running without swap to heuristically detecting low-memory situations and triggering the OOM killer. These techniques may help in certain conditions but also have their limitations.

        GNOME and other desktops (currently KDE) are moving towards managing all applications using systemd. This change in architecture also means that every application is placed into a separate cgroup. These can be grouped to separate applications from essential services and they can also be adjusted dynamically to ensure that interactive applications have the resources they need. Examples of possible interventions are allocating more CPU weight to the currently focused application, creating memory and IO latency guarantees for essential services (compositor) or running oomd to kill applications when there is memory pressure.

        The talk will look at what GNOME (and KDE) currently does in this regard and how well it is working so far. This may show areas where further improvements in the stack are desirable.
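
        The kinds of interventions described above map onto standard systemd resource-control properties. A sketch (the unit and file names are hypothetical; CPUWeight= and MemoryLow= are real systemd properties):

```ini
# ~/.config/systemd/user/app-org.example.Editor.scope.d/boost.conf
# Hypothetical drop-in: give the focused app more CPU weight and a
# modest memory guarantee.
[Scope]
CPUWeight=300
MemoryLow=256M
```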

      • 08:45
        Break 15m
      • 09:00
        Configuring a kernel for safety critical applications 45m

        For security, there are various projects which provide guidelines on how to configure a secure kernel, e.g., the Kernel Self Protection Project. In addition, there are security enhancements which have been added to the Linux kernel by various groups, e.g., the grsecurity or PaX patches.
        We are looking to define appropriate guidelines for safety enhancements to the Linux kernel. The session will focus on the following:
        1. Define the use cases (primarily in automotive domain) and the need for safety features.
        2. Define criteria for safe kernel configurations.
        3. Define a preliminary proposal for a serious workgroup to define requirements for relevant safety enhancements.
        Note that the emphasis is 100% technical, and not related in any way to safety assessment processes. I will come with an initial set of proposals, to be discussed and for follow up.
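
        As a hedged illustration of what "criteria for safe kernel configurations" might build on, the security side already has concrete, checkable option lists; the Kernel Self Protection Project, for instance, recommends options such as:

```
# A few KSPP-recommended hardening options, an illustrative starting
# point only; a safety-oriented list would need its own rationale.
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_FORTIFY_SOURCE=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_BUG_ON_DATA_CORRUPTION=y
CONFIG_PANIC_ON_OOPS=y
```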

      • 09:45
        Break 15m
      • 10:00
        Kernel Address Space Isolation 45m

        First investigations about Kernel Address Space Isolation (ASI) were presented at LPC last year as a way to mitigate some cpu hyper-threading data leaks possible with speculative execution attacks (like L1 Terminal Fault (L1TF) and Microarchitectural Data Sampling (MDS)). In particular, Kernel Address Space Isolation aims to provide a separate kernel address space for KVM when running virtual machines, in order to protect against a malicious guest VM attacking the host kernel using speculative execution attacks.

        https://www.linuxplumbersconf.org/event/4/contributions/277/

        At that time, a first proposal for implementing KVM Address Space Isolation was available. Since then, new proposals have been submitted. The implementation has become much more robust, and it now provides a more generic framework which can be used to implement KVM ASI but also Kernel Page Table Isolation (KPTI).

        Currently, RFC version 4 of Kernel Address Space Isolation is available. The proposal is divided into three parts:

        • Part I: ASI Infrastructure and PTI

          https://lore.kernel.org/lkml/20200504144939.11318-1-alexandre.chartre@oracle.com/
        • Part II: Decorated Page-Table

          https://lore.kernel.org/lkml/20200504145810.11882-1-alexandre.chartre@oracle.com/
        • Part III: ASI Test Driver and CLI

          https://lore.kernel.org/lkml/20200504150235.12171-1-alexandre.chartre@oracle.com/

        This presentation will show the progress and evolution of the Kernel Address Space Isolation project, and detail the kernel ASI framework and how it is used to implement KPTI and KVM ASI. It will also discuss possible ways to integrate the project upstream, concerns about making changes in some of the nastiest corners of the x86 code, and kernel page table management improvements, in particular page table creation and population.

    • 07:00 11:00
      Networking and BPF Summit Networking and BPF Summit/Virtual-Room (LPC 2020)

    • 07:00 11:00
      Scheduler MC Microconference1/Virtual-Room (LPC 2020)

    • 07:00 11:00
      BOFs Session BOF1/Virtual-Room (LPC 2020)

    • 07:00 11:00
      GNU Tools Track GNU Tools track/Virtual-Room (LPC 2020)


      The GNU Tools track will gather GNU tools developers to discuss current and future work, coordinate efforts, exchange reports on ongoing work, discuss development plans for the next 12 months, hold developer tutorials, and have any other related discussions.
      The track will also include a Toolchain Microconference to discuss topics that are more specific to the interaction between the Linux kernel and the toolchain.

    • 07:00 08:00
      LPC Refereed Track: SPONSORED BY FACEBOOK Refereed Track/Virtual-Room (LPC 2020)

      • 07:00
        Recent changes in the kernel memory accounting (or how to reduce the kernel memory footprint by ~40%) 45m

        Not long ago, memcg accounting used the same approach for all types of pages. Each charged page had a pointer to the memory cgroup in the struct page, and it held a single reference to the memory cgroup, so the memory cgroup structure was pinned in memory by all charged pages.

        This approach was simple and nice, but it didn't work well for some kernel objects, which are often shared between memory cgroups. E.g. an inode or a dentry can outlive the original memory cgroup by far, because it can be actively used by someone else. Because there was no mechanism for the ownership change, the original memory cgroup was pinned in memory, so that only very heavy memory pressure could get rid of it. This led to the so-called dying memory cgroups problem: an accumulation of dying memory cgroups with uptime.

        It has been solved by switching to an indirect scheme, where slab pages didn't reference the memory cgroup directly, but used a memcg pointer in the corresponding slab cache instead. The trick was that the pointer could be atomically swapped to the parent memory cgroup. In combination with slab cache reference counters, this allowed the dying memcg problem to be solved, but made the corresponding code even more complex: dynamic creation and destruction of per-memcg slab caches required tricky coordination between multiple objects with different life cycles.

        And the resulting approach still had a serious flaw: each memory cgroup had its own set of slab caches and corresponding slab pages. On a modern system with many memory cgroups this resulted in poor slab utilization, which varied around 50% in my case. This made the accounting quite expensive: it almost doubled the kernel memory footprint.

        To solve this problem, the accounting has to be moved from the page level to the object level. If individual slab objects can be effectively accounted on an individual level, there is no more need to create per-memcg slab caches. A single set of slab caches and slab pages can be used by all memory cgroups, which brings the slab utilization back to >90% and saves ~40% of total kernel memory. To keep the reparenting working and not reintroduce the dying memcg problem, an intermediate accounting vessel called obj_cgroup is introduced. Of course, some memory has to be used to store an objcg pointer for each slab object, but it's by far smaller than the consequences of poor slab utilization. The proposed new slab controller [1] implements this per-object accounting approach. It has been used on Facebook production hosts for several months and brought significant memory savings (in a range of 1 GB per host and more) without any known regressions.

        The object-level approach can be used to add effective accounting of objects which are by their nature not page-based: e.g. percpu memory. Each percpu allocation is scattered over multiple pages, but if it's small, it takes only a small portion of each page. Accounting such objects was nearly impossible on a per-page basis (duplicating the chunk infrastructure would result in terrible overhead), but with a per-object approach it's quite simple. Patchset [2] implements it. Percpu memory is getting more and more used as a way to solve contention problems on multi-CPU systems. Cgroup internals and bpf maps seem to be the biggest users at this time, but new use cases will likely be added. It can easily take hundreds of MBs on a host, so if it's not accounted, it creates an issue in container memory isolation.

        Links:
        [1] https://lore.kernel.org/linux-mm/20200527223404.1008856-1-guro@fb.com/
        [2] https://lore.kernel.org/linux-mm/20200528232508.1132382-1-guro@fb.com/

      • 07:45
        Break 15m
    • 07:00 11:00
      Networking and BPF Summit Networking and BPF Summit/Virtual-Room (LPC 2020)

    • 07:00 11:00
      RISC-V MC Microconference3/Virtual-Room (LPC 2020)

    • 07:00 11:00
      System Boot and Security MC Microconference2/Virtual-Room (LPC 2020)

    • 07:00 11:00
      Testing and Fuzzing MC Microconference1/Virtual-Room (LPC 2020)

    • 08:00 11:00
      Kernel Summit Refereed Track/Virtual-Room (LPC 2020)

      • 08:00
        SoC support lifecycle in the kernel 45m

        The world of system-on-chip computing has changed drastically over the past years, with the current state being much less diverse as the industry keeps moving to 64-bit processors, to little-endian addressing, to larger memory capacities, and to a small number of instruction set architectures.

        In this presentation, I discuss how and why these changes happen, and how we can find a balance between keeping older technologies working for those that rely on them, and identifying code that has reached the end of its useful life and should better get removed.

      • 08:45
        Break 15m
      • 09:00
        seccomp feature development 45m

        As outlined in https://lore.kernel.org/lkml/202005181120.971232B7B@keescook/ the topics include:

        • fd passing
        • deep argument inspection
        • changing structure sizes
        • syscall bitmasks

        Specifically, seccomp needs to grow the ability to inspect Extensible Argument syscalls, which requires that it inspect userspace memory without Time-of-Check/Time-of-Use races and without double-copying. Additionally, since the structures can grow and be nested, there needs to be a way to deal with flattening the arguments into a linear buffer that can be examined by seccomp's BPF dialect. All of this also needs to be handled by the USER_NOTIF implementation. Finally, fd passing needs to be finished, and there needs to be an exploration of syscall bitmasks to augment the existing filters to gain back some performance.

      • 09:45
        Break 15m
      • 10:00
        DAMON: Data Access Monitoring Framework for Fun and Memory Management Optimizations 45m

        Background

        In an ideal world, memory management provides the optimal placement of data objects under accurate predictions of future data access. Current practical implementations, however, rely on coarse information and heuristics to keep the instrumentation overhead minimal. A number of memory management optimization works were therefore proposed, based on the finer-grained access information. Lots of those, however, incur high data access pattern instrumentation overhead, especially when the target workload is huge. A few of the others were able to keep the overhead small by inventing efficient instrumentation mechanisms for their use case, but such mechanisms are usually applicable to their use cases only.

        We can list below four requirements for data access information instrumentation that must be fulfilled to allow adoption into a wide range of production environments:

        • Accuracy. The instrumented information should be useful for DRAM-level memory management. Cache-level accuracy would not be highly required, though.
        • Light-weight overhead. The instrumentation overhead should be low enough to be applied online while making no impact on the performance of the main workload.
        • Scalability. The upper-bound of the instrumentation overhead should be controllable regardless of the size of target workloads, to be adopted in general environments that could have huge workloads.
        • Generality. The mechanism should be widely applicable.

        DAMON: Data Access MONitor

        DAMON is a data access monitoring framework subsystem for the Linux kernel that is designed to mitigate this problem. The core mechanisms of DAMON, called 'region based sampling' and 'adaptive regions adjustment', make it fulfill the requirements. Moreover, its general design and flexible interface allow not only kernel code but also user space to use it.

        Using this framework, therefore, the kernel's core memory management mechanisms, including reclamation and THP, can be optimized for better memory management. Memory management optimization works that incurred high instrumentation overhead will be able to have another try. In user space, meanwhile, users who have some special workloads will be able to write personalized tools or applications for deeper understanding and specialized optimizations of their systems.

        In addition to the basic monitoring, DAMON also provides a feature dedicated to semi-automated memory management optimizations, called DAMON-based Operation Schemes (DAMOS). Using this feature, the DAMON users can implement complex data access aware optimizations in only a few lines of human-readable schemes descriptions.

        Overhead and Performance

        We evaluated DAMON's overhead, monitoring quality, and usefulness using 25 realistic workloads on a QEMU/KVM-based virtual machine.

        DAMON is lightweight. It increases system memory usage by only 0.39% and consumes less than 1% CPU time in the typical case. It slows target workloads down by only 0.63%.

        DAMON is accurate and useful for memory management optimizations. An experimental DAMON-based operation scheme for THP removes 69.43% of THP memory overhead while preserving 37.11% of the THP speedup. Another experimental DAMON-based reclamation scheme reduces 89.30% of resident sets and 22.40% of system memory footprint while incurring only 1.98% runtime overhead in the best case.

        Current Status of The Project

        Development of DAMON started in 2019, and several iterations were presented in academic papers[1,2,3], at last year's kernel summit[4], and in an LWN article[5]. The source code is available[6] for use and modification, and the patchsets[7] are periodically posted for review.

        Agenda

        I will briefly introduce DAMON and share how it has evolved since last year's kernel summit talk. I will introduce some new features, including the DAMON-based operation schemes. There will be a live demonstration and I will show performance evaluation results. I will outline plans and the roadmap of this project, leading to a Q&A session to collect feedback with a view on getting it ready for general use and upstream inclusion.

        [1] SeongJae Park, Yunjae Lee, Yunhee Kim, Heon Y. Yeom, Profiling Dynamic Data Access Patterns with Bounded Overhead and Accuracy. In IEEE International Workshop on Foundations and Applications of Self- Systems (FAS 2019), June 2019. https://ieeexplore.ieee.org/abstract/document/8791992
        [2] SeongJae Park, Yunjae Lee, Heon Y. Yeom, Profiling Dynamic Data Access Patterns with Controlled Overhead and Quality. In 20th ACM/IFIP International Middleware Conference Industry, December 2019. https://dl.acm.org/citation.cfm?id=3368125
        [3] Yunjae Lee, Yunhee Kim, and Heon. Y. Yeom, Lightweight Memory Tracing for Hot Data Identification, In Cluster computing, 2020. (Accepted but not published yet)
        [4] SeongJae Park, Tracing Data Access Pattern with Bounded Overhead and Best-effort Accuracy. In The Linux Kernel Summit, September 2019. https://linuxplumbersconf.org/event/4/contributions/548/
        [5] Jonathan Corbet, Memory-management optimization with DAMON. In Linux Weekly News, February 2020. https://lwn.net/Articles/812707/
        [6] https://github.com/sjp38/linux/tree/damon/master
        [7] https://lore.kernel.org/linux-mm/20200525091512.30391-1-sjpark@amazon.com/

    • 07:00 11:00
      BOFs Session BOF1/Virtual-Room (LPC 2020)

    • 07:00 11:00
      GNU Tools Track GNU Tools track/Virtual-Room (LPC 2020)


      The GNU Tools track will gather GNU tools developers to discuss current and future work, coordinate efforts, exchange reports on ongoing work, discuss development plans for the next 12 months, hold developer tutorials, and have any other related discussions.
      The track will also include a Toolchain Microconference to discuss topics that are more specific to the interaction between the Linux kernel and the toolchain.

    • 07:00 10:00
      Kernel Summit Refereed Track/Virtual-Room (LPC 2020)

      • 07:00
        Extensible Syscalls 45m

        Most Linux syscall design conventions have been established through trial and
        error. One well-known example is the missing flag argument in a range of
        syscalls that triggered the addition of revised versions of these syscalls.
        Nowadays, adding a flag argument to keep syscalls extensible is an accepted
        convention recorded in our kernel docs.

        In this session we'd like to propose and discuss a few simple conventions that
        have proven useful over time and a few new ones that were just established
        recently with the addition of new in-kernel apis. Ideally these conventions
        would be added to the kernel docs and maintainers encouraged to use them as
        guidance when new syscalls are added.
        We believe that these conventions can lead to a more consistent (and possibly
        more pleasant) uapi going forward making programming on Linux easier for
        userspace. They hopefully also prevent new syscalls from running into various
        design pitfalls that have led to quirky or cumbersome apis and (security) bugs.

        Topics we'd like to discuss include: the use of structs versioned by size in
        syscalls such as openat2(), sched_{set,get}_attr(), and clone3(), and the
        associated API that we added last year; whether new syscalls should be allowed
        to use nested pointers in general, and specifically with an eye on being
        conveniently filterable by seccomp; the convention to always use unsigned int
        as the type for register-based flag arguments instead of the current potpourri
        of types; naming conventions when revised versions of syscalls are added; and,
        ideally, a uniform way to test whether a syscall supports a given feature.
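        The size-based struct versioning mentioned above can be modeled in a few
        lines. This is a sketch of the semantics of the kernel's
        copy_struct_from_user() helper as used by openat2() and clone3(): the
        kernel copies min(usize, ksize) bytes, zero-fills missing fields from
        older userspace, and rejects a larger struct from newer userspace only
        when the unknown trailing bytes are non-zero. It is an illustrative
        model in Python, not the kernel code itself.

        ```python
        E2BIG = 7  # errno returned when unknown trailing bytes are set

        def copy_struct_from_user(user_buf: bytes, ksize: int):
            """Model of copy_struct_from_user(): returns (errno, known_bytes)."""
            usize = len(user_buf)
            if usize > ksize:
                # Newer userspace, older kernel: only OK if the fields this
                # kernel doesn't know about are all zero.
                if any(user_buf[ksize:]):
                    return E2BIG, None
                return 0, user_buf[:ksize]
            # Older userspace, newer kernel: missing fields default to zero.
            return 0, user_buf + b"\x00" * (ksize - usize)
        ```

        With this scheme, a field appended to the struct is automatically
        backward and forward compatible as long as zero means "not requested".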

      • 07:45
        Break 15m
      • 08:00
        Kernel documentation 45m

        The long process of converting the kernel's documentation into RST is
        finally coming to an end...what has that bought us? We have gone from a
        chaotic pile of incomplete, crufty, and un-integrated docs to a slightly
        better organized pile of incomplete, crufty, slightly better integrated
        docs. Plus we have the infrastructure to make something better from here.

        What are the next steps for kernel documentation? What would we really
        like our docs to look like, and how might we find the resources to get
        them to that point? What sorts of improvements to the build
        infrastructure would be useful? I'll come with some ideas (some of which
        you've certainly heard before) but will be more interested in listening.

      • 08:45
        Break 15m
      • 09:00
        Restricted kernel address spaces 45m

        This proposal is recycled from the one I've suggested to LSF/MM/BPF [0].
        Unfortunately, LSF/MM/BPF was cancelled, but I think it is still
        relevant.

        Restricted mappings in kernel mode may improve mitigation of hardware
        speculation vulnerabilities and minimize the damage that exploitable kernel
        bugs can cause.

        There are several ongoing efforts to use restricted address spaces in the
        Linux kernel for various use cases:
        • mitigation of speculation vulnerabilities in KVM [1]
        • support for memory areas with more restrictive protection than the
        defaults ("secret", or "protected" memory) [2], [3], [4]
        • hardening of Linux containers [ no reference yet :) ]

        Last year we had vague ideas and possible directions, this year we have
        several real challenges and design decisions we'd like to discuss:

        • "Secret" memory userspace APIs

        Should such an API follow "native" MM interfaces like mmap(), mprotect(),
        and madvise(), or would it be better to use a file descriptor, e.g. like
        memfd_create() does?

        The MM "native" APIs would require a VM_something flag and probably a page
        flag or page_ext. With a file descriptor, VM_SPECIAL and custom
        implementations of .mmap() and .fault() would suffice. On the other hand,
        mmap() and mprotect() seem a better fit semantically, and they could be
        more easily adopted by userspace.
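        The file-descriptor flavor of the API can be sketched with memfd_create(),
        which the abstract cites as the model: a dedicated syscall returns an fd,
        and the memory itself is obtained by mmap()ing that fd. memfd_create()
        here is only a stand-in for the proposed secret-memory syscall, whose
        custom .fault() would additionally remove the faulted pages from the
        kernel's direct map (Linux-only, Python >= 3.8).

        ```python
        import mmap
        import os

        def secret_demo(payload: bytes) -> bytes:
            # 1. A dedicated syscall hands back an fd for the restricted object;
            #    memfd_create() stands in for the proposed secret-memory syscall.
            fd = os.memfd_create("secret-demo")
            try:
                os.ftruncate(fd, mmap.PAGESIZE)
                # 2. The memory is obtained by mmap()ing the fd. A real secret
                #    fd would differ only in its .mmap()/.fault() implementation.
                with mmap.mmap(fd, mmap.PAGESIZE) as buf:
                    buf[:len(payload)] = payload
                    return bytes(buf[:len(payload)])
            finally:
                os.close(fd)
        ```

        The attraction of this shape is that all the special-casing lives behind
        the fd's file operations, with no new VM_ or page flags.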

        • Direct/linear map fragmentation

        Whenever we want to drop some mappings from the direct map, or even change
        the protection bits for some memory area, the gigantic and huge pages
        that comprise the direct map need to be broken up, and there is no THP for
        the kernel page tables to collapse them back. Moreover, the existing API
        defined in <asm/set_memory.h> by several architectures does not really
        presume it would be widely used.

        For the "secret" memory use case, fragmentation can be minimized by
        caching large pages, using them to satisfy smaller "secret" allocations,
        and then collapsing them back once the "secret" memory is freed. Another
        possibility is to pre-allocate physical memory at boot time.

        Yet another idea is to make the page allocator aware of the direct map layout.
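        The large-page caching idea above can be illustrated with a toy counter:
        small "secret" allocations are carved out of one cached large page, so
        only that single large page has to be broken out of the direct map, and
        once every allocation in it is freed the page can be collapsed back
        whole. This is purely a counting model (one cached page, no per-page
        free tracking), not the proposed implementation.

        ```python
        LARGE_PAGE = 2 * 1024 * 1024  # 2 MiB, as in the x86-64 direct map

        class SecretPool:
            """Toy model: carve small allocations out of one cached large page."""

            def __init__(self):
                self.pages_broken = 0      # large pages split out of the direct map
                self.offset = LARGE_PAGE   # force breaking a page on first alloc
                self.live = 0              # outstanding allocations

            def alloc(self, size: int):
                if self.offset + size > LARGE_PAGE:
                    self.pages_broken += 1  # break one more large page
                    self.offset = 0
                self.offset += size
                self.live += 1

            def free(self):
                self.live -= 1
                if self.live == 0:
                    # All "secret" memory freed: the cached page can be
                    # collapsed back into the direct map.
                    self.offset = LARGE_PAGE
        ```

        512 allocations of 4 KiB break exactly one 2 MiB mapping instead of 512,
        which is the fragmentation saving the abstract is after.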

        • Kernel page table management

        Currently we presume that only one kernel page table exists (well,
        mostly) and that the page table abstraction is required only for user page
        tables. As such, we presume that 'page table == struct mm_struct', and
        mm_struct is used all over by the operations that manage page tables.

        The management of a restricted address space in the kernel requires the
        ability to create, update, and remove kernel contexts the same way we do
        for userspace.

        One way is to overload mm_struct, as EFI and text poking did. But that is
        overkill, because most of mm_struct contains information required to
        manage user mappings.

        My suggestion is to introduce a first-class abstraction for the page
        table, which could then be used in the same way for user and kernel
        context management. For now I have a very basic POC that splits several
        fields from mm_struct into a new 'struct pg_table' [5]. This new
        abstraction could be used, e.g., by the PTI implementation of page-table
        cloning and by the KVM ASI work.
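        The shape of the proposed split can be sketched as follows: page-table
        state becomes its own object, helpers take a pg_table rather than a whole
        mm_struct, and a kernel-only context (a PTI user image, a KVM ASI address
        space) can own a pg_table without carrying user-mapping state. The field
        names below are illustrative guesses, not the actual layout of the
        pg_table POC.

        ```python
        from dataclasses import dataclass, field

        @dataclass
        class PgTable:
            """First-class page-table abstraction (toy: a dict as the top level)."""
            pgd: dict = field(default_factory=dict)

        @dataclass
        class MmStruct:
            """User address space: a pg_table plus user-mapping bookkeeping."""
            pg: PgTable = field(default_factory=PgTable)
            vmas: list = field(default_factory=list)

        def map_page(pg: PgTable, vaddr: int, paddr: int):
            # The same helper serves user and kernel contexts, because it only
            # needs the page-table part, not the rest of mm_struct.
            pg.pgd[vaddr] = paddr

        user_mm = MmStruct()
        kernel_ctx = PgTable()            # e.g. a restricted KVM ASI context
        map_page(user_mm.pg, 0x1000, 0x42000)
        map_page(kernel_ctx, 0xFFFF0000, 0x99000)
        ```

        The design point is that nothing kernel-side has to fake an mm_struct
        just to get a manageable set of page tables.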

        [0] https://lore.kernel.org/linux-mm/20200206165900.GD17499@linux.ibm.com/
        [1] https://lore.kernel.org/lkml/20200504145810.11882-1-alexandre.chartre@oracle.com
        [2] https://lore.kernel.org/lkml/20190612170834.14855-1-mhillenb@amazon.de/
        [3] https://lore.kernel.org/lkml/20200130162340.GA14232@rapoport-lnx/
        [4] https://lore.kernel.org/lkml/20200522125214.31348-1-kirill.shutemov@linux.intel.com
        [5] https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=pg_table/v0.0

    • 07:00 11:00
      LLVM MC Microconference1/Virtual-Room (LPC 2020)

      Microconference1/Virtual-Room

      LPC 2020

      150
    • 07:00 11:00
      Networking and BPF Summit Networking and BPF Summit/Virtual-Room (LPC 2020)

      Networking and BPF Summit/Virtual-Room

      LPC 2020

      150
    • 07:00 11:00
      VFIO/IOMMU/PCI MC Microconference2/Virtual-Room (LPC 2020)

      Microconference2/Virtual-Room

      LPC 2020

      150
    • 07:00 11:00
      You, Me, and IoT Two MC Microconference3/Virtual-Room (LPC 2020)

      Microconference3/Virtual-Room

      LPC 2020

      150
    • 07:00 11:00
      Application Ecosystem MC Microconference3/Virtual-Room (LPC 2020)

      Microconference3/Virtual-Room

      LPC 2020

      150
    • 07:00 11:00
      BOFs Session BOF1/Virtual-Room (LPC 2020)

      BOF1/Virtual-Room

      LPC 2020

      150
    • 07:00 11:00
      GNU Tools Track GNU Tools track/Virtual-Room (LPC 2020)

      GNU Tools track/Virtual-Room

      LPC 2020

      150

      The GNU Tools track will gather all GNU tools developers to discuss current and future work, coordinate efforts, exchange reports on ongoing efforts, discuss development plans for the next 12 months, hold developer tutorials, and cover any other related topics.
      The track will also include a Toolchain Microconference to discuss topics specific to the interaction between the Linux kernel and the toolchain.

    • 07:00 11:00
      Networking and BPF Summit Networking and BPF Summit/Virtual-Room (LPC 2020)

      Networking and BPF Summit/Virtual-Room

      LPC 2020

      150
    • 07:00 11:00
      Open Printing MC Microconference1/Virtual-Room (LPC 2020)

      Microconference1/Virtual-Room

      LPC 2020

      150
    • 07:00 11:00
      Power Management and Thermal Control MC Microconference2/Virtual-Room (LPC 2020)

      Microconference2/Virtual-Room

      LPC 2020

      150