BoF to discuss topics related to concurrency and offloading work onto accelerators. On the OpenMP side, this covers in particular the implementation of the missing OpenMP 5.0 (soon: 5.1) features.
Especially for offloading with OpenACC/OpenMP, optimizing performance is crucial, in particular by restricting the amount and frequency of data transfers. This involves topics like value propagation, cloning, loop parallelization, and memory management, including pinning, asynchronous operations, and unified memory. With offloading code and GPU offloading becoming ubiquitous, deployment and keeping pace with support for new consumer and high-end hardware are also a challenge.
Related topics and trends can also be discussed, be it base-language concurrency features, offloading without OpenMP/OpenACC, or other accelerators.