Ushering OpenMP* Parallelization and Vectorization Forward in LLVM Compilers
LLVM has become an integral part of the software-development ecosystem for building advanced compilers, high-performance computing software, and developer tools. OpenMP*, a widely adopted industry standard for expressing parallelism at multiple levels, is used in domains such as machine learning, image processing, and HPC to exploit the potential of modern multi-core processors and accelerator architectures. This tutorial presents the latest updates in the OpenMP* 5.0 specification draft and Intel's contributions to LLVM compiler technology for OpenMP parallelization, vectorization, offloading, and performance tuning on Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. The tutorial is a half-day program organized in four parts:
a) Demand-driven construction of a hierarchical Work-Region graph representation (W-Region graph);
b) A prepare phase that maintains privatization and memory-fencing semantics in LLVM SSA form to minimize the impact on existing LLVM passes;
c) LoopInfo framework improvements to support OpenMP parallelization, vectorization, and offloading; and
d) Threaded and offloading code generation (lowering, privatization, outlining, and data mapping).
a) Vectorization of SIMD loops and functions;
b) Vector code generation with masking to handle complex control flow; and
c) Better interaction between vectorization and loop optimizations (such as loop collapsing and loop fusion).
a) See https://reviews.llvm.org/D28975 for details.
Tutorial Slides: 2017-PPoPP-Intel-LLVM-Tutorial.pdf (2.19 MiB)