Ushering OpenMP* Parallelization and Vectorization Forward in LLVM Compilers (PPoPP 2017 - Tutorials)

Sat 4 - Wed 8 February 2017 Austin, Texas, United States

Who

Xinmin Tian, Hideki Saito, Ernesto Su, Ayal Zaks

Track

PPoPP 2017 Tutorials

Abstract

LLVM has become an integral part of the software-development ecosystem for building advanced compilers, high-performance computing software, and software tools. OpenMP*, a widely accepted industry standard for expressing parallelism at different levels, is used in applications such as machine learning, image processing, and HPC to exploit modern multi-core processors and accelerator architectures. This tutorial presents the latest updates on the draft OpenMP* 5.0 specification and Intel’s contributions to LLVM compiler technology for OpenMP parallelization, vectorization, offloading, and performance tuning on Intel® Xeon processors and Xeon Phi™ coprocessors. The tutorial is a half-day program organized in four parts:

Part I: Covers an overview of the latest Skylake architecture and the OpenMP* 5.0 specification draft (a.k.a. Technical Report V4 for OpenMP* 5.0), plus advanced topics on the high-bandwidth memory model, with examples.

Part II: Covers Intel’s contributions to LLVM compiler technology for the OpenMP parallel, SIMD, and offloading models, such as LLVM IR extensions for OpenMP 4.5 / 5.0, and presents the LLVM compiler architecture for implementing OpenMP SIMD, parallelization, and offloading. Techniques and implementation details are outlined below:
a) Demand-driven construction of a hierarchical Work-Region graph representation (W-region graph);
b) A prepare phase that maintains privatization and memory-fencing semantics in LLVM SSA form, to minimize the impact on existing LLVM passes;
c) LoopInfo framework improvements to support OpenMP parallelization, vectorization, and offloading; and
d) Threaded and offloading code generation (lowering, privatization, outlining, data mapping).
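
To make the terms "privatization" and "outlining" in item d) concrete, here is a minimal hand-written sketch in C. It is not the tutorial's implementation and not what LLVM actually emits: the names region_args, outlined_region, and sum_squares_outlined are hypothetical, and the sequential loop over chunks merely stands in for the fork/join call a real compiler would make into the OpenMP runtime.

```c
/* Source form: the compiler is asked to parallelize the loop and to
   combine per-thread partial sums into `sum`. */
long sum_squares(int n)
{
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += (long)i * i;
    return sum;
}

/* Conceptual outlined form (illustrative only). The region body is moved
   into its own function; variables it needs are passed through a struct. */
struct region_args {
    int lo, hi;     /* iteration sub-range assigned to this "thread" */
    long partial;   /* result handed back for the final reduction */
};

static void outlined_region(struct region_args *args)
{
    long sum_priv = 0;  /* privatized copy of the reduction variable */
    for (int i = args->lo; i < args->hi; i++)
        sum_priv += (long)i * i;
    args->partial = sum_priv;
}

/* Assumes nthreads > 0. The chunks run one after another here; a real
   runtime would execute outlined_region on several threads at once. */
long sum_squares_outlined(int n, int nthreads)
{
    long sum = 0;
    for (int t = 0; t < nthreads; t++) {
        struct region_args args;
        args.lo = (int)((long)n * t / nthreads);
        args.hi = (int)((long)n * (t + 1) / nthreads);
        args.partial = 0;
        outlined_region(&args);
        sum += args.partial;  /* combine step of the reduction */
    }
    return sum;
}
```

Both functions should return the same value for any n >= 0; the second only makes visible the private copy and the separate function body that the pragma leaves implicit.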

Part III: Covers explicit SIMD vectorization for Intel® Xeon and Xeon Phi™ processors. We will present examples that illustrate how the Intel® compilers and LLVM compilers can be harnessed, with minimal user effort, to enable vector-level parallelism from high-level language constructs.
a) Vectorization of SIMD loops and functions;
b) Vector code generation with masking to handle complex control flow;
c) Better interaction between vectorization and loop optimizations (such as loop collapsing and loop fusion).
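
The three items above can be illustrated with standard OpenMP 4.x SIMD constructs. The following is a minimal sketch, not taken from the tutorial slides; the function names scale_add, clamp_saxpy, and add_2d are hypothetical. Whether and how a given compiler vectorizes these loops depends on the target and on flags such as -fopenmp or -fopenmp-simd; without them the pragmas are ignored and the code runs as ordinary scalar C.

```c
/* a) SIMD function: "declare simd" asks the compiler to also generate a
   vector variant that can be called from SIMD loops. */
#pragma omp declare simd notinbranch
float scale_add(float x, float y, float a)
{
    return a * x + y;
}

/* a) + b) SIMD loop with a branch in its body. A vectorizer typically
   handles the `if` with masking or a blend instead of a scalar branch. */
void clamp_saxpy(int n, float a, const float *x, float *y, float limit)
{
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        float v = scale_add(x[i], y[i], a);
        if (v > limit)
            v = limit;
        y[i] = v;
    }
}

/* c) Loop collapsing: collapse(2) treats the two nested loops as one
   iteration space before vectorizing it. */
void add_2d(int rows, int cols, const float *a, const float *b, float *c)
{
    #pragma omp simd collapse(2)
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            c[i * cols + j] = a[i * cols + j] + b[i * cols + j];
}
```

The notinbranch clause is chosen here because scale_add is called unconditionally in the loop body; a function called under a condition would need a masked variant instead.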

Part IV: Covers the design and implementation of the vectorization plan framework for enhancing the LLVM loop vectorizer and supporting explicit SIMD vectorization.
a) See https://reviews.llvm.org/D28975 for details

File attachments

Tutorial Slides (2017-PPoPP-Intel-LLVM-Tutorial.pdf), 2.19 MiB

Ushering OpenMP* Parallelization and Vectorization Forward in LLVM Compilers

Xinmin Tian (Presenter)

Intel

Hideki Saito (Presenter)

Ernesto Su (Presenter)

Ayal Zaks

Intel and Technion, Israel
