PPoPP 2017
Sat 4 - Wed 8 February 2017 Austin, Texas, United States

LLVM has become an integral part of the software-development ecosystem for building advanced compilers, high-performance computing software, and software tools. OpenMP*, a widely accepted industry standard for expressing parallelism at different levels, is used in applications such as machine learning, image processing, and HPC to exploit the potential of modern multi-core processors and accelerator architectures. This tutorial presents the latest updates on the OpenMP* 5.0 specification draft and Intel’s contributions to LLVM compiler technology for OpenMP parallelization, vectorization, offloading, and performance tuning on Intel® Xeon processors and Xeon Phi™ co-processors. The tutorial is a half-day program organized in four parts:

  • Part I: Covers an overview of the latest Skylake architectures and the OpenMP* 5.0 Specification Draft Version (a.k.a. Technical Report V4 for OpenMP* 5.0), along with advanced topics on the high-bandwidth memory model, with examples.

  • Part II: Covers Intel’s contributions to LLVM compiler technology for the OpenMP parallel, SIMD, and offloading models, such as LLVM IR extensions for OpenMP 4.5 / 5.0, and presents the LLVM compiler architecture for implementing OpenMP SIMD, parallelization, and offloading. The following techniques and implementation details are outlined:
    a) Demand-driven construction of hierarchical Work-Region graph representation (W-region graph);
    b) A prepare phase that maintains privatization and memory-fencing semantics in LLVM SSA form, minimizing the impact on existing LLVM passes;
    c) LoopInfo framework improvements to support OpenMP parallelization, vectorization, and offloading; and
    d) Threaded and offloading code generation (lowering, privatization, outlining, data mapping).
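
To make the terms "privatization" and "outlining" in item d) concrete, here is a minimal hand-written sketch in C. It is not the code the LLVM implementation generates, and it does not use any real OpenMP runtime entry points: the names `shared_args`, `outlined_body`, and `square_all_outlined`, the fixed team size of 4, and the block-wise chunking are all assumptions made for illustration. The serial loop over `tid` merely stands in for the fork call a runtime would perform.

```c
#include <stddef.h>

/* Source form: a parallel loop with a private temporary.
   Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
void square_all(int n, const int *in, int *out)
{
    int tmp;
    #pragma omp parallel for private(tmp)
    for (int i = 0; i < n; ++i) {
        tmp = in[i];
        out[i] = tmp * tmp;
    }
}

/* Hypothetical container for the variables shared by all threads. */
struct shared_args {
    int n;
    const int *in;
    int *out;
};

/* Hypothetical outlined region: the loop body moved into its own function.
   Each invocation handles one contiguous chunk of the iteration space. */
static void outlined_body(int tid, int nthreads, void *arg)
{
    struct shared_args *s = (struct shared_args *)arg;
    int tmp; /* privatized: every invocation gets its own copy */
    int chunk = (s->n + nthreads - 1) / nthreads;
    int lo = tid * chunk;
    int hi = (lo + chunk < s->n) ? lo + chunk : s->n;

    for (int i = lo; i < hi; ++i) {
        tmp = s->in[i];
        s->out[i] = tmp * tmp;
    }
}

/* Stand-in for the lowered caller: packs the shared variables and invokes
   the outlined region once per (simulated) thread. */
void square_all_outlined(int n, const int *in, int *out)
{
    struct shared_args s = { n, in, out };
    const int nthreads = 4; /* assumed team size, for illustration only */

    for (int tid = 0; tid < nthreads; ++tid) {
        outlined_body(tid, nthreads, &s);
    }
}
```

Both functions compute the same result. The second one only shows why privatized variables become locals of the outlined function while shared variables travel through an argument structure.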

  • Part III: Covers explicit SIMD vectorization for Intel® Xeon and Xeon Phi™ processors. We will present examples that illustrate how the power of the Intel® compilers and LLVM compilers can be harnessed with minimal user effort to enable vector-level parallelism from high-level language constructs.
    a) Vectorization of SIMD loops and functions;
    b) Vector code generation with masking to handle complex control flow;
    c) Better interaction between vectorization and loop optimizations (such as loop collapsing and loop fusion).
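
As a small illustration of items a) and b), the sketch below uses the standard OpenMP `simd` and `declare simd` directives in C. The function names `scale_add` and `masked_saxpy` are hypothetical, and whether a given compiler actually emits masked vector code for the branch depends on the target and the optimization settings. Without OpenMP SIMD support enabled, the pragmas are ignored and the code runs as ordinary scalar C.

```c
#include <stddef.h>

/* Hypothetical element-wise helper. "declare simd" asks the compiler to
   also generate vector variants that can be called from SIMD loops. */
#pragma omp declare simd
static float scale_add(float x, float y, float a)
{
    return a * x + y;
}

/* SIMD loop with a data-dependent branch. When vectorized, the condition
   typically becomes a per-lane mask, so only the lanes where x[i] > 0
   update y[i]. */
void masked_saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        if (x[i] > 0.0f) {
            y[i] = scale_add(x[i], y[i], a);
        }
    }
}
```

Calling `masked_saxpy(4, 2.0f, x, y)` updates only the elements of `y` whose corresponding `x` value is positive and leaves the others unchanged.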

  • Part IV: Covers the design and implementation of the vectorization plan framework, which enhances the LLVM loop vectorizer and supports explicit SIMD vectorization.
    a) See https://reviews.llvm.org/D28975 for details

  • Tutorial Slides (2017-PPoPP-Intel-LLVM-Tutorial.pdf, 2.19 MiB)