PPoPP 2017
Sat 4 - Wed 8 February 2017 Austin, Texas, United States

Presenters

Michael Voss, michaelj.voss@intel.com

Vasanth Tovinkere, vasanth.tovinkere@intel.com

Pablo Reble, pablo.reble@intel.com

Abstract

Intel® Threading Building Blocks (Intel® TBB) is a widely used, portable C++ template library for parallel programming. It is available both as a commercial product and as an open-source project at http://www.threadingbuildingblocks.org. The library provides generic parallel algorithms, concurrent containers, a work-stealing task scheduler, a data flow programming abstraction, low-level primitives for synchronization and thread-local storage, and a scalable memory allocator. The generic algorithms in TBB capture many of the common design patterns used in parallel programming. While Intel TBB was first introduced in 2006 as a shared-memory parallel programming library, it has recently been extended to support heterogeneous programming. These new extensions allow developers to more easily coordinate the use of accelerators, such as integrated and discrete GPUs, attached devices such as Intel® Xeon Phi co-processors, and FPGAs, in their parallel C++ applications.

This tutorial will briefly cover the basics of the TBB library before presenting deeper coverage of the new features included for heterogeneous programming. Attendees can take part in hands-on exercises to create a small example and evolve it from a host-only shared-memory implementation to a heterogeneous implementation that runs on both the host and an accelerator.

Goals

By the end of the tutorial, attendees will be familiar with the heterogeneous programming features in Intel TBB and will have learned how to build and execute a hybrid application.

Prerequisite Knowledge

Attendees should be comfortable programming in C++ using modern features such as templates and lambda expressions. Attendees should also have an understanding of basic parallel programming concepts such as threads and locks. No previous experience with Intel® Threading Building Blocks is required.

Outline

Part 1: Intel TBB and its heterogeneous features (1.5 hrs)

The Intel TBB Library and its Heterogeneous Features (8:30 am – 9:15 am)

  • Overview of the philosophy and shared-memory features of the library
  • Deep dive into the flow graph and its heterogeneous features

Hands-On Exercises (9:15 am – 10:00 am)

  • “Hello TBB”; verifying that the environment is set up correctly – 15 minutes
  • Implement small example as a shared-memory flow graph – 30 minutes

Part 2: Using heterogeneous flow graph nodes to coordinate accelerators (1.25 hrs)

A deep dive into the nodes used in the example (10:30 am – 10:45 am)

  • Using async_node to do asynchronous communication
  • Using streaming_node and the OpenCL factory to access integrated graphics and FPGAs (note: OpenCL is only one of the models supported by streaming_node)

Hands-On Exercises (10:45 am – 11:45 am)

  • Adding async_node to the example to overlap an asynchronous computation
  • Adding streaming_node to use an OpenCL-compatible device
  • Targeting an FPGA

About the presenters

Michael Voss is a software engineer in the Developer Products Division (DPD) of the Software and Services Group (SSG) at Intel, and has been a member of the Intel® Threading Building Blocks (Intel® TBB) team since 2006. He is the architect of the Intel TBB flow graph API and is one of the developers of Flow Graph Analyzer, a tool for creating and analyzing the performance of parallel, graph-based applications. He has written over 40 published papers and articles on parallel programming topics. Prior to joining Intel, he was an assistant professor in the Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto. He received his PhD in Electrical Engineering from Purdue University in 2001. His interests include shared-memory parallel programming, heterogeneous parallel programming, development and analysis of graph-based applications, compilers, and runtime optimization.

Vasanth Tovinkere is a software engineer in the Developer Products Division (DPD) of the Software and Services Group (SSG) at Intel. His current role involves exploring heterogeneous and distributed compute models and new visualization approaches to performance tuning and debugging. He is the architect of the Graph Analyzer Framework and the Flow Graph Analyzer tool, which introduces visual parallel programming and performance analysis of data and control flow graphs. Vasanth began his career at Intel in 1997 as an engineer researching threading behavior and performance, working with early adopters on Wall Street to enable their applications for multi-processor architectures. He has also been involved in the development of automatic semantic event detectors for digital sports technologies in Intel Labs. Prior to joining Intel, he was involved in the development of automated fuzzy pattern recognition algorithms for NASA’s Mission to Planet Earth Program.

Pablo Reble is a software engineer in the Developer Products Division (DPD) of the Software and Services Group (SSG) at Intel, working on Intel® Threading Building Blocks (Intel® TBB) and Flow Graph Analyzer. He received his PhD in Computer Engineering from RWTH Aachen and has more than 7 years of experience teaching parallel programming at the undergraduate and graduate levels. He has authored over 10 peer-reviewed papers related to system software for many-core architectures. His interests include parallel computer architectures, parallel programming, and runtime development and optimization.

Links to information about Intel® Threading Building Blocks

http://www.threadingbuildingblocks.org

The Special Issue of Parallel Universe Magazine, “Intel® Threading Building Blocks Celebrates 10 Years!” https://goparallel.sourceforge.net/wp-content/uploads/2016/06/ParallelUniverseMagazine_Special_Edition_v2.compressed.pdf

Vasanth Tovinkere and Michael Voss, “Flow Graph Designer: A Tool for Designing and Analyzing Intel® Threading Building Blocks Flow Graphs”, 2014 43rd International Conference on Parallel Processing Workshops (ICPPW), pp. 149–158, 2014. http://doi.org/10.1109/ICPPW.2014.31