The vast majority of production parallel scientific applications today use MPI and run successfully on the largest systems in the world. At the same time, the MPI standard itself is evolving to address the needs and challenges of future extreme-scale platforms as well as applications. This tutorial will cover several advanced features of MPI, including new MPI-3 features, that can help users program modern systems effectively. Using code examples based on scenarios found in real applications, we will cover several topics including efficient ways of doing 2D and 3D stencil computation, derived datatypes, one-sided communication, hybrid (MPI + shared memory) programming, topologies and topology mapping, and neighborhood and nonblocking collectives. Attendees will leave the tutorial with an understanding of how to use these advanced features of MPI and guidelines on how they might perform on different platforms and architectures.
This tutorial is about advanced use of MPI. It will cover several advanced features that are part of MPI-1 and MPI-2 (derived datatypes, one-sided communication, thread support, topologies and topology mapping) as well as new features added in MPI-3 (substantial additions to the one-sided communication interface, neighborhood collectives, nonblocking collectives, and support for shared-memory programming). Implementations of MPI-3 are already available from both vendors and open-source projects. The MPICH implementation of MPI supports MPI-3, and vendor implementations derived from MPICH either already support or will soon support these new features. The Open MPI implementation also supports MPI-3. As a result, users will be able to apply in practice what they learn in this tutorial.

The tutorial will be example driven, reflecting scenarios found in real applications. We will begin with a 2D stencil computation with a 1D decomposition to illustrate simple Isend/Irecv-based communication. We will then use a 2D decomposition to illustrate the need for MPI derived datatypes. We will introduce a simple performance model to show what performance can be expected and compare it with actual performance measured on real systems. This model will be used to discuss, evaluate, and motivate the rest of the tutorial.

We will use the same 2D stencil example to illustrate various ways of doing one-sided communication in MPI and discuss the pros and cons of the different approaches, as well as of regular point-to-point communication. We will then discuss a 3D stencil without getting into complicated code details. We will use examples of distributed linked lists and distributed locks to illustrate some of the new advanced one-sided communication features, such as the atomic read-modify-write operations. We will discuss the support for threads and hybrid programming in MPI and provide two hybrid versions of the stencil example: MPI+OpenMP and MPI+MPI.
The latter uses the new MPI-3 features for shared-memory programming. We will also discuss performance and correctness guidelines for hybrid programming.

We will introduce process topologies, topology mapping, and the new "neighborhood" collective functions added in MPI-3. These collectives are particularly intended to support stencil computations in a scalable manner, both in terms of memory consumption and performance. We will conclude with a discussion of other MPI-3 features not explicitly covered in this tutorial (the tools interface, Fortran 2008 bindings, etc.), as well as a summary of recent activities of the MPI Forum beyond MPI-3.

For each example we will follow the same rough format:
- introduce concepts and calls used in the example
- describe the example, especially algorithms and data distributions
- walk through the code, showing how concepts are applied
- show results of running code (as appropriate)