IEEE Cluster 2016 Tutorials
State-of-the-Art in Slurm, MPI-PGAS and BigData
September 13, 2016 (Tue.)
Time: 16:00-18:00
Room: Grand Hall A
16:00-16:40 Resource and Job Management on HPC Clusters with Slurm: Administration, Usage and Performance Evaluation
Yiannis Georgiou (BULL)
16:40-17:20 PGAS and Hybrid MPI+PGAS Programming Models on Modern HPC Clusters with Accelerators
Dhabaleswar K. (DK) Panda (The Ohio State University)
17:20-18:00 Big Data Meets HPC: Exploiting HPC Technologies for Accelerating Apache Hadoop, Spark and Memcached
Dhabaleswar K. (DK) Panda (The Ohio State University)

The following tutorials are organized in conjunction with Cluster 2016.

PGAS and Hybrid MPI+PGAS Programming Models on Modern HPC Clusters with Accelerators

Multi-core processors, accelerators (GPGPUs), co-processors (Xeon Phis) and high-performance interconnects (InfiniBand, 10-40 GigE/iWARP and RoCE) with RDMA support are shaping the architectures of next-generation clusters. Efficient programming models for designing applications on these clusters and on future exascale systems are still evolving. The new MPI-3 standard enhances the Remote Memory Access (RMA) model and introduces non-blocking collectives. Partitioned Global Address Space (PGAS) models provide an attractive alternative to the MPI model. At the same time, hybrid MPI+PGAS programming models are gaining attention as a possible solution for programming exascale systems. In this tutorial, we provide an overview of the programming models (MPI, PGAS and hybrid MPI+PGAS) and discuss opportunities and challenges in designing the associated runtimes. We start with an in-depth overview of modern system architectures with multi-core processors, GPU accelerators, Xeon Phi co-processors and high-performance interconnects. We present an overview of the new MPI-3 RMA model and of language-based (UPC and CAF) and library-based (OpenSHMEM) PGAS models. We introduce MPI+PGAS hybrid programming models and the associated unified runtime concept. We examine and contrast the challenges in designing high-performance MPI-3 compliant, OpenSHMEM and hybrid MPI+OpenSHMEM runtimes for both host-based and accelerator-based (GPU and MIC) systems. We present case studies using application kernels to demonstrate how one can exploit hybrid MPI+PGAS programming models to achieve better performance without rewriting the complete code. Using the publicly available MVAPICH2-X, MVAPICH2-GDR and MVAPICH2-MIC libraries, we present the challenges and opportunities in designing efficient MPI, PGAS and hybrid MPI+PGAS runtimes.
Website: http://web.cse.ohio-state.edu/~panda/cluster16_hybrid_tutorial.html
Dhabaleswar K. (DK) Panda, The Ohio State University, USA
Khaled Hamidouche, The Ohio State University, USA

Big Data Meets HPC: Exploiting HPC Technologies for Accelerating Apache Hadoop, Spark and Memcached

The explosive growth of "Big Data" has caused many industrial firms to adopt HPC technologies to meet the requirements of processing and storing huge amounts of data. According to a 2013 IDC study, 67% of high-performance computing systems were running High-Performance Data Analysis (HPDA) workloads. Apache Hadoop and Spark are gaining prominence in handling Big Data and analytics. Similarly, Memcached is becoming important for large-scale query processing in Web 2.0 environments. Recent studies have shown that default Hadoop, Spark, and Memcached cannot efficiently leverage the features of modern high-performance computing clusters, such as Remote Direct Memory Access (RDMA) enabled high-performance interconnects and high-throughput, large-capacity parallel storage systems (e.g., Lustre). These middleware are traditionally written with sockets and do not deliver the best performance on modern networks. In this tutorial, we will provide an in-depth overview of the architecture of Hadoop components (HDFS, MapReduce, RPC, HBase, etc.), Spark and Memcached. We will examine the challenges in re-designing the networking and I/O components of these middleware for modern interconnects and protocols (such as InfiniBand, iWARP, RoCE, and RSocket) with RDMA, and for modern storage architectures. Using the publicly available software packages in the High-Performance Big Data project (HiBD, http://hibd.cse.ohio-state.edu), we will provide case studies of the new designs for several Hadoop/Spark/Memcached components and their associated benefits. Through these case studies, we will also examine the interplay between high-performance interconnects, storage (HDD, NVM, and SSD), and multi-core platforms to achieve the best solutions for these components and Big Data applications on modern HPC clusters.
Website: http://web.cse.ohio-state.edu/~panda/cluster16_bigdata_tut.html
Dhabaleswar K. (DK) Panda, The Ohio State University, USA
Xiaoyi Lu, The Ohio State University, USA

Resource and Job Management on HPC Clusters with Slurm: Administration, Usage and Performance Evaluation

This tutorial covers the Slurm Resource and Job Management System (RJMS). Slurm is an open-source RJMS specifically designed for the scalability requirements of state-of-the-art HPC clusters. The tutorial will give an overview of the concepts and underlying architecture of Slurm, and it will focus on both administrator configuration and user execution. It will be divided into three parts: Administration, Usage and Performance Evaluation. The administration part will provide a detailed description of, and hands-on exercises for, features such as job prioritization, resource selection, GPGPUs and generic resources, advanced reservations, accounting (associations, QOS, etc.), scheduling (backfill, preemption), high availability, power management, topology-aware placement, license management, burst buffers and scalability tuning, with a particular focus on the configuration of the newly developed power-adaptive scheduling technique.

The usage part will provide in-depth analysis and hands-on exercises for CPU usage parameters, options for multi-core and multi-threaded architectures, prolog and epilog scripts, job arrays, tight MPI integration, CPU frequency scaling, and accounting, reporting and profiling of user jobs, with a particular focus on the newly developed job specification language for heterogeneous resources and on multiple program multiple data (MPMD) MPI support.

Finally, the performance evaluation part will present hands-on techniques for experimenting with Slurm at large scale using simulation and emulation, which will be valuable for researchers and developers.

For the hands-on exercises, specific VM and container environments will be made available, along with a pre-installed testbed cluster, to enable experimentation with the different functionalities, usage and configuration capabilities.
Website: https://rjms-bull.github.io/slurm-tutorial/

Yiannis Georgiou
David Glesser