Speakers
Dr Milfeld Kent (TACC)
Mr Nicholas Thorne (Texas Advanced Computing Center)
Description
HPC services continue to provide ever-increasing resources, and many academic disciplines that formerly did not rely on computation are now interested in using it to analyze their problems. This gives us a strong incentive to assist users across a wider range of disciplines, which in turn taxes resources at HPC facilities. It has led us to investigate and develop the tools needed to support easy-to-use, performant environments for running computations on HPC systems. TACC has developed a large number of tools that we categorize as “user-facing software”: tools that our users run (sometimes without their knowledge) to ensure that their jobs perform well and to assist debugging when jobs fail. We have selected four important tools and plan to conduct a hands-on setup for administrators and/or provide details on their configuration and management. They focus primarily on environment control and performance analysis: Lmod, containerization (Singularity), TAU and REMORA.
One of TACC’s most successful open-source software packages is Lmod, the Lua-based version of “Environment Modules”. It is the first tool all TACC users encounter and learn to use when they log in to a TACC system. We discuss and deploy this tool, and demonstrate what system administrators and package maintainers need to know to build, install and maintain an environment module tool. We also explain how users benefit from Lmod in both simple and advanced use cases.
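As a rough illustration of what an environment module tool does for a user, the sketch below models loading and unloading a package by prepending and removing directories in environment variables. It is a conceptual Python sketch, not Lmod's implementation or API: the `Module` class, the `load` and `unload` functions, and the `/opt/apps/foo/1.0` paths are all hypothetical names chosen for the example.

```python
import os


class Module:
    """Hypothetical stand-in for a modulefile: a name plus the paths it prepends."""

    def __init__(self, name, prepend):
        self.name = name
        self.prepend = prepend  # maps a variable name to the directory to prepend


def load(env, module):
    """Return a copy of env with the module's directories prepended."""
    new_env = dict(env)
    for var, path in module.prepend.items():
        parts = [p for p in new_env.get(var, "").split(os.pathsep) if p]
        new_env[var] = os.pathsep.join([path] + parts)
    return new_env


def unload(env, module):
    """Return a copy of env with the module's directories removed again."""
    new_env = dict(env)
    for var, path in module.prepend.items():
        parts = [p for p in new_env.get(var, "").split(os.pathsep) if p and p != path]
        if parts:
            new_env[var] = os.pathsep.join(parts)
        else:
            # The variable existed only because of this module, so drop it.
            new_env.pop(var, None)
    return new_env


# Hypothetical package and paths, for illustration only.
env = {"PATH": "/usr/bin"}
foo = Module("foo/1.0", {"PATH": "/opt/apps/foo/1.0/bin",
                         "LD_LIBRARY_PATH": "/opt/apps/foo/1.0/lib"})
loaded = load(env, foo)
restored = unload(loaded, foo)
```

The point of the sketch is reversibility: unloading returns the environment to its earlier state, which is what lets users switch between package versions without editing their shell startup files by hand.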
We often see communities in specific domain sciences build out similar environments that they want to import into HPC systems. They invest many hours in building these environments (as we do at TACC for the environment that we present to users), and the environments usually end up in containers. Containerization substantially affects system administrators because containers are environments within the main environment, and many of them have substantial security implications. We demonstrate how to make containers available and examine typical usage.
High performance on HPC systems is something administrators, developers and users should treat as paramount, both when designing applications and when setting up the optimal environment for production runs. The TAU profiler and analysis utility provides data for evaluating performance, similar to the GNU profiler, but adds an advanced GUI that makes analysis easier and supports viewing events. Another package, REMORA, is a resource monitoring tool, also developed at TACC, that lets users easily collect execution data (memory, NUMA, I/O, networking and CPU load). We will present the management of, best usage practices for, and synergy between the TAU and REMORA tools.
HPC content
"Please see main abstract for HPC info."
Primary authors
Dr Milfeld Kent (TACC)
Mr Nicholas Thorne (Texas Advanced Computing Center)