Speaker
Description
Modern HPC workloads exchange vast amounts of data to drive scientific discoveries. While HPC systems employ diverse storage devices and tiers to support efficient data access, current monitoring infrastructures, such as Darshan and Score-P, report only what they can observe and lack the visibility needed to fully explain observed I/O performance.
In this talk, I will present our latest survey of state-of-the-art monitoring tools deployed on modern HPC systems, identified using lists such as TOP500, Green500, IO500, and the Comprehensive Data Center List (CDCL). Then, I will introduce our latest efforts to tackle this opaque monitoring infrastructure by exploring the user and kernel I/O stack to uncover causal relationships and achieve eXplainable I/O (XIO).