Speaker
Description
Scientific workflows are now a common tool used by domain scientists across many disciplines. They are appealing because they enable users to think at a high level of abstraction, composing complex applications from individual application components. Workflow management systems (WMSs), such as Pegasus (http://pegasus.isi.edu), automate the process of executing these workflows on modern cyberinfrastructure. They take high-level, resource-independent workflow descriptions and map them onto the available heterogeneous compute and storage resources: campus clusters, high-performance systems, high-throughput resources, clouds, and the edge. WMSs can select appropriate resources based on their architecture, the availability of key software, performance, reliability, availability of compute cycles, and storage space, among other factors. With the help of compiler-inspired algorithms, they can determine which data to save during execution and which data are no longer needed. Like compilers, they can generate an executable workflow that is tailored to the target execution environment, taking into account reliability, scalability, and performance.

This talk will describe the key concepts used in the Pegasus WMS to help automate the execution of workflows in distributed and heterogeneous environments. It will showcase applications that have benefited from Pegasus’ automation and touch on new types of science applications and their needs. The talk will also explore the potential use of artificial intelligence and machine learning approaches to improve the level of automation in workflow management systems.
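
To make the idea of a high-level, resource-independent workflow description concrete, the sketch below defines a small abstract workflow using the Pegasus 5.x Python API (Pegasus.api). It is a minimal illustration modeled on Pegasus' documented examples, not material from the talk itself; the workflow name, job names, file names, and the workflow.yml output path are placeholders chosen for this example.

    from Pegasus.api import File, Job, Workflow

    # Abstract workflow: jobs and data dependencies only; nothing here
    # says where the computation will run or where the data lives.
    wf = Workflow("example-analysis")      # hypothetical workflow name

    raw = File("input.dat")                # placeholder input file
    cleaned = File("cleaned.dat")          # intermediate product
    summary = File("summary.txt")          # final output

    preprocess = (
        Job("preprocess")                  # logical transformation name
        .add_args("-i", raw, "-o", cleaned)
        .add_inputs(raw)
        .add_outputs(cleaned)
    )

    analyze = (
        Job("analyze")
        .add_args("-i", cleaned, "-o", summary)
        .add_inputs(cleaned)
        .add_outputs(summary)
    )

    # Ordering between the jobs follows from the files they produce
    # and consume; dependencies can also be declared explicitly.
    wf.add_jobs(preprocess, analyze)

    # Writing the workflow yields the resource-independent description;
    # planning it is the separate step where Pegasus maps the jobs onto
    # concrete compute and storage resources.
    wf.write("workflow.yml")

The same description can later be planned against different target sites without changing the workflow code, which is the separation between abstract workflow and execution environment that the talk discusses.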