Recordings from the 2020 Virtual Conference are now available.
The annual CHPC National Conference runs from Monday to Wednesday, 30 November to 2 December 2020, as a virtual conference, entirely online.
The aim of the conference: to bring together our users so that their work can be communicated, to include world renowned experts, and to offer a rich programme for students, in the fields of high performance computing, big data, and high speed networking. The CHPC National Conference is co-organised by the CHPC, DIRISA and SANReN.
The theme of the 2020 conference is machine learning and how it has paved new roads in scientific research using HPC. More than a simulation of AI, machine learning is now applied to modelling areas of science where traditional mathematical models struggle with the complexity and volume of data.
THERE WILL BE NO FEES FOR THE ONLINE CONFERENCE.
For more information please see the main conference site.
Silver
We have seen several transformational waves in computing over the last few decades, and AI is a key one. It has significant impact in many research and industrial areas enabling users to get actionable insight from data.
Insight from numerical simulation and modeling, commonly labeled “HPC”, was also one of the important developments of the last two decades, and it is an area that overlaps with AI. Two key questions are: Can you use AI methods to augment simulation and modeling applications? And how can you run AI applications on HPC systems efficiently?
In this presentation we will address those questions and give an overview of respective technology, including hardware platforms & software stacks with a special focus on how to enable successful development of solutions. This will also be illustrated with examples of use cases.
We have designed and implemented a spatial SEIR model for tracking and predicting the course of the
Covid-19 pandemic. The model uses ward-level data to model the effect of local conditions, including
socio-economic conditions and climate, on the spread of Covid-19.
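As a rough illustration of the kind of model described above, the following minimal sketch advances a two-ward SEIR system with forward-Euler steps. It is not the presenters' implementation: the function names, the rates (`beta`, `sigma`, `gamma`), the mixing matrix, and the ward populations are all hypothetical placeholders, and the cross-ward coupling is one common textbook choice.

```python
def seir_step(state, beta, sigma, gamma, mixing, dt):
    """Advance every ward by one forward-Euler step.

    state  : list of (S, E, I, R) tuples, one per ward
    beta   : per-ward transmission rates (where local conditions could enter)
    sigma  : rate at which exposed people become infectious
    gamma  : recovery rate
    mixing : mixing[i][j] weights how strongly ward j's infections act on ward i
    """
    populations = [sum(ward) for ward in state]
    new_state = []
    for i, (s, e, inf, r) in enumerate(state):
        # Force of infection: mixing-weighted prevalence across all wards.
        force = beta[i] * sum(
            mixing[i][j] * state[j][2] / populations[j]
            for j in range(len(state))
        )
        new_exposed = force * s * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * inf * dt
        new_state.append((
            s - new_exposed,
            e + new_exposed - new_infectious,
            inf + new_infectious - new_recovered,
            r + new_recovered,
        ))
    return new_state


def simulate(state, beta, sigma, gamma, mixing, dt, steps):
    """Return the list of states visited, including the initial one."""
    history = [state]
    for _ in range(steps):
        state = seir_step(state, beta, sigma, gamma, mixing, dt)
        history.append(state)
    return history


# Hypothetical inputs: two wards, infection seeded only in the first.
wards = [(9990.0, 0.0, 10.0, 0.0), (5000.0, 0.0, 0.0, 0.0)]
beta = [0.4, 0.3]                      # assumed per-ward transmission rates
mixing = [[0.9, 0.1], [0.1, 0.9]]      # assumed contact weights between wards
history = simulate(wards, beta, sigma=0.2, gamma=0.1,
                   mixing=mixing, dt=0.1, steps=1000)
final = history[-1]
```

Because the second ward starts with no infections, any cases it accumulates come purely from the mixing term, which is the spatial ingredient that a single-population SEIR model lacks.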
We have used the CHPC Lengau cluster to reprocess the groundbreaking MeerKAT observations of the centre of our galaxy. This presentation will contain an overview of the steps involved in turning the raw measurements from the telescope into radio images of the sky, and why using a large-scale computing facility is essential for modern radio astronomy. We will conclude with the results themselves, some spectacular new views of the diverse range of astrophysical phenomena in this unique part of the sky.
Kaldi is an open source software project that was initiated by the Center for Language and Speech Processing, Johns Hopkins University. It is one of the leading toolkits used for research in automatic speech recognition (ASR). The toolkit employs current machine learning techniques such as deep neural networks and is capable of state-of-the-art performance. Kaldi can be configured for a single personal computer or a high performance computing (HPC) cluster using the Sun Grid Engine. Although configuring Kaldi for parallelisation on a cluster is well documented, it is assumed that the user has complete control. A researcher may have the option to set up an in-house cluster with the advantage of complete control, but maintaining the cluster can become a task that distracts from the research work. The size of an in-house cluster is also limited by the resources and funds available to the researcher. When the disadvantages outweigh the advantages, migrating to a larger, community-based cluster with on-site support becomes attractive. Because such clusters host users from various institutions and disciplines, usage policies and restrictions apply which were not applicable to the in-house cluster. These restrictions affect how Kaldi can be used.
[Full abstract added as a PDF attachment.]
In this talk I'll provide some stories from leading the ChRIS computing
effort (https://chrisproject.org) over several years as it grew from
simple scripts that ran neuro-MRI analysis programs in a small lab to a
distributed container based platform that is both cloud and HPC ready
and currently actively supported by Red Hat, Inc, and their OpenShift
platform for cloud computing.
While providing conceptual overviews of the system architecture, I will
focus specifically on how choices in programming language, frameworks,
architecture, documentation, GitHub and more all played out -- the
good, the bad, and the ugly.
I'll touch on how these choices impact portability across computing
architectures -- including PowerPC and of course upcoming ARM
architectures in Apple consumer products.
Finally, I'll talk about the complexity of fostering community around
growing open source scientific projects, with some insights into some of
the pitfalls and highlights.
Air pollution can have large negative impacts on human health, agriculture, ecosystems, visibility and climate. In South Africa, although ambient (i.e. outdoor) air quality is regulated, many areas are out of compliance with the National Ambient Air Quality Standards. In order to protect human health and mitigate impacts, it is critical to improve air quality. The Constitution provides that everyone has a right to have an environment that is not harmful to their health. The Atmospheric Composition Focus Area in the CSIR Climate and Air Quality Modelling (CAQM) research group aims to provide the evidence base to quantify the impacts of air quality and to improve air quality. The group uses the CHPC to run an air quality model (CAMx) to simulate urban and regional air quality at high resolution. CAMx is developed and maintained by Ramboll-ENVIRON (www.camx.com) and is completely open source. CAMx is an atmospheric chemical transport model that simulates the transport and chemistry of pollutants in the atmosphere. The processes represented within the model are complex, and thus computationally intensive, which makes use of the CHPC facility a necessity. In order to run CAMx, the team uses 48 to 72 cores, depending on domain size. In addition, the team uses the CHPC to run meteorological (WRF) or climate (CCAM-CABLE) models that provide the meteorological input into CAMx. These are also computationally expensive. Both WRF and CCAM scale very well, and CCAM has previously been run with over 1000 cores. All codes (CAMx, WRF and/or CCAM) are compiled with Intel Fortran and all utilize MPI.
Using the CHPC resources, the team has been able to simulate the impact of policy interventions on air quality in cities in South Africa. Additionally, the team has simulated the health risk from air pollution regionally in South Africa. In the past, this has been done using monitoring station data only, which then limits the analysis to only those living directly around the station. These outputs directly provide the evidence base needed for decision makers to draft and implement policies and interventions to effectively improve air quality as well as understand its impacts, now and into the future. This presentation will highlight some current modelling work of the group focusing on urban air quality and air quality management, as well as simulating the impacts of COVID-19 lockdown regulations on air quality in the Highveld.
The Epsilon Aerospace Computational Mechanics research programme finds relevance in the Aerospace and Defence industry. The areas of development are Weapon Systems Integration (WSI) and Unmanned Aerial Vehicles (UAVs). State-of-the-art Computational Fluid Dynamics (CFD) and Finite Element Method (FEM) numerical techniques are used routinely as a necessary part of the design process. High-fidelity CFD models are required to determine an extensive aerodynamic load matrix for specific flight manoeuvres at relevant points in the flight envelope. The acquired aerodynamic load matrix is input to the FEM structural analysis and UAV performance characterization. The use of High-Performance Computing (HPC) allows high-fidelity CFD/FEM to be feasible and practical tools in development. The RANS turbulence modeling approach is implemented in OpenFOAM with the use of the HISA (High Speed Aerodynamic) and SIMPLE (Semi-Implicit Method for Pressure Linked Equations) solvers to solve for high-speed and low-speed aerodynamic flow, respectively. HISA is a robust aerodynamic solver that was developed at the Aeronautic Systems Competency Area of the Council for Scientific and Industrial Research in South Africa in collaboration with Flamengro, a division of Armscor SOC Ltd. The turbulence/transitional physical models are typically solved on a 20 million element mesh. The typical HPC hardware usage is 10 compute nodes in MPI with an average wall-time of 18 hours. The key research programme outcome is the development of optimal products that satisfy customer specification.
NVIDIA identified the promising HPC and AI convergence trend early and has been working on enabling it. The growing adoption of NVIDIA Ampere GPUs by the Top 500 supercomputers highlights the need for computing acceleration for this HPC and AI convergence. Many projects today demonstrate the benefit of AI for HPC, in terms of accuracy and time to solution, in many domains such as Computational Mechanics (Computational Fluid Mechanics, Solid Mechanics…), Earth Sciences (Climate, Weather and Ocean Modelling), Life Sciences (Genomics, Proteomics…), Computational Chemistry (Quantum Chemistry, Molecular Dynamics…) and Computational Physics. NVIDIA today, for instance, uses Physics Informed Neural Networks for the heat sink design in our DGX system.
Nowadays, many organizations are trying to find ways to converge classic HPC and AI. There are generally good reasons to do this because of significant similarities between HPC and AI workloads and workload scaling. However, for AI workloads to perform well on clusters, it is also important to be aware of how AI workloads (especially deep learning) differ from classic HPC workloads. One of the most important differences is the set of requirements that AI places on storage systems compared to classic HPC. This talk will provide an overview of the special storage system challenges that come with AI workloads, how to characterize and simulate them and especially how to overcome them to ensure that the GPUs can run efficiently instead of stalling on storage access.
Machine learning methods have seen a rapid expansion in applications to various fields of astrophysics in recent years. In this talk, I will discuss our results on using deep convolutional neural networks to estimate the parameters of strong gravitational lenses from telescope data. Estimating these parameters with traditional maximum-likelihood modeling methods is a time- and resource-consuming procedure, involving several data preparation steps and a difficult optimization process. I will discuss how, using deep convolutional networks, we are able to estimate these parameters and their uncertainties 10 million times faster than with traditional methods, with a similar accuracy. With the advent of large volumes of data from upcoming ground and space surveys and the remarkable speed offered by these networks, deep learning promises to become an indispensable tool for the analysis of large survey data.
Addison Snell of Intersect360 Research will give an overview of HPC market trends and forecasts, including the complete (sometimes hidden) dynamics of cloud computing, the near- and long-term industry effects of COVID-19, and reactions from SC20.
A REP-FAMSEC (reaction energy profile–fragment attributed molecular system energy change) protocol designed to explain each consecutive energy change along the reaction pathway will be briefly described. Electron density-dependent energy components defined in the Interacting Quantum Atoms (IQA) energy partitioning scheme are used to explore interactions between meaningful polyatomic fragments of a molecular system. By quantifying energetic contributions, as defined within the REP-FAMSEC method, one can pinpoint fragments (atoms) leading to or opposing a chemical change. The usefulness of the REP-FAMSEC method will be demonstrated, as a case study, on the proline catalysed aldol reaction, for which a number of mechanisms have been debated for over four decades. The relative stability of S-proline conformers, their catalytic (in)activity and the superior affinity of the higher energy conformer to acetone will be fully explained at the atomic and molecular fragment levels. The importance of the CHPC in running high-level, time- and resource-demanding quantum chemical computations will also be advocated.
This talk focuses on the computational fluid dynamics (CFD) research conducted in the Department of Mechanical Engineering Science at the University of Johannesburg. The primary purpose of the talk is to highlight the need for and the benefit of high performance computing (HPC) to enable and grow critical and relevant research in this broad area. The use of CFD and HPC has grown significantly over the years within the Department, largely due to the acquisition and provision of HPC platforms. CFD research conducted in the Department has ranged across applications in process engineering, separation processes, solar air heating and atomic layer deposition, amongst others. Many of these problems require "multi-physics" modelling approaches coupled with exceptionally fine meshes (sub-$mm$) and small time step sizes (on the scale of $\mu s$). As a consequence, significant HPC resources are required to execute meaningful simulations in these research areas. A spectrum of computational codes is used, with methodologies ranging from Navier-Stokes based CFD to the Lattice Boltzmann Method (LBM). Furthermore, a mix of proprietary (commercial) and open source codes is leveraged. This talk will focus on the use of internal (institutional) and external (CHPC) HPC platforms for the execution and growth of research in these areas. The typical HPC setup and workflow employed, the codes used, and a detailed analysis of the compute resources required for such problems will be covered in this talk. Much of this research has industrial or societal relevance. Thus, an overview of the key results and the impact thereof will be presented.
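One way to see why fine meshes and small time steps drive up the compute cost is the Courant-Friedrichs-Lewy (CFL) condition for explicit advection, which bounds the time step by roughly C·Δx/u. The sketch below is illustrative only: the cell size, flow speed, Courant number, and function names are assumptions chosen for the example and are not taken from the talk, and implicit or LBM solvers are governed by different constraints.

```python
import math


def cfl_time_step(cell_size_m, velocity_m_s, courant=0.5):
    """Largest time step allowed by the CFL condition dt <= C * dx / u."""
    if cell_size_m <= 0 or velocity_m_s <= 0 or courant <= 0:
        raise ValueError("cell size, velocity and Courant number must be positive")
    return courant * cell_size_m / velocity_m_s


def steps_required(sim_time_s, dt_s):
    """Number of time steps needed to cover the simulated time span."""
    return math.ceil(sim_time_s / dt_s)


# Assumed values: 0.1 mm cells, 10 m/s flow, Courant number 0.5.
dt = cfl_time_step(cell_size_m=1e-4, velocity_m_s=10.0, courant=0.5)
n_steps = steps_required(sim_time_s=1.0, dt_s=dt)
```

With these assumed values the time step comes out at a few microseconds, so one second of simulated flow needs on the order of a few hundred thousand steps, each touching every cell in the mesh.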
The TANGIBL coding project, hosted at Nelson Mandela University Computing Sciences, aims to introduce learners to coding concepts without the use of computers. It uses mobile apps, customized tokens and image recognition to give learners the experience of actual code executing. Since 2017, over 20,000 learners across the country have been reached through interactive workshops. With the COVID-19 lockdown regulations, these workshops became impossible. Consequently, the team introduced the concept of virtual tournaments where learners could play from home. More than 1,000 learners participated in 3 very successful tournaments, making a wide-ranging impact.
There are arguments amongst academics and practitioners alike for the importance of countries on the African continent plugging into the information society. On the ground, this has resulted in an emphasis on the importance of partnerships not just within institutions of higher learning but also amongst those in industry. Internationally, we note the growth in research interest around such partnerships and a documented trail of evidence within the academic and practitioner press. The focal point of such research appears to be centered on documented evidence-based interactions of such partnerships. Such international research is lauded for aiding our understanding and proffering best practices around human and computer system interaction issues. Despite this progress, we note with concern the scant focus on African case examples around academia-industry collaborations and issues related to high performance computing access and utilization. In this study, through a systematic literature review, we attempt to present the state of such research on the African continent using peer-reviewed studies published between 2000 and 2020. The research findings are grouped around: a) the research methods utilized; b) underpinning theories; and c) key factors employed to study academia-industry information collaboration, utilization, and access issues related to high-performance computing systems. Based on the review, we then suggest future research directions, including themes that may enhance academia-industry partnerships around issues related to high-performance computing systems access and utilization on the African continent. This research provides useful insights to academics and practitioners on the African continent against the presented ideals.
Welcome and Opening:
    • Welcome remarks SADC Chair, Mr Alberto Tonela, Mozambique
    • Remarks by Ms Anneline Morgan, SADC Secretariat
Ms Anneline Morgan, SADC Secretariat:
• Record of the 9th SADC Cyber-Infrastructure Technical Experts meeting held in December 2019 in Johannesburg, South Africa
• Background, expectations and outcomes of meeting
Progress report on implementation of SADC Cyber-Infrastructure Framework, Dr Happy Sithole, South Africa
Importance of Data in responding to COVID-19,
Mr Davis Adieno, Global Partnership for Sustainable Development Data
In July 2018, MeerKAT started its science mission. One of the 5-year legacy programs on MeerKAT is the ThunderKAT large survey project which aims to understand the astrophysical processes in stellar explosions and other energetic outbursts in the sky. In these objects, dramatic changes can occur on very short time scales. In this talk, I will highlight the data challenges of data intensive time-domain radio astronomy. I will give some examples of our experience over the last two years in the rapid analysis and astrophysical interpretation of large MeerKAT data sets on the Ilifu research cloud. As a global collaborative research project involving around 100 researchers, including many postgraduate students from South Africa and African SKA partner countries, the Ilifu research cloud is an invaluable resource to bring the researchers to the MeerKAT research data. I will highlight some of the lessons learned so far, and will look ahead towards the SKA.
Funders of repositories and those who entrust their valuable data to them need to know whether their funds and their confidence are well placed. Stakeholder concerns are addressed by the international standard measures applied to assess whether a repository is worthy of trust. While trust is a complicated concept, the meaning applicable here is that of a confident expectation that the data will be well managed and continue to be accessible over a “long” period of time. This paper will examine the path to trusted digital repository certification, as experienced by the research data management (RDM) project partners of the ilifu consortium.
Climate models are valuable tools for understanding the intricacy of the climate system, simulating past climate, and projecting future changes in climate. The models are increasingly being used to provide information for climate change mitigation and adaptation strategies. Yet, climate modelling is lagging behind in Africa because of many challenges, including computational constraints. Climate models, which use a set of equations (derived from physical, chemical and biological laws) to replicate the climate system, are usually computationally intensive. Operating them requires huge computing resources, which are not available in most climate research institutions in Africa. However, the Climate System Analysis Group (CSAG, University of Cape Town) has been active in climate model development and application over Africa because of its access to the CSIR CHPC. The group also uses the facility for building human capacity in climate modelling. For example, using the CHPC, the group led the development of an adaptive-grid global climate model (called CAM-EULAG), which has the capacity to increase its horizontal resolution locally over Africa. The model has been successfully applied to simulate West African climate, Southern African climate, and tropical cyclones over the South-West Indian Ocean. Three postgraduate students have been trained on the model and have graduated. Apart from CAM-EULAG, the group also runs other models such as RegCM, WRF, WRF-Chem, MPAS, SPEEDY and ECOCROP on the CHPC. This presentation will highlight some results from climate modelling activities in the group.
Confining polymers in thin films causes significant deviations of their structural and dynamical properties from their bulk phase behaviour. In our presentation, we show the different effect of the substrate on binary polymer blends with linear and cyclic architecture, as well as miktoarm star polymers in the presence of explicit solvent, by means of extensive molecular dynamics simulations. In the first case, we discuss the role of enthalpic and entropic factors of the interfacial free energy of the system in determining which species in the blend preferentially adsorbs at the substrate [1,2]. In the case of miktoarm polymers, we vary the solvent-block interaction to monitor the effect on the morphology and self-assembly of the polymer film [3].
[1] G. Pellicane, M. M. Tchoukouegno, G. T. Mola, M. Tsige
“Surface enrichment driven by polymer topology”
Physical Review E Rapid Communications, 93, 050501 (2016).
[2] F. M. Gaitho and G. Pellicane
“Adsorption of binary polymer mixtures with different topology on a wall”
Results in Physics, 12, 975 (2019).
[3] Z. Workineh, G. Pellicane, M. Tsige
“Tuning solvent quality induces morphological phase transitions in miktoarm star polymer films”
Macromolecules, 53, 15, 6151 (2020).
Accelerating scientific computing and deep learning applications with NVIDIA Mellanox In-Network Computing engines.
Budgets and management of allocations in an HPC environment are valuable tools for managing user behavior and the real costs associated with running HPC either on premises or in the cloud. This talk will explain how Altair PBS Professional 2020.1 delivers these tools and will briefly demonstrate how simple and robust they are in use.
Presentation on Status of Open and Distance Learning in SADC: implications of COVID-19 and future of digitisation
UNESCO Ms Carolyn Medel-Anonuevo
and
Prof Jako Olivier, UNESCO Chair on Multimodal Learning and Open Educational Resources
Update on establishment of SADC University of Transformation:
Prof Martin Oosthuizen, Southern African Regional University Associations
Discussions
The falling cost of genomic data generation has outpaced Moore’s Law, resulting in many countries starting national genomic initiatives to better represent their populations and create the foundations for precision medicine programs. Programs such as H3Africa seek to generate and leverage genomic data for specific health-related research within diverse African populations. Genomic data, unlike biological samples, is captured and stored digitally, and copies of these data can be provided indefinitely to multiple researchers for use in multiple studies.
In order for genomics data to be integrated and used meaningfully by the scientific community, standardized attributes that define its collection, provenance, rich metadata and conditions of use need to be explicitly provided. H3ABioNet, a Pan-African Bioinformatics Network to support the H3Africa program, has been working on developing various standards such as Case Reporting Forms for phenotype collection, provided as REDCap instruments and mapped to ontologies. Working with international initiatives such as the Global Alliance for Genomics and Health (GA4GH), H3ABioNet is contributing to the adoption and refinement of standards such as the Data Use Ontology for H3Africa genomics data. To make African-specific genomic and genetic variation datasets easier to find, H3ABioNet is creating an African Microbiome portal and an African Precision Medicine portal that curate African-specific data from various sources. An African Genome Variation Database using OpenCGA standards to house H3Africa-specific data is being created for research groups with specific focus areas such as rare diseases. Genomic data processing and curation for African COVID-19 data is being undertaken by network members, and data are being submitted to the international GISAID platform.
Various outputs from different H3ABioNet projects are being assessed against the Findable, Accessible, Interoperable and Reusable (FAIR) standards, together with what is required for these outputs to be FAIR, such as the adoption of Bioschemas for training materials and versioning of code. Creation of a robust data ecosystem that utilizes established standards requires human capacity to be developed. H3ABioNet has been providing a series of data management planning training workshops to H3Africa personnel and students to enable better planning and preservation of research outputs.
Historically, information literacy formed a critical core function in the process of teaching, learning and research. The concepts of information retrieval, processing and verification became synonymous throughout the research lifecycle. Today, however, information literacy has expanded to accommodate the vast possibilities of teaching, learning and research afforded by the advances in technology. Every possible discipline and process has been affected by information technology, whether it is the enhancement, replacement or augmentation thereof. Underlying the majority of these technologies is the formation, gathering and processing of data. To understand and operate in the new environment in which researchers, students and lecturers find themselves, there needs to be a certain level of digital literacy capacity that will allow participants to build and advance existing literacies.
Within most digital literacy frameworks, the topic of data, data management, data processing and content creation is a common thread which highlights the importance of understanding data and how to function in a digital world. The purpose of this presentation is to highlight the importance of data as an underlying theme across the multiple categories within digital literacy frameworks. The presentation will also look at the proposed Digital Literacy Framework for the University of Pretoria and the approach to assist in enhancing the digital literacy profile of not only the library but also the general academic and research community. The proposed framework will look into the following categories:
In line with the framework the presentation will also focus on how digital literacy is assessed and how the outcome assists to further develop training programs to enhance the various elements.
Recent advances in DNA sequencing technology and high throughput computation have revolutionised the field of biology. In modern biology, DNA strands that contain the recipe for making an organism are considered as long strings of coded data that need to be treated in the same way as multi-dimensional data in computer science. The scale of data that biologists need to analyse confronts them with a new set of challenges, for which most of them have not received adequate training. In this presentation, I demonstrate some of the HPC pipelines used by the Centre for Ecological Genomics and Wildlife Conservation to assemble genomic data, reconstruct evolutionary relationships between organisms, and investigate the functional significance of biological data. Having access to HPC resources in South Africa is helping the country’s biologists to overcome these challenges and enter a new era in the biological sciences. We advocate for close, multidisciplinary collaboration between biologists, computer scientists and other stakeholders to better prepare academia for a smooth transition into the new era. Such collaboration will have positive impacts on the country's economy, public health and ecological diversity that go beyond the field of biology.
The separation of a dispersed mixture of immiscible fluids is a common problem that arises in many fields of engineering. Such mixtures can take the form of emulsions of one liquid in another, mists or sprays of liquid droplets in a gas continuum, or foams of gas bubbles separated by thin liquid films. Numerous physical phenomena play a part in the behaviour of groups of droplets or bubbles including gravitational settling, steric interactions, film drainage effects, and coalescence or break-up. These phenomena typically operate over a very wide range of length and time scales, making numerical solution with unified solvers extremely challenging. Existing methods for such problems have focused on either resolving the macro scales, typically using Euler-Euler or CFD-DEM coupling algorithms which require empirical closures, or resolving only the micro scales, using direct numerical simulation in small regions around individual particles.
In this talk an alternative approach for such problems, the dynamic multi-marker method, is discussed. The new algorithm draws from fully-resolved VOF techniques developed for the micro scale and extends them efficiently to the mesoscale, permitting systems with hundreds or thousands of dispersed particles to be modelled. Such models are capable of acting as virtual prototypes and numerical experimentation platforms, and can be used to develop a better understanding of the dynamic structures that form in dispersed phase flow problems. They are also potentially useful in generating and refining closures required for macro scale methods. An implementation of the dynamic multi-marker method as a multiphase solver in the OpenFOAM® computational mechanics framework will be presented, together with some discussion of the interesting challenges and difficulties that were encountered in the process of optimising it for modern highly-parallel HPC architectures.
With the maturity of AI algorithms, increased computing power and explosive growth of data, AI has become a general-purpose technology (GPT) that will profoundly drive social development. AI will also serve as an important technology that cooperates with HPC, showing its prominence in scientific research in various fields.
In spite of all the recent advances in biochemistry and several associated areas, pharmaceutical drug discovery is still notoriously difficult to do, with lives literally at stake. A typical drug interacts with 300 other enzymes and processes within the human body, and a drug has to pass with minimal disruption to all of these processes in order to be deemed safe. Nearly 90% of all targeted drugs fail in Phase 1 of the clinical trials, but today, with AI, we’re finding ways to improve our chances of success. In some cases, AI can create designer molecules that are already screened for toxicity and hold the promise of drastically reducing the time needed to create new drugs. In this talk, I’ll be giving a very brief overview of the various AI techniques that are currently in use.
Role of Cyber-Infrastructure in responding to COVID-19 (HPC and NReNs)
Mr Lourino Chemane, Mozambique,
Prof Anicia Peters, Namibia,
Dr Happy Sithole, South Africa,
and Mr Stein Mkandawire, Zambia
Update and progress on Weather and Climate project:
Dr Mary-Jane, Chief Scientist, South African Weather Service
Recap on Governance Structures and status 
– Ms Anneline Morgan, SADC Secretariat
Establishment of Botswana NReN
Dr Tshiamo Motshegwa
Road Map and Action Plan on Outcomes of the 10th SADC Cyber-Infrastructure Experts Meeting and next meeting 2021
Closing and end of meeting
The first Summer School took place in Pretoria during January 2020. The talk will report on our experience and also share the process going forward.
Postgraduates across Africa do not have access to foundational data science and open research training that will allow them to become part of a growing community of researchers, if not fully skilled, at least understanding the rules of entering the arena of modern, open science. Such training material is available but the curriculum has not yet been decentralised and localised for Africa because it is not clearly understood where to place the hubs of knowledge from where the training could be disseminated. There is general consensus that introductory level data science training is essential for all research disciplines – to the extent that some of us see it as an element of the digital literacies. South Africa did not, at the time of us adopting the curriculum, see any institution taking on a mandate to provide such training. From our RDM implementation experience we also knew that it would be a while before generic training, that would put any post-graduate on a reliable path to understanding the data science ecosystem, would be developed in South Africa. We understood that to gain a holistic view of what is needed as foundation training, requires a number of stakeholders to collaborate. The challenge was that there were too many issues to address simultaneously if we wanted to start from scratch. The curriculum of the CODATA-RDA Data Summer School was identified as suitable for our context. The most important benefit of using the standardised curriculum is that we were able to leapfrog from an existing, tried and proven initiative – which saved us considerable time. In addition, the knowledge that the training is also being rolled out to an international community gave us the assurance that we were on the right track. The alumni network linked to the initiative is another very important benefit. Our candidates were immediately pulled into a professional network that would, under different circumstances take many years to develop. 
Similarly, the exposure to peers from different disciplines has created shared jargon and experience that we firmly believe will show impact in future.
Author/Presenter: Boniface Akuku, Director, Information and Communication Technology at the Kenya Agricultural and Livestock Research Organization (KALRO)
The pressure to manage and utilize available data to improve agricultural productivity continues to increase, yet several challenges exist. The fast-growing population exacerbates the problem in a period of environmental variability. While remotely sensed Earth Observation (EO) data could help address these environmental issues, access to the High-Performance Computing (HPC) needed to take advantage of freely and openly accessible repositories remains a challenge for organizations. As a result, many organizations have not realized the full potential of EO data. Moreover, freely and openly available EO data remain underutilized, mainly because of their complexity, increasing volume, and the lack of efficient processing capabilities. Data Cube (DC) technology is a new paradigm that aims to realize the full potential of EO data by lowering the barriers caused by these Big Data challenges and providing access to extensive spatio-temporal data in an analysis-ready form. Using the Kenya Data Cube platform as a case study, this paper presents HPC and Big Data integration as an approach to enabling rapid access to and processing of EO data. This approach has shown that generating Analysis-Ready Data (ARD) for developing countries can address agricultural productivity needs, but decision-makers must invest in HPC and related technologies as a priority area. Researchers and universities can then use and explore the data cubes produced to advance new methods and algorithms that extract the information needed to address the host of challenges facing agricultural productivity in developing countries.
Lately, with the high demand for production and commercialization, the need to combine computational modelling techniques and experiments to speed up the process has become a reality. Computational models typically use observation and manipulation in the same ways as physical experiments, because their goals are often the same. The advantages of computational modelling include low raw-material costs, safety, and time saved on synthesis and characterization procedures in the laboratory. The recent work looks into how the CHPC has been used to prepare the models, which were then tested using experiments. Ti-based alloys are considered to be the most attractive metallic materials for aerospace and automobile applications. TiPt is one of the promising shape memory alloys that can be used at high temperatures due to its transformation temperature of 1000 °C. However, the binary alloy has been found to be mechanically unstable and to exhibit a very low shape memory effect, attributed to a low critical stress for slip deformation compared to the stress required for martensitic transformation. We present some of the results obtained using both computational modelling and experimental approaches on the alloy, wherein the addition of a third element to the system is investigated.
The MeerKAT telescope will conduct many scientific projects during the next five years. These range from a few Large Survey Projects (LSPs) to many smaller Open Time Projects (OTPs). The LSPs and OTPs will use the excellent imaging quality of MeerKAT to explore new and exciting parts of parameter space, allowing us to study the evolution of galaxies and the nature of transient objects, for example.
To access the sensitivity and the imaging quality of MeerKAT, astronomers will have to deal with an enormous amount of data -- from terabytes for single projects to petabytes for the LSPs. Radio astronomers have thus turned to HPC to deal with the deluge of data, with the aim of producing high science quality data products, while keeping pace with observations. This will allow researchers to mitigate the negative effects of a data backlog, and will also ensure a high research throughput -- which is essential for the projects and for MeerKAT.
There has been a proliferation in the development of software tools and eco-systems for all parts of the radio astronomy processing stream -- from calibration and imaging, to data visualisation and analysis -- in anticipation of the challenge and opportunity of MeerKAT data. In addition, the data centre plays a central role in determining the feasibility and efficiency of these various workflows, by ensuring the availability of excellent and robust services.
My talk will focus on the computational challenges faced by MeerKAT users, and explore the consequences/requirements for data centres. I will focus on a simple time-cost formalism to assess the risk to science projects, and will present a variety of basic, yet instructive, scenarios.
Tomorrow’s supercomputers will need to leverage the power of heterogeneous architectures in more graceful ways than what can be done today. Doing so will improve the trajectory of future performance gains.
Sea-level rise constitutes a significant risk for over 600 million people in the Low-Elevation Coastal Zone. Considerable uncertainty exists over the magnitude of possible future sea-level rise, because of poorly understood processes governing the stability of ice sheets (continental sized glaciers). One such uncertainty is how meltwater interacts with ice under a warming climate. Understanding of this process is limited by the inaccessibility of the subglacial zone, which lies beneath 100s to 1000s of m of ice. One approach to address this uncertainty is to investigate areas where ice sheets have retreated, i.e., where their beds are easily accessible. Eskers are landforms that record the location and dimensions of former subglacial meltwater channels, and are common in glaciated regions. Recent years have seen a dramatic increase in the availability of high-resolution Digital Elevation Models (DEMs) of glaciated regions, providing the opportunity to make detailed measurements of eskers from remotely sensed data. Manual mapping of these features at the required level of detail is not feasible over the large areas occupied by palaeo-ice sheets (e.g. most of Canada). We propose an automated method for detecting eskers in hillshaded digital elevation models, based on Convolutional Neural Networks (CNN). The automated method maps esker locations to facilitate detailed morphometric study of their form. Multiple CNN models are trained and tested via a specially designed algorithm with a built-in mechanism for selecting an optimal model. Training and testing imagery data were obtained from a test area in Canada, consisting of 1041 esker-positive JPEG files and 37000 esker-negative JPEG files. The CNN model's performance on previously unseen images with and without eskers yields high sensitivity and specificity respectively, and we use the model outputs to elicit esker features from the images. Discussions focus on how the timely identification of esker locations enhances our understanding of why, how, and how fast sea-level rise might happen. We also highlight the importance of gaining such knowledge in a timely manner within the context of the United Nations Sustainable Development Goals (SDGs), particularly SDG #13 and others relating to poverty and food security.
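As an illustration of the two metrics named in the abstract above, the following minimal sketch computes sensitivity (the fraction of esker-positive images detected) and specificity (the fraction of esker-negative images correctly rejected) from binary labels. The function name and the eight example labels are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = esker present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # eskers found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # eskers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-eskers rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels for eight image tiles (illustrative only)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

With a dataset as imbalanced as the one described (1041 positive against 37000 negative images), reporting both numbers is more informative than overall accuracy, which a model could inflate by predicting "no esker" everywhere.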
The advent of the Covid-19 pandemic has exposed the need for secure and reliable cyberinfrastructure to support research and education. The reality, however, is that while selected players in the education sector have made commendable progress in building, maintaining and upgrading this kind of infrastructure, many critical players are still struggling to access and build cyberinfrastructure that can position them as value-adding contributors to digital transformation. National Research and Education Networks (NRENs) within the UbuntuNet Alliance region are not spared from this disadvantage. To address this gap, UbuntuNet Alliance, the Regional Research and Education Network of Southern and Eastern Africa, has been undertaking various initiatives with various partners, including sister RRENs, development partners and NREN members, to create an enabling environment for the building and deployment of reliable cyberinfrastructure that can ably support digital transformation. With this integrated approach, the Alliance has managed to provide a platform under which NRENs are building their own cyberinfrastructure, such as cloud infrastructure, learning management systems and web conferencing platforms. The Alliance is also deploying, within its region, tools and services critical for digital transformation. These include eduroam, eduID and engineer capacity-building services. In addition, the Alliance is encouraging strategic partnerships between NREN members and telecom companies. Through such partnerships, NREN members are able to broker zero-rating deals and data bundle price reductions for university students and staff, and in some cases to enter into cost-effective sharing of existing infrastructure with the telecom companies.
Keywords: Cyberinfrastructure, Digital Transformation, Research and Education Networks
The 2017-2018 South African listeriosis epidemic was the world's worst outbreak of Listeria monocytogenes food poisoning, with more than 1,000 confirmed cases and an estimated 200 deaths. This outbreak highlighted the vulnerability of the South African population with regard to foodborne diseases and the critical role foodborne disease surveillance plays in public health.
Recently, the World Health Organization encouraged countries to incorporate next-generation sequencing, in particular whole genome sequencing, in their foodborne disease surveillance and response systems. Management of foodborne disease threats requires the swift and correct identification of foodborne pathogens. Whole genome sequencing currently provides the highest possible resolution and strain discrimination for foodborne pathogens with a rapid turnaround time. The data generated by whole genome sequencing enables in silico determination of numerous critical aspects in foodborne disease surveillance such as strain typing, resistance profiling, virulence characterisation and phylogenetic analysis. This negates the often cumbersome and time-consuming traditional typing methods as whole genome sequencing is proving to be an all-encompassing method of foodborne disease surveillance.
Prevention of high-burden foodborne disease outbreaks such as listeriosis and salmonellosis requires surveillance across the entire “farm to fork” value chain. This enables the detection of possible epidemiological hotspots and entry points of foodborne pathogens within the value chain.
Located on the Agricultural Research Council's Onderstepoort Veterinary Institute campus, the Biotechnology Platform (ARC-BTP) was established in 2010 as a major strategic priority of the ARC. The role of the ARC-BTP is to create the high-throughput resources and technologies required for applications in genomics, quantitative genetics, marker assisted breeding and bioinformatics within the agricultural sector. The ARC-BTP is currently involved in various foodborne pathogen and disease surveillance research projects and houses the required technologies and capacity to provide whole genome sequencing and bioinformatic analysis for these research projects.
Foodborne disease surveillance research projects require the collection of numerous samples from various points within the “farm to fork” value chain, which leads to the generation of copious amounts of data. Each sample produces roughly 500 MB of raw sequencing data, which is then analysed with published and established workflows. Robust foodborne disease surveillance research projects require the collection and sequencing of a large cohort of samples, which produces a wealth of raw sequencing data. Typical workflows for foodborne disease research therefore require access to high-performance computing environments, due to the extensive datasets and the computationally and memory-intensive applications used in these research endeavours. The CHPC has proved to be a critical partner in foodborne disease surveillance research projects.
In foodborne disease surveillance the adage “Prevention is better than cure” holds true. The prevention of foodborne disease outbreaks is paramount in public health. The COVID-19 pandemic has demonstrated the critical importance of wide-spread and continuous testing. The same applies to foodborne disease surveillance. The prevention of epidemics such as the South African listeriosis outbreak requires large foodborne disease surveillance and foodborne pathogen testing projects to protect public health and ensure that no lives are lost due to contaminated food.
Conjugate coupled physics problems involving multiple materials and domains exist across industry, with various combinations of heat, mass, and momentum transport. Examples include automotive brake cooling, thermal cooling of electronics, and processing of chemical reactive species within packed bed reactors in the oil and gas industry [1]. With the ever-growing demand to produce more efficient, environmentally friendly, durable, and cost effective products, engineers seek to exploit ever more complex simulation capabilities to construct realistic virtual prototypes. Efficient parallel computing plays a crucial role in realising design decision making in a realistic time frame, and is complicated in this instance by the complexity of the multi-region system.
We describe a new framework to model conjugate heat, mass, and momentum transport within a chemically reacting system [2]. This forms part of the HELYX CFD package built on OpenFOAM technology. The multi-region, multi-physics framework is used to simulate the behaviour of a gas-phase packed-bed reactor composed of randomly packed particles within a tube region. Information about interstitial flow phenomena, global and local pressure profiles, and solid species transport phenomena is captured.
We discuss several challenges to performing a CFD analysis of these types of systems, including the creation of randomly packed domains; meshing of these complex structures; capturing the intricate transport phenomena between regions; and scaling of coupled multi-region systems with hundreds of separate domains to high core counts.
References
 1. D.P. Combest. Interstitial-Scale Modelling of Packed-Bed Reactors.  PhD Dissertation. Energy, Environmental, and Chemical Engineering Dept. Washington University in St. Louis. 2012. Accessed January 29, 2020.
 2. O. Oxtoby, E. de Villiers, S. Georgescu. A new Region-Coupled Framework for Conjugate Heat Transfer.  2016 11th OpenFOAM Workshop, Guimarães, Portugal. Accessed January 29, 2020.
 3. D. P. Combest, P. A. Ramachandran.  Micro-Scale Modelling of Packed Beds.  November 2010.  2010 AIChE Annual Conference. Accessed January 29, 2020.
Tsolo Storage Systems is a leading provider of petascale Ceph storage solutions. Through our partnership with the South African Radio Astronomy Observatory, we have delivered some of the largest storage installations in the country.
Our next generation product seeks to democratise the power and cost-benefit of Ceph as an open-source storage solution in order to drastically simplify your storage needs.
Building on top of a custom hardware platform we are able to offer fully managed, user provisioned, S3 storage at a fraction of the price of the incumbent providers.
This talk will highlight the homegrown innovation and technical solutions to local storage problems, as well as the successful public-private partnership with SARAO.
Moving masses of data is a challenge. In most cases networks optimized for business operations are neither designed for nor capable of supporting the data movement requirements of data intensive research. When scientists attempt to run data intensive applications over these so called “general purpose”/enterprise networks, the result is often poor performance – in many cases poor enough that the science mission is significantly impacted. At its worst this means either not getting the data, getting it too late or resorting to “desperate” measures such as shipping disks around. The South African National Research Network (SANReN) has been piloting a data transfer service with the goal of changing this for our researchers/scientists and optimising the transfer of datasets across the network. The service makes use of data transfer nodes configured in a science DMZ architecture, using specially designed data transfer tools to move data efficiently and securely between local institutions, to and from the CHPC, and internationally.
This presentation gives an overview of the SANReN Performance Enhancement Response Team’s goals, specifically with regard to the SANReN Data Transfer Pilot service. This includes the science DMZ, data transfer nodes, tools and services implemented. An update will be given on the results achieved so far and the planned way forward for this service.
Keywords: Performance Enhancement Response Team, Science DMZ, data transfer nodes, optimising data transfer
The global outbreak of SARS-CoV-2 has caused a high mortality rate and therefore requires the urgent identification of drugs and other interventions to overcome the disease. Scientists around the world are looking for new drugs or molecules that target the spike protein to prevent Covid-19 infection. We present a molecular docking analysis of eight synthetic peptides against the SARS-CoV-2 spike protein. Some interacted with ACE2, while others interacted at the interface of ACE2 and the S protein. These peptides are potential molecules for preventing the establishment of Covid-19 and can be developed into new drugs.
Abstract:
“The answer to the question ‘why HEAs exhibit such exceptional properties’ lies in their phases” [1]. The implementation of machine learning (ML) approaches for the classification of solid solution high-entropy alloy (HEA) phases is, therefore, a topical theme in materials informatics. For this study, we construct a new dataset based on at least 430 peer-reviewed experimental publications, including at least 40 metallurgy-specific predictor features. This study proposes a systematic framework incorporating (a) six feature selection schemes, (b) the construction of feature ensembles, and (c) the implementation of eight general ML classifiers. The classifiers, namely regression tree (DT), linear discriminant analysis (LDA), naïve Bayes (NB), generalized linear regression (GLMNET), random forest (RF), artificial neural networks (NNET), k-nearest neighbors (kNN), and support vector machines (SVM), were trained and evaluated on classifying HEA solid solution phases across feature ensemble sizes. Feature selection results identify the most discriminating predictor features and, against intuition, the post-treatment heat-treatment features performed poorly. The RF, SVM, kNN, and NNET classifiers outperformed the other algorithms used, with accuracy rates of 97.5%, 95.8%, 94.5%, and 94.0%, respectively. We also found that our best results are superior to those of earlier studies on the same datasets. The proposed method can be used in other research areas.
[1] Agarwal, A. & Prasada Rao, A. K. Artificial Intelligence Predicts Body-Centered-Cubic and Face-Centered-Cubic Phases in High-Entropy Alloys. JOM 71, 3424–3432 (2019).
Traditionally, file systems are mostly monolithic, making it hard to experiment with new approaches and technologies. Exchanging core functionality within a file system is a burdensome task, leading to a lack of innovation in this area. However, data volumes are growing rapidly because the ability to capture and produce data is increasing at an exponential rate. Rising core counts and data volumes present challenges for contemporary storage systems, especially regarding metadata performance and data management. This makes it even more important to investigate new ways of storing and managing data efficiently.
HPC storage systems are typically designed around POSIX-compliant parallel distributed file systems that are accessed using sophisticated I/O libraries. The file system and library layers are strictly separated for portability reasons. While this allows exchanging individual layers, their complexities pose a high barrier of entry. This is especially problematic for shorter research projects and presents a significant hurdle for young researchers and students.
Within the JULEA and CoSEMoS projects, we are aiming to change this. JULEA is a flexible storage framework that can be used to prototype new ideas related to storage and file systems. It allows offering arbitrary I/O interfaces to applications and includes interfaces for object, key-value and database storage. The framework has been designed to be easy to set up and run without administrative privileges, so it can be used on a wide range of software and hardware environments. It also serves as the foundation of the CoSEMoS project, which explores the benefits of a coupled storage system for self-describing data formats, such as HDF5 and ADIOS2. This allows the storage system to manage file metadata found within these data formats and makes it possible to use structural information for selecting appropriate storage technologies. For instance, metadata can be stored in database systems that can be queried efficiently. Moreover, making use of established data formats allows running existing applications without modifications, which helps preserve past investments in software development.
CoSEMoS enables novel data management approaches via a data analysis interface that gives applications direct access to JULEA's backends, eliminating the need to sift through large volumes of data to find relevant data points. Breaking up the strict separation has additional long-term benefits, such as being able to take data migration decisions based on structural information found within the file formats.
This talk will briefly introduce the JULEA and CoSEMoS projects, and show the opportunities enabled by their novel storage system design.
Many scientific fields increasingly use high-performance computing (HPC) to process and analyze massive amounts of experimental data, while storage systems in today’s HPC environments have to cope with new access patterns. These patterns include many metadata operations, small I/O requests, or randomized file I/O, whereas general-purpose parallel file systems have been optimized for sequential shared access to large files. Burst buffer file systems create a separate file system that applications can use to store temporary data. They aggregate node-local storage available within the compute nodes or use dedicated SSD clusters and offer a peak bandwidth higher than that of the backend parallel file system without interfering with it. We present GekkoFS, a temporary, highly scalable file system which has been specifically optimized for the aforementioned use cases. GekkoFS provides relaxed POSIX semantics, offering only the features that are actually required by most (not all) applications. GekkoFS is therefore able to provide scalable I/O performance and reaches millions of metadata operations even with a small number of nodes, significantly outperforming common parallel file systems.
The Data Science Research Group is engaged in a broad spectrum of data science problems ranging from theory to computations to applications.
The computations and applications, in particular, require HPC resources. For example, the implementation of a special neural network referred to as the Error Correction Neural Network (ECNN) was a computation project that required HPC for testing. Application examples include accident detection from traffic videos, and the localization of lesions in Diabetic Retinopathy.
The HPC requirements of these projects included storage for the data and compute resources, both CPU and GPU, for the deep learning models being used.
At the beginning the challenge was the lack of GPUs, but over time it shifted to the software packages required by the constantly changing implementations of these deep learning models.
The talk will showcase research results and impact that were made possible by access to CHPC computing resources.
In recent years, considerable progress has been made in using a rational design approach [1] guided by calculations with the Gaussian 09 software package on the Lengau cluster and an application of Michl's perimeter model [1,2] to prepare novel Sn(IV) complexes of porphyrin dyes and porphyrin analogues that are suitable for use as photosensitizer dyes in photodynamic therapy [3-9]. Axial ligation results in low levels of aggregation, while the Sn(IV) ion promotes intersystem crossing resulting in relatively high singlet oxygen quantum yields through a heavy atom effect. Relatively low IC50 values have been obtained during in vitro studies against MCF-7 breast cancer cells. Future directions on the use of the Gaussian 09 software package in the context of this research will be described.
References
[1] J. Mack, Chem. Rev. 2017, 117, 3444-3478.
[2] J. Michl, Tetrahedron 1984, 40, 3845-3934.
[3] B. Babu, E. Amuhaya, D. Oluwole, E. Prinsloo, J. Mack and T. Nyokong, MedChemComm 2019, 10, 41-48.
[4] R. C. Soy, B. Babu, D. O. Oluwole, N. Nwaji, J. Oyim, E. Amuhaya, E. Prinsloo, J. Mack and T. Nyokong, J. Porphyrins Phthalocyanines 2019, 23, 34-45.
[5] B. Babu, E. Prinsloo, J. Mack, T. Nyokong, New J. Chem. 2019, 43, 18805-18812.
[6] S. Dingiswayo, B. Babu, E. Prinsloo, J. Mack, T. Nyokong, J. Porphyrins Phthalocyanines 2020, 24, 1138-1145.
[7] B. Babu, J. Mack, T. Nyokong, Dalton Trans. 2020, 49, 9568-9573.
[8] B. Babu, E. Prinsloo, J. Mack, T. Nyokong, New J. Chem., 2020, 44, 11006-11012.
[9] B. Babu, J. Mack, T. Nyokong, accepted in Dalton Trans. in 2020. doi: 10.1039/D0DT03296D
One goal of support staff at a data center is to identify inefficient jobs and to improve their efficiency. Therefore, a data center deploys monitoring systems that capture the behavior of the executed jobs. While it is easy to utilize statistics to rank jobs based on the utilization of computing, storage, and network, it is tricky to find patterns in 100,000 jobs, e.g., whether there is a class of jobs that is not performing well. In this talk, a methodology to rank the similarity of all jobs to a reference job based on their temporal IO behavior is described. A study is conducted to explore the effectiveness of the approach, which starts from three reference jobs and investigates related jobs. The data stems from DKRZ's supercomputer Mistral and includes more than 500,000 jobs that were executed over more than 6 months of operation.
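The idea of ranking jobs by similarity to a reference job can be sketched as follows. This is only an illustrative example under stated assumptions: each job is represented by a hypothetical list of IO activity values per time segment, and similarity is measured with a plain Euclidean distance after zero-padding. The abstract does not specify the actual features or distance measure used in the study.

```python
import math


def distance(a, b):
    """Euclidean distance between two IO time series, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(reference, jobs):
    """Return (job_id, distance) pairs sorted from most to least similar."""
    scored = [(job_id, distance(reference, series)) for job_id, series in jobs.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical IO activity per time segment (arbitrary units)
reference = [0.0, 5.0, 5.0, 0.0]
jobs = {
    "job_a": [0.0, 5.0, 4.0, 0.0],  # nearly the same burst pattern
    "job_b": [9.0, 9.0, 9.0, 9.0],  # constant heavy IO
    "job_c": [0.0, 5.0, 5.0],       # same pattern, shorter runtime
}
ranking = rank_by_similarity(reference, jobs)
for job_id, dist in ranking:
    print(job_id, round(dist, 2))
```

Starting from a known inefficient reference job, the head of such a ranking gives support staff a short list of candidate jobs to inspect first.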
Simulating Additive Manufacturing (AM) has been difficult because simulation domains can be extremely large while the computational load is minimal. Because of the way AM works, only a small part of the simulation domain is required at any time. Stitch-IO offers a way to decompose AM simulations into a series of short runs over short time and space scales and then stitch the output together into a lossless, coherent form. The data storage requirement drops dramatically, the hardware requirement can be as little as a laptop, and the whole simulation runs in the same wall clock time as it would on a large supercomputer with the whole simulation domain in RAM. With native Python and C APIs, Stitch-IO offers flexible analysis interfaces as well.
A physics-first approach is followed to model cosmic-ray (CR) modulation from first principles, using a novel time-dependent three-dimensional stochastic solver of the Parker transport equation. This approach places a strong, primary emphasis on understanding the basic causes of cosmic-ray modulation. This requires knowledge and an understanding of both the large scale quantities, such as the heliospheric magnetic field, heliospheric tilt angle and the solar wind speed, and the small scale quantities, such as the magnetic variance and correlation scales. By its very nature, this approach is extremely computationally expensive, and requires high-performance computation on a large scale, such as that made available by the CHPC. The end result is the most realistic solar-cycle dependent three-dimensional cosmic-ray modulation model to date, one that is able to self-consistently reproduce the major salient features of the observed cosmic ray intensity temporal profiles. A better understanding of the primary drivers of cosmic-ray modulation is essential to being able to glean valuable insights into new, fundamental physics in the transport of highly energetic charged particles originating from astrophysical sources.
We have applied machine learning algorithms, including logistic regression (LR), support vector machines (SVM), k-nearest neighbour (KNN) and neural networks (DNN), including convolutional (CNN), recurrent (LSTM) and Resnet50 architectures, to classify the coughing sounds of tuberculosis (TB) and COVID-19 patients.
To do this for TB, we have compiled a dataset of cough recordings obtained in a real-world setting from 16 patients confirmed to be suffering from TB and 33 patients suffering from a respiratory condition that has been confirmed not to be TB. Among all classifiers considered, we find that the best performance is achieved using LR. In combination with feature selection by sequential forward search (SFS), our best system achieves an area under the ROC curve (AUC) of 0.94 using 23 features selected from a set of 78 high-resolution mel-frequency cepstral coefficients (MFCCs). This system is able to exceed the 90% sensitivity at 70% specificity specification considered by the WHO as a minimal requirement for an effective community-based triage test.
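For readers unfamiliar with the AUC figure quoted above, the following minimal sketch computes it directly from its probabilistic definition: the chance that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counted as half. The labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: 1 = TB cough, 0 = non-TB cough
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of samples, which is acceptable for datasets of the size described here; library implementations use a sorting-based equivalent.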
For COVID-19, gathering our own data has proved to be very challenging and hence we have developed initial systems using the publicly-available COSWARA dataset (https://coswara.iisc.ac.in/about), which currently includes recordings of coughs from 1135 healthy and 95 COVID-19 positive subjects. As this dataset is highly imbalanced, synthetic minority over-sampling (SMOTE) is applied before training CNN, LSTM and Resnet50 neural architectures. Our best system, which is a Resnet50, has achieved an AUC of 0.96. We would like to apply this system to a locally-compiled dataset. Therefore we are engaged in a data-gathering project (https://coughtest.online) which has so far collected cough sounds from 8 COVID positive and 14 COVID negative participants.
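The core idea of SMOTE, as mentioned above, is to create new minority-class samples by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The sketch below illustrates that idea only; the function name, the parameter defaults and the two-dimensional toy points are assumptions, and it is not the implementation used by the authors.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority samples and neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding the point itself)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical two-dimensional feature vectors of the minority (COVID-positive) class
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_synthetic=6)
print(len(new_points))
```

Over-sampling of this kind should be applied to the training portion only, so that synthetic points never leak into the data used for evaluation.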
We conclude that, for TB, automatic classification of cough audio sounds is promising as a viable means of low-cost easily-deployable front-line screening, and we are actively pursuing improvements to our system. For COVID-19, cough classification also appears to hold much promise, but more extensive testing on locally-collected data is necessary to obtain more clarity. All classifiers were trained and evaluated using nested cross-validation to make best use of the small datasets for parameter estimation, hyperparameter optimisation and final testing.
This is a computationally extremely expensive process but easily parallelised, and hence the CHPC provided an ideal and key resource for performing this work.
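The structure of nested cross-validation described above can be sketched as follows: an outer loop holds out a test fold, and an inner loop, run only on the remaining data, selects the hyperparameter. The one-dimensional toy data, the simple k-nearest-neighbour classifier and all function names are assumptions chosen to keep the example self-contained; the abstract does not specify the fold counts or the search procedure actually used.

```python
def k_folds(indices, n_folds):
    """Split a list of indices into n_folds interleaved folds."""
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points; train holds (x, label) pairs."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test points classified correctly by kNN fitted on train."""
    return sum(1 for x, y in test if knn_predict(train, x, k) == y) / len(test)


def nested_cv(data, candidate_ks, outer_folds=3, inner_folds=2):
    """Return one (chosen_k, test_accuracy) pair per outer fold."""
    results = []
    for held_out in k_folds(list(range(len(data))), outer_folds):
        test = [data[i] for i in held_out]
        dev_idx = [i for i in range(len(data)) if i not in held_out]
        inner = k_folds(dev_idx, inner_folds)

        def inner_score(k):
            # Inner loop: validation accuracy averaged over the inner folds
            scores = []
            for val_idx in inner:
                val = [data[i] for i in val_idx]
                trn = [data[i] for i in dev_idx if i not in val_idx]
                scores.append(accuracy(trn, val, k))
            return sum(scores) / len(scores)

        best_k = max(candidate_ks, key=inner_score)
        # Outer loop: refit on all development data, evaluate once on the held-out fold
        dev = [data[i] for i in dev_idx]
        results.append((best_k, accuracy(dev, test, best_k)))
    return results


# Hypothetical, well-separated one-dimensional data: (feature, label)
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.5, 0), (0.6, 0),
        (1.1, 1), (1.2, 1), (1.3, 1), (1.4, 1), (1.5, 1), (1.6, 1)]
results = nested_cv(data, candidate_ks=[1, 3])
print(results)
```

Every outer fold is independent of the others, and so is every hyperparameter candidate within it, which is why the procedure is expensive yet parallelises easily across cluster nodes.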
This talk will cover current LANL HPC storage environments and methods, new directions and technologies being explored, and how HPC, AI, and analytics storage workloads might be serviced by a single flexible storage system a few years from now. Additionally, information will be given on a few of the many HPC storage-related R&D projects that LANL and its partners are working on. Information on partnering with LANL to solve important HPC technology-related problems in storage and other areas will also be presented.
Scalable storage servers consist of multiple parts that communicate asynchronously via queues. There is usually a frontend that queues access requests from storage clients and uses one or more threads to forward queued requests to a backend. The backend queues these forwarded requests and batches them to efficiently use storage devices it manages. Storage servers can have multiple kinds of backends with different design assumptions about their underlying storage device technologies. Requests are scheduled in the frontend to ensure different levels of service for different classes of requests. For example, requests that are generated by data scrubbers working in the background generally have a lower priority than requests from an application. A common solution to the above problem is to move request scheduling from the frontend to the backend. For various reasons that is not always practical. The scope of the proposed project is to have the scheduler reside in the frontend and to explore designs for backends to dynamically control the admission of requests depending on continually changing workloads and storage device technologies.
Scheduling in the frontend and batching in the backend work best if there are enough requests in their respective queues. This raises the question: what is enough for the frontend and for the backend? If there are too few requests in the frontend but more than enough requests in the backend, the system might work well in terms of overall throughput but might poorly enforce scheduling objectives. If there are too few requests in the backend, then overall throughput and latency suffer no matter the scheduling objectives. If, however, the backend has the ability to admit just enough requests from the frontend but not more, the throughput and latency of the backend are likely to be satisfactory. If there is enough work overall, the frontend then has enough requests to meet scheduling objectives. How many requests are just enough for the backend?
In this talk I will give an overview of an ongoing research project at the UC Santa Cruz Center for Research in Open Source Software (cross.ucsc.edu) to reframe this question as a bufferbloat mitigation problem using algorithms similar to the ones used for bufferbloat in networking.
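As a rough illustration of what a bufferbloat-style answer to "how many requests are just enough" could look like, the sketch below adjusts a backend admission limit from observed queueing delay. It borrows one idea from the CoDel networking algorithm (react to the minimum delay seen over an interval, since a persistently non-zero minimum indicates a standing queue), but it is not CoDel's control law and not the project's actual design. The class name, the parameters, and the halve/increment adjustment rule are all assumptions made for illustration.

```python
class AdmissionController:
    """Backend-side cap on outstanding requests, driven by queueing delay.

    Times are integer milliseconds. If even the minimum queueing delay in
    an interval exceeds the target, the backend queue is considered
    bloated and the limit is halved; otherwise it grows by one.
    """

    def __init__(self, target_delay_ms=5, interval_ms=100, limit=8,
                 min_limit=1, max_limit=256):
        self.target_delay_ms = target_delay_ms
        self.interval_ms = interval_ms
        self.limit = limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.window_start = None  # time of first dequeue in this interval
        self.window_min = None    # smallest delay seen in this interval

    def can_admit(self, outstanding):
        """Should the backend accept another request from the frontend?"""
        return outstanding < self.limit

    def on_dequeue(self, now_ms, enqueue_ms):
        """Record the queueing delay of a request leaving the backend queue."""
        delay = now_ms - enqueue_ms
        if self.window_start is None:
            self.window_start = now_ms
            self.window_min = delay
        else:
            self.window_min = min(self.window_min, delay)
        if now_ms - self.window_start >= self.interval_ms:
            if self.window_min > self.target_delay_ms:
                self.limit = max(self.min_limit, self.limit // 2)
            else:
                self.limit = min(self.max_limit, self.limit + 1)
            self.window_start = None
            self.window_min = None
```

With such a limit in place, requests that are not admitted remain queued in the frontend, where the scheduler can still reorder them according to its service classes.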
At no time in the seven-decade history of the field of supercomputing has there been a year like this. 2020 has been at the edge of a world-wide disaster, with HPC being brought to the battle against Covid-19 in a planetary response unmatched in its history. At this moment of international cooperation in our field to vanquish a pandemic that has killed more than a million innocent victims, the same technology is on the threshold of the unprecedented achievement of 1 Exaflops (Rmax) performance as measured for more than two dozen years by the Linpack benchmark. A capability of a million million million floating point operations per second has stretched creativity and technologies to their limits, as the application of nano-scale semiconductor fabrication processes will deliver Exascale computing even as it demands revolutionary innovations at the end of Moore’s Law. This presentation, unique in the annals of the CHPC conference due to its virtual form, will examine the dramatic role that HPC is playing in combatting this pandemic as it approaches the era of Exascale and faces the daunting challenges and opportunities of wielding revolutionary concepts, still in their inchoate stages of development, to anticipate the immediate singularity and future directions of supercomputing.







