The aim of the conference is to bring together our users so that their work can be communicated, to include world-renowned experts, and to offer a rich programme for students in the fields of high performance computing, big data, and high-speed networking. The CHPC National Conference is co-organised by the CHPC, DIRISA and SANReN.
The CHPC 2023 Conference will be an in-person event with a physical programme hosted at the Nombolo Mdhluli Conference Centre (NMCC), Skukuza.
For more information please see the main conference site.
This year's theme explores how high performance computing has broadened into activities beyond large-scale simulation and high-throughput processing of experimental data. Those are now joined by machine learning models derived from enormous data sets, with notable successes ranging from pattern recognition to the pattern mimicry of generative models. On the hardware frontier, quantum computing offers exotic processing potential, while commodity enterprise computing brings the flexibility of cloud computing to HPC.
The deadline for abstracts for talks or posters is 23:59 on 6 November 2023. Proposals for short workshops are also welcome.
Online registration will close on 1 December 2023. Thereafter only onsite registration (at full fees) will be available at the venue.
In this workshop we explore some of the available quantum algorithms designed for data analysis. Specifically, our focus will be on hybrid quantum machine learning, a paradigm integrating classical machine learning models with quantum algorithms. We will also examine techniques for integrating quantum models into pre-existing machine learning workflows, using transfer learning as an example. The hands-on aspect of the workshop will use the Qiskit SDK to implement tutorial examples, providing practical experience with quantum programming (a minimal code sketch is given after the agenda below).
Agenda:
Introduction to Quantum Computing (45 minutes)
Introduction to hybrid classical-quantum machine learning
Overview of quantum algorithms for machine learning (45 minutes)
Strategies for incorporating quantum machine learning algorithms into existing machine learning workflows (30 minutes)
Hands-on quantum programming (1 hour)
Introduction to Qiskit (15 minutes)
Building and running quantum circuits (15 minutes)
Building classical-quantum models (30 minutes)
The workshop targets researchers and students with a machine learning, data science, or data analysis background.
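As a taste of the hands-on session, the hedged sketch below shows the general shape of a hybrid classical-quantum model in Qiskit: a feature map encodes classical data, a variational ansatz supplies trainable weights, and a classical loop updates those weights from an expectation-value readout. It assumes Qiskit with the reference primitives (qiskit.primitives.Estimator) plus NumPy; the data, labels and training loop are toy placeholders, and the actual workshop tutorials (which may use the qiskit-machine-learning package) will differ.

```python
# Minimal hybrid classical-quantum model sketch (assumes Qiskit with the
# reference primitives and NumPy). A 2-qubit feature map encodes classical
# data, a variational ansatz supplies trainable weights, and <Z.Z> is the
# model output driven by a crude classical optimisation loop.
import numpy as np
from qiskit.circuit.library import ZZFeatureMap, RealAmplitudes
from qiskit.primitives import Estimator
from qiskit.quantum_info import SparsePauliOp

feature_map = ZZFeatureMap(feature_dimension=2)
ansatz = RealAmplitudes(num_qubits=2, reps=1)
circuit = feature_map.compose(ansatz)          # data encoding, then trainable layer
observable = SparsePauliOp("ZZ")
estimator = Estimator()

def model(x, w):
    """Return <ZZ> for one data point x (length 2) and weight vector w."""
    values = np.concatenate([x, w])            # parameter order: features, then weights
    return estimator.run([circuit], [observable], [values]).result().values[0]

# Toy data: two points with target values +1 / -1 (placeholders only).
X = np.array([[0.1, 0.4], [0.9, 0.7]])
y = np.array([1.0, -1.0])
w = np.random.default_rng(seed=0).uniform(0, np.pi, ansatz.num_parameters)

def loss(w):
    return np.mean([(model(x, w) - t) ** 2 for x, t in zip(X, y)])

eps, lr = 0.1, 0.5
for step in range(10):                         # a few finite-difference descent steps
    grad = np.array([(loss(w + eps * np.eye(len(w))[i]) - loss(w)) / eps
                     for i in range(len(w))])
    w -= lr * grad
    print(f"step {step}: loss = {loss(w):.4f}")
```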
OpenStack has become the de facto standard for on-premise cloud infrastructure. This hands-on workshop will introduce an open project for creating private and hybrid cloud infrastructure and show how it can be used to address the requirements of HPC.
Through the pooled collaborative effort of an open community involving many scientific institutions worldwide, many of the complexities and overheads commonly associated with operating OpenStack are overcome.
The hands-on workshop will provide access to a lab environment where an OpenStack system can be built and the components within it explored and explained.
Familiarity with technologies such as Linux, Docker and Ansible will be helpful (but not essential).
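As orientation for the lab, the hedged sketch below shows what API-driven interaction with an OpenStack cloud looks like from Python using the openstacksdk library; the cloud entry name and the image, flavour and network names are placeholders for whatever the lab environment provides, not details of the actual workshop setup.

```python
# Hedged sketch: talking to an OpenStack cloud with the openstacksdk library.
# The cloud entry ("lab-cloud") and the image/flavor/network names are
# placeholders for values supplied by the workshop lab environment.
import openstack

conn = openstack.connect(cloud="lab-cloud")    # reads credentials from clouds.yaml

# Inventory: what images and flavors does this cloud offer?
for image in conn.image.images():
    print("image: ", image.name)
for flavor in conn.compute.flavors():
    print("flavor:", flavor.name)

# Launch a small test instance (placeholder names throughout).
server = conn.create_server(
    name="hpc-test-node",
    image="ubuntu-22.04",
    flavor="m1.small",
    network="lab-network",
    wait=True,
)
print("server", server.name, "is", server.status)
```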
Target audience:
Format: Hands-on tutorials
Duration: Full day (divided into two half-day sessions)
Requirements:
Today, a deep comprehension of the interplay between chemical reactions and physical processes is achievable through rigorous atomistic-scale simulation techniques. Materials Design's MedeA® simulation environment [1] efficiently supports materials scientists and engineers globally by offering an integrated suite of user-friendly tools. These include the leading quantum chemical and molecular mechanics solver Vienna Ab-Initio Simulation Package (VASP) [2], the forcefield-based molecular dynamics and Monte Carlo engines LAMMPS [3] and GIBBS [4], and the quantum chemistry codes Gaussian [5] and MOPAC [6]. MedeA®'s property prediction modules efficiently use these engines to compute a wide range of material properties.
To reach across length and time scales up to the mesoscale and microscale, and to explore an expansive configuration space, MedeA offers versatile multi-scale methodologies. These encompass Universal Cluster Expansion, coarse-grain potentials, and automated machine-learned potential generation.
Program
1st session: Using MedeA to perform ab initio VASP calculations
2nd session: Using forcefields in MedeA to perform LAMMPS simulations
3rd session: Machine Learning: Bridging the length scales from ab initio to forcefields
About Materials Design:
Materials Design is the leading atomistic simulation software and services company for materials. We help customers across many diverse industries design new materials, predict their properties, and generate value through innovation.
References
[1] MedeA® - Materials Exploration and Design Analysis Software, Materials Design, Inc., Angel Fire, NM, USA 1998-2014.
[2] Vienna Ab-Initio Simulation Package (VASP), G. Kresse and J. Furthmüller, Phys. Rev. B 54, 11169 (1996).
[3] Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), S. Plimpton. J. Comp. Phys., 117, 1 (1995).
[4] MedeA®-Gibbs: License IFPEN-CNRS-Université Paris-Sud (2003).
[5] Gaussian 09, Revision A.02, M. J. Frisch, et al., Gaussian, Inc., Wallingford CT, 2016.
[6] Molecular Orbital PACkage (MOPAC), J. J. P. Stewart, J. Mol. Model. 13, 1173 (2007); J. Mol. Model. 19, 1 (2013).
This one-day tutorial on using Materials Studio is designed to provide participants with comprehensive hands-on experience in materials modeling and simulation. The tutorial will focus on several key aspects of Materials Studio, including setting up calculations in CASTEP and DMOL3, utilizing the adsorption locator, constructing surfaces, generating supercells and layers, and effectively interpreting simulation results.
The tutorial will begin with an introduction to the software environment and its capabilities. Participants will then delve into the intricacies of setting up calculations in CASTEP and DMOL3, two powerful tools for materials modeling. Subsequently, the tutorial will explore the adsorption locator feature, enabling participants to study the adsorption of molecules on surfaces.
A significant portion of the tutorial will be dedicated to building surfaces, understanding supercell generation, and constructing layers within the Materials Studio environment. Participants will gain practical skills in manipulating structures and simulating materials' behaviors under various conditions.
The tutorial will conclude with an emphasis on result interpretation, equipping participants with the ability to extract valuable insights from their simulations. Throughout the day, practical exercises and examples will be provided to enhance learning and ensure that participants leave with a strong foundation in using Materials Studio for materials research.
This tutorial is ideal for researchers, scientists, and students interested in computational materials science and seeking to harness the capabilities of Materials Studio for their work.
Walter Riviera
AI Center of Excellence — EMEA
Session 1: Hardware
For IT/AI Engineers, hardware specialists and technical decision-makers.
AMX on Xeon:
GPUs:
Habana:
Q&A
Refreshment break
Session 2: Software
For Data Scientists, AI Engineers, and Technical Decision-makers.
AI software optimizations:
LLM and Generative AI:
AI Career development
For AI enthusiasts and students.
Walk-through on how to develop a career in AI.
TBC
TBC
TBC
TBC
TBC
This research and development project, led by Marc Sherratt, Sustainability Architects (MSSA) and the Rory Hensman Conservation and Research Unit (RHCRU), restores the extinct ability of the African Savanna Elephant (Loxodonta africana) to migrate across the Limpopo Province of South Africa. This approximately 1000 km wildlife migration corridor links existing, fenced conservation areas that already house over-populated herds of African Elephant. The route uses an elephant's sophisticated infrasonic communication as a method to "call" the animal along it. The project has been designed to support rural communities by increasing food security and economic resilience while at the same time reversing global warming. South Africa has an overpopulation of the IUCN-classified Endangered African Savanna Elephant (Loxodonta africana) within its large, fenced conservation areas. This leads to unnatural population control, including culling (as seen in Kruger National Park) and contraception (as seen in Addo Elephant National Park), of an endangered species. However, these areas are usually surrounded by smaller reserves that can accommodate temporary elephant movement, if managed correctly. The tested solution presented allows for a return to large-scale seasonal wildlife movement between grazing lands, but now along man-made wildlife corridors that utilize smaller parcels of mainly private land. This system allows for mixed land use, including cattle farming, wildlife breeding, tourism and staple crop farming, permitting only elephants to move while keeping other high-value game and livestock secured. The proof of concept has been implemented in the Limpopo province of South Africa and consists of an Artificial Intelligence (AI) driven automatic gate system and an infrasonic elephant communication tower (sounding tower). In combination, this system allows wild elephants to traverse between electrified wildlife and farming land without direct human interference, using a uniquely developed, ecologically sensitive, infrasonic "language". This project harnesses the collective intelligence of a diversely skilled professional team, from musicians to AI specialists, from zoologists to engineers. The long-term vision for this project is to connect both private and public land with wildlife corridors that could allow elephant movement in a fully connected, provincially scaled, adequately protected migration route.
TBC
The South African Population Research Infrastructure Network (SAPRIN) is a prominent institution within South Africa dedicated to enhancing and disseminating high-quality population-based research. SAPRIN's central goal is to provide insights into the country's demographic, health, and social structures and trends, thereby aiding evidence-based decision-making. To bolster its research capabilities and provide streamlined access to crucial data for researchers, SAPRIN intends to launch a cutting-edge data science platform. This innovative platform transcends the limitations of traditional data repositories, integrating capabilities that empower data users to engage with the shared SAPRIN data interactively, utilising illustrative workflows and tutorials. A prime example of this is the incorporation of Jupyter Notebooks, a popular tool among data scientists and researchers, which facilitates the creation and dissemination of documents featuring live code, equations, visualisations, and explanatory text. The proposed project holds the potential to transform the landscape of population health research in South Africa, driving it towards a more data-driven, efficient, and responsive domain.
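Purely as an illustration of the interactive, notebook-based access described above, a Jupyter Notebook cell on such a platform might resemble the hedged sketch below; the file name and column names are hypothetical placeholders, not SAPRIN's actual data schema.

```python
# Illustrative notebook cell only: the file name and column names are
# hypothetical placeholders, not the actual SAPRIN data schema.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("saprin_example_extract.csv")        # placeholder shared dataset

# Simple descriptive view: distinct households observed per surveillance year.
counts = df.groupby("survey_year")["household_id"].nunique()
print(counts)

counts.plot(kind="bar", title="Households observed per survey year")
plt.xlabel("Survey year")
plt.ylabel("Number of households")
plt.tight_layout()
plt.show()
```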
Biodiversity is under threat from a variety of sources. Traditionally, urbanisation and agricultural expansion have claimed most of the land. More recently, climate change has held the limelight. But poaching, particularly of plants, is an issue that has become dire within the last few years. Technology has changed the way field research is undertaken, providing localities more accurately and quickly than ever before. Citizen Science platforms have been at the forefront of this push to obtain knowledge of our plants, both what they look like and where they occur. This is critical for making observations available for conservation, education and research. However, they have also provided a gateway for poachers, who can readily obtain precise localities of plants. It is critical that a safer method of sharing observations is found. CASABIO has stepped up to this call by working on the next version of its citizen science platform. Hosted at the CSIR as part of the DIRISA ecosystem, the new platform is a radical restructuring of the old system. It introduces several new aspects that make it more efficient and able to handle an increasing diversity of media. We will present these advantages, which aim to provide a new foundation for future Citizen Science.
TBC
TBC
Questions
TBC
I will briefly discuss the history of how scientific focus areas are coordinated through the use of the national development plan, aligned scientific white papers, and the decadal plans that are built from the resulting insight. I will also discuss the current academic landscape of quantum computing in South Africa.
I will discuss the CSIR and how the clusters within the organisation align with the decadal plan. Two units in particular are involved in computing research and are therefore aligned with quantum computing research, while other units within the CSIR offer unique insights and potential to explore the applications of quantum computing. Researchers within the Centre for High Performance Computing (CHPC) and the Nextgen enterprises and institutions units within the CSIR are working on making quantum computing research available and applicable to South Africans. The two units are working with each other, with other suitable clusters within the CSIR, and with quantum-orientated academic institutions and industry.
The quantum computing community in South Africa is too small to work in isolation, and working together in a coordinated manner is essential. If we work together, we will find uses of quantum computing applied to local problems that are unique to South Africa. There are many examples of uniquely South African problems suited to quantum computers that would be ignored by international scientists. One example is applying quantum computing to optimisation problems, such as finding, predicting, and treating strains of HIV that are found only in South Africa.
The universities are providing additional quantum computing research topics, and academic groups such as the quantum computing working group also contribute white papers that guide the next round of policy making. Both the universities and the CSIR provide the DSI with indications on how to apply the focus areas as suggested in the decadal plan.
Aligning quantum computing research with the focus areas within the decadal plan will help fund and coordinate carefully pre-identified critical research that helps South Africa with identified problems and potential threats.
Recent advancements in the interdisciplinary realms of machine learning (ML) and quantum computing (QC) have paved the way for innovative approaches in biophotonics, an established field that utilizes light-based technologies to probe biological substances. Quantum machine learning (QML), an emerging frontier, amalgamates quantum computing's superior processing capabilities with machine learning's predictive power, offering unprecedented opportunities in biophotonics applications ranging from medical diagnostics to cellular microscopy. This talk explores the symbiotic integration of ML, QC, and QML within the context of biophotonics. We begin by providing a foundational overview of machine learning algorithms, emphasizing their application in image and signal processing tasks common in biophotonics, such as feature extraction from complex biological datasets and pattern recognition in biomolecular structures. We then delve into the quantum computing paradigm, elucidating how its intrinsic properties — such as superposition and entanglement — can dramatically accelerate computational tasks pertinent to biophotonics. The crux of our discussion centers on quantum machine learning, where we dissect how QML algorithms harness quantum states to perform data encoding, processing, and learning at a scale and speed beyond the reach of classical computers. We present a critical analysis of the current state of QML, highlighting how its implementation could revolutionize biophotonics by enabling the analysis of voluminous and high-dimensional datasets more efficiently, thereby facilitating real-time monitoring and decision-making in clinical settings. To illustrate the practical implications of QML in biophotonics, we showcase cutting-edge applications, such as the quantum-enhanced detection of biophotonic signals, the optimization of biophotonic setups, and the quantum-assisted imaging systems that provide super-resolved images. The challenges of integrating QML in biophotonics are also discussed, including the current technological limitations of quantum hardware and the need for specialized quantum algorithms tailored to biophotonic data. We conclude by forecasting the future directions of QML in biophotonics, contemplating the potential breakthroughs and transformative impacts on healthcare, biological research, and beyond. Our synthesis not only underscores the transformative potential of QML in biophotonics but also calls for a concerted effort to overcome existing barriers, thus charting a course towards a quantum-enhanced era in biological science and medicine.
Exascale computing is coming, and given the large anticipated power consumption it is prudent to first ensure that both the users and the software are exascale-ready before investing in the hardware. The Excalibur Project is the UK's response to this challenge; it funds a range of hardware and software projects and trains the next generation of Research Software Engineers. One of these funded Excalibur projects is called QEVEC, which seeks to determine whether quantum computers could potentially be employed as accelerators for classical HPC. Part of the QEVEC project has targeted the use of D-Wave annealers (quantum computers) to tackle problems in computational chemistry and materials science that are intractable on classical computers.
In this talk, I will show how the relative energy of defective graphene structures can be calculated using a quantum annealer. This simple system is used to guide the audience through the steps needed to translate a chemical structure (a set of atoms) and energy model into a representation that can be implemented on quantum annealers (a set of qubits). I discuss in detail how different energy contributions can be included in the model and what their effect is on the final result. The code used to run the simulation on D-Wave quantum annealers is made available as a Jupyter Notebook; more details can be found in our recent publication. The first part of this talk is designed to be a quick-start guide for computational chemists interested in running their first quantum annealing simulations. The methodology outlined in this talk represents the foundation for simulating more complex systems, such as solid solutions and disordered systems, which I will go on to discuss, showing the latest results for three different solid solutions to demonstrate the versatility of our developed method. Each system has interesting technological applications: N-doped graphene in catalysis and energy materials, Al$_{\delta}$Ga$_{1-\delta}$N in optoelectronics, and Mo$_{\delta}$W$_{1-\delta}$ used as structural components in nuclear and rocket systems because of their high-temperature strength, high melting point, and good corrosion resistance. Time permitting, I will also present an overview of the Excalibur PAX-HPC project.
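To give a flavour of the atoms-to-qubits translation described in this talk, the hedged sketch below builds a toy binary quadratic model with dimod (the open-source model library from D-Wave's Ocean SDK) and solves it exactly; the site and bond energies, the chain connectivity and the atom-count constraint are invented placeholders rather than the parameters used in the published work.

```python
# Toy illustration only: a binary quadratic model (BQM) in which each binary
# variable marks whether a lattice site is occupied by a carbon atom, with
# invented placeholder energies (not the parameters of the actual study).
import dimod

n_sites = 6
site_energy = -1.0          # hypothetical on-site contribution
bond_energy = -0.5          # hypothetical nearest-neighbour bond contribution

bqm = dimod.BinaryQuadraticModel("BINARY")
for i in range(n_sites):
    bqm.add_variable(i, site_energy)
for i in range(n_sites - 1):            # simple chain of neighbours
    bqm.add_interaction(i, i + 1, bond_energy)

# Constrain the total number of occupied sites to 4 via a quadratic penalty.
bqm.update(dimod.generators.combinations(range(n_sites), 4, strength=5.0))

# ExactSolver enumerates all states; on real hardware one would instead use a
# D-Wave sampler (e.g. DWaveSampler wrapped in EmbeddingComposite).
sampleset = dimod.ExactSolver().sample(bqm)
best = sampleset.first
print("lowest-energy configuration:", best.sample, "energy:", best.energy)
```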
Questions
DIPLOMICS (DIstributed PLatform in OMICS) is a research infrastructure established by the Department of Science and Innovation through its South African Research Infrastructure Roadmap (SARIR). “OMICS”, the study of a collection of biological molecules, e.g. genomics, proteomics and metabolomics, often involves high-throughput, large-scale data generation. Our objectives are to enable and strengthen Omics capacity, improve the quality of research and increase access to Omics technology and expertise. With our network partner labs, we pursue our goals by developing programmes as vehicles for training and method development. For instance, DIPLOMICS is in the pilot phase of one of these high-impact, high-visibility programmes – 1KSA: Decoding South Africa’s Biodiversity – which is using Oxford Nanopore Technology to sequence the genomes of over 1000 South African species important to biodiversity and conservation. CLARITY, a bioinformatics marketplace made possible by DIPLOMICS, will assist with developing workflows to assemble the genome of each of these species. During the pilot phase of 1KSA, we explore the challenges and feasibility of large-scale data transfer, analysis and storage in the country. NICIS has a significant role in making this a success and has provided guidance and support thus far. 1KSA aims to ultimately generate a national resource of genomic data for South African scientists. The lessons learned, solutions created, and skills gained will benefit other large-scale genomic initiatives in South Africa.
The increasing adoption of cloud technologies in research cyber-infrastructures, including commercial clouds (Azure, GCP, AWS), research-oriented clouds (CHPC's Sebowa cloud and the Ilifu cloud facility), and private cloud environments, has enabled greater flexibility and scalability in industry and academia. We present the use of Kubernetes and cloud resources in the ongoing African Pathogen Data Sharing and Archive project, a multi-country effort to develop a mechanism for sharing pathogen genomic data among public health laboratories. Through this use case, we demonstrate how effectively utilizing cloud resources necessitates a paradigm shift in viewing infrastructure as an integral part of the application, rather than a separate entity. Emphasizing the significance of API-driven software-defined infrastructure over traditional methods like manually creating virtual machines and networks, we present lessons learned that challenge the conventional approach to managing research cyber-infrastructures.
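As a small, hedged illustration of treating infrastructure as part of the application, the sketch below drives a Kubernetes cluster through its API from Python using the official kubernetes client instead of provisioning resources by hand; the namespace and label names are placeholders, not the project's actual configuration.

```python
# Hedged illustration of API-driven infrastructure: query and create Kubernetes
# resources programmatically instead of managing them by hand. Namespace and
# label names are placeholders, not the project's actual configuration.
from kubernetes import client, config

config.load_kube_config()                 # or load_incluster_config() inside a pod
core = client.CoreV1Api()

# Inspect what is already running in an existing namespace.
for pod in core.list_namespaced_pod(namespace="default").items:
    print(pod.metadata.name, pod.status.phase)

# Declare a namespace for a pathogen-genomics workload (placeholder name).
ns = client.V1Namespace(
    metadata=client.V1ObjectMeta(name="pathogen-archive-demo",
                                 labels={"owner": "data-platform"})
)
core.create_namespace(ns)
```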
TBC
The ALICE experiment at CERN is one of the four large experiments at the LHC.
ALICE's main focus is Pb-Pb collisions, which have just been completed at the end of October 2023. We shall describe how the computing works: how the Run 3 data coming off the detector is processed, how it eventually gets analysed, and the part that the CHPC plays in this computing. We shall also describe the benefits of the upcoming CHPC membership to the ALICE experiment.
Questions
Cosmology, the study of the evolution of the entire Universe, has progressed very fast in the past several decades with the advances of modern telescopes. I will give a brief overview of the modern cosmology of the last century and highlight its phenomenal successes and distinctive challenges. I will then explain how and why machine learning techniques can help unravel these mysteries and explore the uncharted territories of the early Universe. In particular, I will give several examples of using machine learning to constrain primordial non-Gaussian fluctuations in the cosmic microwave background radiation and to study the epoch of reionization.
The evolution of data center (DC) design over the years has been pivotal in shaping the digital landscape, particularly with the rise of Web 2.0 and now, the advent of Generative AI. Initially, the growth of Web 2.0 was underpinned by technologies that focused on increasing data storage capacities and improving network bandwidth to support burgeoning internet services and cloud computing. As we step into the era of Generative AI, the demands on DCs have dramatically escalated, requiring not just enhanced storage and connectivity, but also unprecedented computational power and energy efficiency. The new DC designs are poised to future-proof themselves by adopting cutting-edge technologies like advanced cooling systems, AI-driven automation for operational efficiency, and modular designs for scalable expansion.
The Deep Learning Indaba was founded in 2017 as a community for growing Artificial Intelligence capability and knowledge sharing across Africa. To extend the impact of the community, "Deep Learning Indaba𝕏" events have been launched in multiple participating nations. A Deep Learning Indaba𝕏, or "Indaba𝕏," represents a locally-organised conference aimed at democratizing machine learning expertise and capability across the African continent. As of 2023, these Indaba𝕏 events have been conducted in 36 African countries.
The South African Deep Learning Indaba𝕏 ("Indaba𝕏 ZA") first took place in 2018 and has since united students, researchers, and industry practitioners in a collaborative atmosphere. Indaba𝕏 ZA serves as a platform for attendees to meet and engage with grassroots communities and small businesses. Indaba𝕏 ZA is a volunteer-driven event and would not be possible without support from partners including the CHPC, NiTheCS and the DSI-NRF Centre of Excellence in Mathematical and Statistical Science (CoE-MaSS).
One of the highlights of the Indaba𝕏 ZA is the hackathon, where concerted efforts have been made over time to encourage more active student participation in the event. Complementing this is a dedicated "fundamentals" track spanning 3 days, with the aim of cultivating foundational skills and lowering the barrier to entry for new members of the community. Furthermore, initial strides have been taken towards incorporating practical, hands-on training into the programme.
This talk will provide a review of the Indaba𝕏 ZA journey thus far, shedding light on its impact, community diversity and emerging trends within the field. Moreover, we will touch on our shared vision with partners like the CHPC and how we can explore avenues for deeper collaboration. For example, there is a great opportunity to provide a platform for the CHPC to introduce participants to compute and training resources. Our mutual goal remains the upliftment of young minds in the realm of AI research, paving the way for a future generation of skilled and empowered individuals across the country and the continent.
The Deep Learning Indaba, founded in 2017, serves as a community dedicated to fostering Artificial Intelligence capabilities and promoting knowledge sharing across Africa. As part of this endeavor, Deep Learning Indaba𝕏 events have been established in several participating countries. A Deep Learning Indaba𝕏 represents a locally-organized "Indaba" or conference aimed at ensuring the widespread dissemination of knowledge and capacity in machine learning across the African continent. As of 2023, Indaba𝕏 events have taken place in 36 African countries.
The Deep Learning Indaba and Indaba𝕏 have provided an invaluable platform for the emergence and growth of various research communities with shared interests. Grassroots communities like Masakhane, Ro'ya, and Sisonke Biotik have cultivated networks of localized expertise in AI applications for language, computer vision, and healthcare respectively. These organisations prioritise the "community first" principle, valuing it even above objectives like research publication. As the African proverb wisely suggests, "If you want to go fast, go alone; if you want to go far, go together." These communities embody the essence of African AI development, and their impact stands to be enhanced through the utilization of High-Performance Computing (HPC).
The global AI revolution has spurred the creation of numerous startups. The Deep Learning Indaba platforms have been instrumental in showcasing African AI startups and creating opportunities for talent acquisition and business networking. While African AI-focused companies like InstaDeep have firmly established themselves in the private sector, the potential for further AI industry development in Africa remains substantial. Lelapa AI serves as another recent example of a South African AI company that has generated significant attention and anticipation. Support from HPC for startups holds the potential to facilitate their transition from the early stages of growth and experimentation to achieving sustainable expansion at a faster pace.
During this talk, we will spotlight success stories, share published research from these communities, and delve into the challenges they have encountered. Finally, we will emphasize how the convergence of grassroots efforts, small businesses, strategic partnerships, and the untapped potential of HPC can collectively ignite the next wave of African innovation – an innovation "for Africans, by Africans."
Large-scale training of deep learning (DL) models in High Performance Computing (HPC) systems has become increasingly common, achieving faster training times for larger models and datasets by alleviating memory constraints. Training DL models in these systems cuts weeks or even months of training to mere hours and facilitates faster prototyping and research in DL. Importantly, training some of the larger models is only possible on these large-scale machines. This talk will provide participants with a foundational understanding of the concepts and techniques involved in deep learning on HPC systems, as well as challenges and opportunities for research in the area.
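To ground the discussion, the hedged sketch below shows the skeleton of data-parallel training with PyTorch's DistributedDataParallel, the pattern most commonly used to scale DL training across HPC nodes. It is a generic example rather than code from the talk, and assumes a launcher such as torchrun (or a scheduler wrapper) sets the usual rendezvous environment variables; the dataset and model are placeholders.

```python
# Generic data-parallel training skeleton (not code from the talk).
# Assumes launch via torchrun or a scheduler wrapper that sets RANK, WORLD_SIZE
# and MASTER_ADDR/MASTER_PORT, with one process per GPU.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Placeholder dataset and model; a real job would load its own here.
    dataset = TensorDataset(torch.randn(4096, 128), torch.randint(0, 10, (4096,)))
    sampler = DistributedSampler(dataset)             # shards data across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler, num_workers=4)

    model = torch.nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])       # gradients synchronised by all-reduce
    optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                      # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimiser.zero_grad()
            loss_fn(model(x), y).backward()
            optimiser.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```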
Questions
TBC
Please join an informative and interesting session facilitated by Mr Dan Olds (Intersect360) where representatives from the conference sponsors will be talking about their technology offerings.
This will be an engaging and entertaining session for delegates to attend.
The representatives of the conference vendors are as follows:
Intel: Mr Ahmed Al-jeshi
HPE: Dr Utz-Uwe Haus
Dell Technologies: Mr Ryan Rautenbach
Nvidia: Mr Claudio Polla
Supermicro: Mr Roger Crighton
VAST: Mr Scott Howard
Spectra Logic: Mr Miguel Castro
Huawei: Mr Mpolokeng Marakalla
Two long-held aspirations, "agile supercomputing" and "performant cloud", are converging. Supercomputing performance can now be achieved in a cloud-native context. Cloud flexibility ensures different classes of workload may be targeted at the most effective resources. Innovation in Artificial Intelligence (AI) further drives the pressure for change.
Open source solutions provide a cost-effective path to achieving the software-defined supercomputer. This presentation will present the latest work on using open infrastructure to implement a high-performance AI-centric cloud.
The presentation will introduce the core components of a software-defined supercomputer, and provide technical insights into their operation. Real-world experience will be used to describe the challenges and how they are overcome.
TBC
Masters and Doctoral students must be at their posters to answer questions during this break.
While plant genome analysis is gaining speed worldwide, few plant genomes have been sequenced and analyzed on the African continent. Yet, this information holds the potential to transform diverse industries: it unlocks medicinally and industrially relevant biosynthesis pathways for bioprospecting and can boost innovation of plant breeding and plant protection strategies. Considering that South Africa is home to the highly diverse Cape Floristic Region, local establishment of methods for plant genome analysis is essential. The Medicinal Plant Genomics Program was initiated at UWC in 2016 with the sequencing of the diverse transcriptomes and the genome of rooibos (Aspalathus linearis); an endemic South African medicinal plant species commonly known as a beverage – rooibos tea. Here, I provide insight into the computational requirements essential for the analysis of this relatively large eukaryotic genome (1.2 Gbp). Biocomputational data analysis, spanning base calling and quality filtering of the raw data (≈2.5 Tb), genome and transcriptome assembly, and subsequent structural and functional genome annotation, was completed locally at CHPC in Cape Town.
The depletion of fossil fuels and rapid growth in the world population are the main drivers of research interest in finding alternative renewable energy sources that could alleviate the global energy crisis. Hence, perovskite solar cells have been widely explored as a prospective source of clean and renewable energy. They have shown remarkable progress, with rapid increases in power conversion efficiency from early reports of approximately 3% in 2009 to over 25% today. Despite their excellent optoelectronic characteristics, such as a tuneable band gap, high absorption coefficients, high carrier mobility, long diffusion lengths for electrons and holes, small effective masses and facile fabrication, they still have a number of drawbacks that hinder their practical application and commercialisation. Perovskite solar cell devices must retain high efficiencies while exhibiting decent stability and acceptable degradation for practical applications. Herein, using a first-principles approach, we explore different engineering strategies for various perovskite materials, namely all-inorganic halide perovskites, organic-inorganic perovskites and double perovskite crystal structures, and their respective optoelectronic characteristics. In addition, a data-driven machine learning approach is used to conduct a compositional space exploration to discover new perovskite materials.
The South African Weather Service (SAWS) has employed the computing resources of the Centre for High Performance Computing (CHPC) for several research projects. The current project is primarily focused on model development, utilizing the Conformal Cubic Atmospheric Model (CCAM) provided by the Commonwealth Scientific and Industrial Research Organisation (CSIRO). The model was successfully installed on the CHPC cluster, and a series of experimental simulations were conducted at various grid resolutions. The primary objective was to gain insights into the model's scale-awareness and identify areas where improvements could enhance its ability to simulate high-impact weather events. The chosen grid resolutions included 25 km, 10 km, 6 km, 3 km, and 1 km. For each case study, the model simulations were run around the area of observed high-impact weather. As a result, the high-resolution simulations spanned a relatively small geographical area. This paper details the procedures employed to execute these diverse CCAM simulations, the computational resources utilized, and the model's performance.
This research presents DFT and TD-DFT calculations for eight 1,2,4-triazole compounds (A1-A8) that were theoretically evaluated as organic dyes in dye-sensitized solar cells (DSSCs). The parameters used in this evaluation included oscillator strengths, electron diffusion constants, electron injection efficiencies, electron collection efficiencies, highest occupied molecular orbitals (HOMO), and lowest unoccupied molecular orbitals (LUMO), amongst others. These parameters play a significant role in determining the efficiency of the dye, as rapidly diffusing electrons will be more readily available for injection into the conduction band of the semiconductor, where they can participate in the current flow and be regenerated back into the dye via an electrolyte, provided that the HOMO of this electrolyte lies at a higher energy level than the HOMO of the dye. Moreover, the LUMO of the dye should also lie at a higher energy level than the LUMO of the conduction band of the semiconductor. More rapid diffusion can be facilitated by conjugated systems that consist of donor, linker (π-spacer) and acceptor fragments, with electrons localized on the donor and delocalized towards the acceptor via the linker. In this study, starburst and alkoxy phenyl groups acted as the donors, the 1,2,4-triazole groups acted as the linker, and cyanoacrylic acid acted as the acceptor group. Since the acceptor group must adhere to the semiconductor for efficient electron injection, it is important that as many electrons as possible reach this group. From this study, it was found that A2 was the most efficient organic dye.
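For readers unfamiliar with these descriptors, the expressions commonly used in DFT/TD-DFT screening of sensitizers are summarised below as a general reminder (they are not quoted from this work): the light-harvesting efficiency in terms of the oscillator strength $f$, and the electron-injection driving force in terms of the excited-state oxidation potential of the dye and the conduction-band edge $E_{\mathrm{CB}}$ of the semiconductor, with $\lambda_{\mathrm{max}}^{\mathrm{ICT}}$ the vertical excitation energy of the charge-transfer band.

$\mathrm{LHE} = 1 - 10^{-f}, \qquad \Delta G_{\mathrm{inject}} = E_{\mathrm{ox}}^{\mathrm{dye}*} - E_{\mathrm{CB}}, \qquad E_{\mathrm{ox}}^{\mathrm{dye}*} \approx E_{\mathrm{ox}}^{\mathrm{dye}} - \lambda_{\mathrm{max}}^{\mathrm{ICT}}$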
Questions
Welcome and introduction to the ISSA/SANReN track.
The Terrestrial Model (TerM) is the land surface scheme developed jointly at the Institute of Numerical Mathematics RAS and Moscow State University. It was originally part of the INM-CM Earth system model and the SL-AV weather forecasting system, and is responsible for providing fluxes of radiation, heat, moisture and greenhouse gases from the land surface to the atmosphere. TerM uses multilayer soil, snow and lake models, vegetation controls on evaporation and energy exchange, and terrestrial carbon and methane cycles. TerM is now also implemented in a standalone mode, enabling more flexibility in land surface research. The standalone TerM includes an advanced river routing scheme and can be used in single-column, regional and global domains on an arbitrary regular longitude-latitude mesh, forced by meteorological observations, reanalysis, or climate model data. It is supplemented with a preprocessing system supplying external data on land cover types, soil, lakes, rivers, etc. To increase the model performance, an automatic calibration system has been developed. The model is implemented for multicore systems using MPI+OpenMP technologies. We present examples of the application of TerM to hydrological and carbon cycle studies.
This study presents the development of a three-dimensional unstructured adaptive finite-element model (Fluidity-Atmosphere) for atmospheric research. To improve computational efficiency, an LSTM-based three-dimensional unstructured mesh generator is proposed to predict the evolution of the adaptive mesh. To evaluate the performance of adaptive meshes and physical parameterisations in Fluidity-Atmosphere, a series of idealized test cases has been set up, and the unstructured tetrahedral meshes are adapted automatically to the specified fields in time and space.
Extreme rainfall events in Brazil are increasing significantly in frequency and strength. Recent events in the southeast of the country led to casualties, property damage, and a huge impact on cities and urban life. Governments are installing alarms in endangered parts of towns, preparing evacuation instructions and even relocating people, trying to avoid loss of life in the next large event.
In order to support these initiatives, we are developing different projects aimed at constructing predictive models to forecast the occurrence of strong rainfall. The Rionowcast project is being carried out in a collaboration between academic institutions in the Rio de Janeiro state and the Operations Center of Rio de Janeiro (COR). The idea is to build AI spatio-temporal models using a variety of data sources providing historical and real-time information about the weather conditions in Rio de Janeiro. Data sources include rain gauges, weather stations, radiosondes, ocean buoys, satellite products, radar products and numerical models. We are experimenting with different DL model architectures, from transformers to GNNs, from global to local models and ensembles, and physics-informed networks.
In order to foster the collaboration among the different research groups, we are using the Gypscie framework that supports data and model management and dataflow execution.
During the Digital Earth session of the CHPC National Conference, we intend to briefly present these initiatives.
Pune is the second largest city in the Indian state of Maharashtra, situated over a complex topographical region on the leeward side of the Western Ghats, India. Recently, Pune City has been experiencing frequent heavy to extreme rainfall events, causing urban floods, threatening lives, and inflicting heavy socio-economic damage. The recent decade has witnessed the adverse effects of urban floods on daily life: destroyed infrastructure, water-logging that triggers flooding, disrupted transportation, and the loss of lives and property. An efficient early warning system is a crucial requirement that remains challenging using a high-resolution numerical weather prediction (NWP) model. The complexity increases manifold, particularly if the forecast has to be made on an urban scale to mitigate its adverse impacts.
An attempt is made to develop a coupled modelling system that integrates the Weather Research and Forecasting (WRF) model with a hydrological model to enhance urban flood forecasting capabilities for an Indian city. Extensive work has been done to set up the WRF model through sensitivity analysis of domain setup, parameterisation schemes, land-use information, and initial conditions for rainfall event forecasting over Pune. Model performance has been validated against various observations available through ground-based and satellite measurements. The rainfall forecast obtained from the WRF model at a very high resolution of 0.5 km has been provided to the hydrology model to simulate surface runoff, stormwater discharge, and depth in urban regions. The developed coupled system was calibrated against past rainfall flood events over Pune. This calibration ensured that the model represented the actual behaviour of the system and the rainfall distribution in the sub-catchments. The coupled system was used to simulate the recent floods of 2022 and showed good agreement with observations. Such coupling of hydro-met systems can be a helpful tool to enhance urban flood forecasting. For this work, WRF model simulations were performed on HPC (PARAM series) using around 1900 processors.
We discuss the development of a unified framework for the numerical simulation of atmospheric boundary layer turbulence. The model, developed at Lomonosov Moscow State University, combines DNS (Direct Numerical Simulation), LES (Large-Eddy Simulation) and RANS (Reynolds-Averaged Navier-Stokes) approaches to turbulence modelling and allows high-resolution simulations on HPC systems using MPI, OpenMP and CUDA. The code is structured in such a way as to separate the solution of high-level "numerical" and "physical" problems from the code related to parallelization or low-level algorithm optimization that is highly dependent on the computational architecture. The principal advantage of such separation is the ability to tune the code for different architectures without modifying the high-level and problem-specific parts of the code. The efficiency of the model implementation and the challenges of using the heterogeneous architectures of modern HPC systems are discussed. A particular emphasis is placed on code optimizations relevant to problems of aerosol and chemistry transport in urban environments. We show how DNS and LES simulations may be used to improve the boundary-layer process parameterizations currently used in Earth system models.
Algoa Bay is situated at the edge of the Agulhas Current, where it transitions from being relatively stable, to unstable as the continental shelf broadens in the downstream direction. As one of South Africa’s largest bays it provides a degree of shelter from the southern hemisphere’s most powerful western boundary current and is being utilized for offshore ship refueling operations. The environmental risks involved, the highly dynamic offshore boundary and the good network of measurements in the bay have led to it being identified as a pilot site for the development of an operational forecast system that would support stakeholders and decision makers in the case of coastal hazards. To this end, a step by step approach was followed in order to produce a downscaled forecast system optimized for this region and that can be readily configured for other key locations around the coastline. The first step was to evaluate and intercompare various global models as potential boundary conditions. The next step was to develop high-resolution, limited duration hindcast CROCO/ROMS simulations, using different ocean boundary forcings and resolution atmospheric products. Comparisons with temperature recorders and ADCPs at various locations within the bay reveal the differences in the skill of the different models and that their ensemble mean performs best. The tools for the modelling approach have been ‘dockerized’ for the ease of implementation and interoperability of the system. Using this dockerized workflow, a second bay-scale operational forecast system has been implemented for the South West Cape Coast region, which is home to a lucrative aquaculture industry that are periodically impacted by severe harmful algal blooms (HABs). These limited area forecast systems are being incorporated into a tool to initialize operational OpenDrift particle tracking simulations with various site-specific applications (e.g. oil spills, search and rescue and HAB advection). The operational system will be integrated into the National Oceans and Coastal Information Management Systems (OCIMS) in support of various decision support tools which promote good governance of the coastal environment.
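As an illustration of how such a forecast can feed the OpenDrift particle-tracking applications mentioned above, the hedged sketch below initialises a drift simulation from a (placeholder) forecast file; the NetCDF path, release location and run length are invented for the example and are not taken from the operational system.

```python
# Hedged sketch: seeding an OpenDrift particle-tracking run from a forecast file.
# The NetCDF path, release location and durations are placeholders only.
from datetime import timedelta

from opendrift.models.oceandrift import OceanDrift
from opendrift.readers import reader_netCDF_CF_generic

reader = reader_netCDF_CF_generic.Reader("algoa_bay_forecast.nc")  # placeholder file

sim = OceanDrift(loglevel=20)
sim.add_reader(reader)

# Release 1000 particles near a hypothetical spill site in Algoa Bay.
sim.seed_elements(lon=25.7, lat=-33.85, number=1000, radius=500,
                  time=reader.start_time)

sim.run(duration=timedelta(hours=48), time_step=900)  # 15-minute internal time step
sim.plot(filename="drift_trajectories.png")
```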
TBC
Effective use of high performance computing resources in computational chemistry and materials science
Krishna K Govender1,2*
1Computational Chemistry and Molecular Modelling Group, Department of Chemical Sciences, University of Johannesburg, PO Box 17011, Doornfontein, 2028, Johannesburg, South Africa
2National Institute for Theoretical and Computational Sciences (NITheCS), South Africa
*Corresponding author email: krishnag@uj.ac.za
ABSTRACT
When we say “Eskom”, the first thing that comes to mind is load-shedding. It was suspended for a while, but before we knew it the constant power cuts were back, and this resulted in the loss of access to various nodes on the Lengau cluster. In addition, several nodes on the cluster are down due to their age. These factors, together with the misuse of the resources by inexperienced users, have resulted in individuals having to queue for extended periods of time.
In this work focus will be placed on research conducted within the Computational Chemistry and Molecular Modelling (CCMM) Group at the University of Johannesburg and how to determine if simulations run with different software packages are making appropriate use of the resources being requested.
An upgrade of the current Centre for High Performance Computing (CHPC) cluster is imminent, but with more resources comes more misuse, and it is hoped that this work will shed light on the issue for HPC users and ensure that everyone makes fair and responsible use of the resources being provided.
Keywords: CCMM, HPC
Our modern industrialised societies are critically dependent on a variety of metals from iron and steel through to technology materials like silicon, copper, and aluminium. The vast majority of these commodities are sourced from metallurgical smelting furnaces of various designs, in which primary or secondary raw materials are converted into the molten state at very high temperatures in order to perform the physical and chemical separation processes necessary for the product of value.
Most smelting furnaces operate in a semi-batch mode, in which raw materials are fed to the furnace continuously but the process products and wastes are removed only at discrete intervals. This removal of the molten materials from the furnace is done using a procedure called tapping – a channel (the tap-hole) is opened in a specialised part of the furnace wall, and the liquid contents are allowed to drain out under the action of gravity and any additional pressure in the vessel. Once sufficient material is drained, the tap-hole is resealed and the process continues. During tapping, human operators and equipment are exposed to molten alloy and slag materials at temperatures in excess of 1500°C. This harsh environment makes any variability or unpredictability in the tapping process potentially hazardous, and at the same time greatly limits the applicability of standard measurement and control instruments. There is therefore considerable value in using computational, numerical, and data-driven modelling tools to provide in silico insight with regard to the design and operation of furnace tapping systems.
This presentation will document Mintek’s work over the past few years in developing a diverse software ecosystem for the study of furnace tapping problems, ranging from high-fidelity computational fluid mechanics models through to reduced-order modelling and data-driven machine learning approaches. The common thread of high performance computing as an enabling technology weaves through this story, and is seen to add value in a number of expected and unexpected ways.
Research Software Engineers (RSEs) support researchers in generating efficient, correct and reproducible research, and in promoting the development of sustainable (and re-usable) software for research. This talk will introduce the concept of Research Software Engineering (RSE) Groups as an emergent outcome of the decade-long history of the RSE movement that originated in the UK, highlighting how such teams of RSEs are able to support researchers in their host institutions, including by promoting the use of in-house HPC facilities. It will also discuss the findings of the 'RSE Roadtrip', an ambitious study of RSE Groups at multiple UK universities. The aim of the 'RSE Roadtrip' is to understand how RSE Group diversity (in terms of organisational context and other structural and functional features) affects member RSEs and group effectiveness, and to suggest best practices for the formation and sustainability of RSE Groups (including inspiration for new approaches within the South African context).
Questions
Deep learning training (DLT) applications exhibit unique I/O workload behaviors that pose new challenges for storage system design. DLT is I/O intensive, since data samples need to be fetched continuously from remote storage. Accelerators such as GPUs have been extensively used to support these applications. As accelerators become more powerful and more data-hungry, I/O performance lags behind. This creates a crucial performance bottleneck, especially in distributed DLT. At the same time, exponentially growing dataset sizes make it impossible to store these datasets entirely in memory. While today's DLT frameworks typically use a random sampling policy that treats all samples equally, recent findings indicate that not all samples are equally important: different data samples contribute differently towards improving the accuracy of a model. This observation creates an opportunity for DLT I/O optimizations that exploit the data locality enabled by importance sampling.
In this talk, I'll present the design of SHADE, a new DLT-aware caching system that detects fine-grained importance variations at the per-sample level and leverages this variance to make informed caching decisions for a distributed DLT job. SHADE adopts a novel, rank-based approach that captures the relative importance of data samples across different mini-batches. SHADE then dynamically updates the importance scores of all samples during training. With these techniques, SHADE manages to significantly improve the cache hit ratio of the DLT job, and thus improves the job's training performance.
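SHADE itself is a caching layer, but the importance-sampling idea it exploits can be illustrated generically: in the hedged sketch below, a PyTorch sampler draws samples in proportion to per-sample importance scores (here simply the last observed loss), producing the skewed access pattern that a locality-aware cache can exploit. This is an illustrative stand-in, not SHADE's implementation.

```python
# Generic illustration of importance-biased sample selection (not SHADE itself).
# Samples with higher recorded loss are drawn more often; a locality-aware cache
# can exploit the resulting skew in access frequencies.
import torch
from torch.utils.data import DataLoader, Dataset, WeightedRandomSampler

class IndexedDataset(Dataset):
    """Toy dataset that also returns each sample's index so its score can be updated."""
    def __init__(self, n=1000, d=32):
        self.x = torch.randn(n, d)
        self.y = torch.randint(0, 2, (n,))
    def __len__(self):
        return len(self.y)
    def __getitem__(self, i):
        return i, self.x[i], self.y[i]

dataset = IndexedDataset()
importance = torch.ones(len(dataset))              # e.g. last observed per-sample loss

model = torch.nn.Linear(32, 2)
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss(reduction="none")

for epoch in range(3):
    sampler = WeightedRandomSampler(weights=importance,
                                    num_samples=len(dataset), replacement=True)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)
    for idx, x, y in loader:
        per_sample_loss = loss_fn(model(x), y)
        optimiser.zero_grad()
        per_sample_loss.mean().backward()
        optimiser.step()
        importance[idx] = per_sample_loss.detach()  # refresh scores of drawn samples
```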
Performance tools are inseparable from the performance analysis and engineering life cycles of complex HPC applications. Because of this complexity, various performance analysis tools have been created to serve different analysis purposes and to provide a deeper look at certain aspects of an application. Although these tools might operate differently, having coherent information and consistent metrics across all tools is mandatory for ensuring analysis continuity. It is common for performance analysts to switch their usual performance tools for various reasons and due to various limitations. In this work, we look specifically at the I/O performance analysis tools landscape and introduce Mango-IO to verify the consistency of results between tools and to provide tool-agnostic metric calculation methods. Our analysis and case study provide lessons learned and guidelines for ensuring measurement continuity and comparability.
Deep learning has been shown to be a successful method for various tasks, and its popularity has resulted in numerous open-source deep learning software tools. Deep learning has been applied to a broad spectrum of scientific domains such as cosmology, particle physics, computer vision, fusion, and astrophysics. Scientists have performed a great deal of work to optimize the computational performance of deep learning frameworks. However, the same cannot be said for I/O performance. As deep learning algorithms rely on big-data volume and variety to train neural networks accurately, I/O is a significant bottleneck in large-scale distributed deep learning training.
In this talk, I aim to provide a detailed investigation of the I/O behavior of various scientific deep learning workloads running on the Theta cluster at the Argonne Leadership Computing Facility. I present DLIO, a novel representative benchmark suite built from the I/O profiling of the selected workloads. DLIO can be utilized to accurately emulate the I/O behavior of modern scientific deep learning applications. Using DLIO, application developers and system software solution architects can identify potential I/O bottlenecks in their applications and guide optimizations to boost I/O performance, leading to training times lower by up to 6.7x.
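As a hedged, generic illustration of the kind of measurement that underpins such profiling (and not the DLIO benchmark itself), the sketch below times a PyTorch input pipeline to report the sample ingest rate; the data directory is a placeholder.

```python
# Generic sketch for measuring the ingest rate of a training input pipeline
# (samples/s as seen by the accelerator); this is not the DLIO benchmark itself.
# The data directory is a placeholder.
import pathlib
import time

import torch
from torch.utils.data import DataLoader, Dataset

class FileDataset(Dataset):
    """Reads each file's bytes, standing in for decoding a real training sample."""
    def __init__(self, root):
        self.files = sorted(pathlib.Path(root).glob("*"))
    def __len__(self):
        return len(self.files)
    def __getitem__(self, i):
        data = self.files[i].read_bytes()
        return torch.tensor(len(data))          # placeholder "sample"

loader = DataLoader(FileDataset("/path/to/training/shards"),   # placeholder path
                    batch_size=32, num_workers=8)

start, n = time.perf_counter(), 0
for batch in loader:
    n += batch.numel()
elapsed = time.perf_counter() - start
print(f"{n} samples in {elapsed:.1f}s -> {n / elapsed:.1f} samples/s")
```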
Provide attendees with a detailed view of the Long-Term Digital Archive solution implemented at CSIR.
This solution enables CSIR to provide a high performance, highly scalable, integrated archive ecosystem that allows internal departments and external organisations to collaborate, archive and preserve data driven research in support of National Science and Strategic priorities.
The importance of arriving at the right data management strategy becomes paramount as the size of HPC datasets continues its inexorable march towards zettabytes.
The Agrometeorology division of the Agricultural Research Council represents a research group focused on weather and climate in relation to agricultural activities. One aspect of the group’s activities is to investigate how climate change could impact agricultural activities, focusing on smallholder to large-scale commercial farming activities. To understand the future effects of climate change for consideration in agricultural decision-making, Global Climate Models (GCM) can provide us with climate change projections, which represent potential future climate scenarios. However, GCMs have very coarse spatial resolutions (around 100 x 100 km or more), which are not appropriate to apply for decision-making in agriculture. Therefore, to guide agricultural decision-making across South Africa, the Agrometeorology division has begun dynamically downscaling GCM outputs from selected models contributing to the sixth phase of the Coupled Model Intercomparison Project (CMIP6). For this task, we are utilising the Weather Research and Forecasting (WRF) model to downscale GCM outputs to an 8 x 8 km spatial resolution for a range of future scenarios (i.e., SSP1-2.6, SSP2-4.5, and SSP5-8.5). To undertake this massive computing task, the division relies very heavily on the computing resources offered by the Centre for High Performance Computing (CHPC) Lengau Cluster; without these resources, this dynamic downscaling task would not be possible. Thus, in presenting this work, I will highlight just how significant and valuable the CHPC resources are for our work, and I will share on our progress (with some results), challenges, and successes to date.
We present an overview of some of our recent DFT studies of bulk solid-state systems (iron, metal oxides and alloys) as well as two-dimensional silicene. In particular, we show applications of X-ray absorption near-edge spectroscopy (XANES) to elucidate the physical and chemical properties of these materials. We show the possibility of inducing novel magnetic properties in silicene through the inclusion of small transition-metal (vanadium) clusters. Furthermore, we describe, albeit briefly, our recent collaborative work on rare-earth oxide nanostructures. Finally, the central role of the high-performance Linux clusters at the CHPC (South Africa) in our computational studies and in facilitating research collaborations within Africa and beyond is discussed.
Description: This is a review article on using computational modelling to accelerate the drug development process for viral infections, based on African indigenous medicinal plant species.
Background: Natural products or related drugs such as botanicals or herbal medicines make up approximately 35% of the annual global market, followed by 25% from plants, 13% from microorganisms and 3% from animal sources. Indigenous medicinal plant species used in traditional medicines have been employed for centuries to treat viral infections. The constant growth of the human population and human interaction with the environment have led to several emerging and re-emerging RNA viruses responsible for diseases and pandemics. Considering the continuous spread of major viral pathogens as well as unpredictable viral outbreaks of emerging or re-emerging viral strains, it is essential to ensure preparedness interventions to treat and manage yet another global health crisis.
Aim: The review article explores the potential application of computational modelling in identifying antiviral drugs informed by indigenous knowledge systems for future pandemic preparedness by the pharmaceutical industry.
Methodology: South Africa's National Recordal System, which was developed under the IKS Policy (2007), was used to identify indigenous medicinal plant species used to treat respiratory diseases. The plant species Bulbine frutescens, Cyclopia genistoides, Harpagophytum procumbens, Kigelia africana, Siphonochilus aethiopicus, Sutherlandia frutescens, Trichilia emetica, Warburgia salutaris, Xysmalobium undulatum and Lippia javanica were identified. A systematic review of these plant species was conducted using the published literature.
Results: From the literature, most of these plants contain a wide range of chemical compounds with potential health benefits, as shown by in vitro and in vivo studies on inhibition of the Human Immunodeficiency Virus (HIV). The use of computational modelling in small-molecule drug discovery can markedly accelerate the drug-development process, thereby impacting the pharmaceutical industry, while ensuring that benefit-sharing arrangements are reached with the communities in terms of the Nagoya Protocol on Access and Benefit Sharing.
The study evaluates the performance of the Conformal-Cubic Atmospheric Model (CCAM) when simulating the urban heat island (UHI) over the City of eThekwini, located along the southeast coast of South Africa. CCAM is applied in stretched-grid mode, at a grid length of 1 km on the panel covering eThekwini. CCAM is coupled to an urban climate model (UCM), the Australian Town Energy Budget (ATEB). ATEB incorporates measured urban parameters, including building characteristics, emissions, and albedo, as well as landcover boundary conditions obtained from the Moderate Resolution Imaging Spectro-radiometer (MODIS) satellite. The CCAM configuration applied realistically captured the orientation of the city and the landcover types. Simulations of meteorological variables such as temperature and longwave radiation reproduced the spatial distribution and intensity of the UHI. Results show that the UHI is stronger during summer and weaker in all other seasons. The UHI developed because of natural factors (i.e., the distribution of longwave radiation) and human factors (i.e., urban expansion, an increase in anthropogenic emissions, and additional heating). Due to the city's location along the coast, the simulated UHI could be weakened by atmospheric circulations resulting from land and sea breezes. Mitigation methods such as applying reflective paints and re-vegetating the city may increase albedo and latent heat fluxes but reduce sensible heat fluxes and weaken the UHI. However, the UHI may not be completely eliminated, since natural factors constantly influence its development.
To adopt a universal digital leadership paradigm for Africa would mean ignoring previous research that indicates the link between the African cultural context and its influence on shaping decisions and behavioural patterns within organisations. Similarly, to ignore the emerging digital leadership paradigm entirely would deprive Africa of a wide variety of leadership and management theories and practices that have already been developed and proved effective. In order to bridge the differences between these two positions, a novel conceptual framework will be developed in this proposed study. This framework will synthesise the best scientific facets of both the emerging digital leadership paradigm and the African leadership context with an African-based value system (i.e., the framework will use crossvergence). In this way, the proposed study aims to explore a new digital leadership paradigm that is specifically focused on addressing the unique African digital scenario.
Abstract— Culture influences how agile frameworks are implemented, and agility is said to be suitable in contexts where flexibility and spontaneity are emphasized. While past studies have investigated the influence of national culture on Agile implementations in Western and Eastern contexts, studies focusing on the South African software development context are limited. Furthermore, few studies have focused on the effect of cultural differences within software engineering in general. The purpose of this study is to describe how national culture influences Agile roles within the South African software development context. The study was interpretive and was executed using a qualitative, semi-structured interview research strategy directed at Agile practitioners in South African software development teams. The thematic analysis technique was used to analyze the data. Ten propositions have been formulated to highlight how national culture dimensions influence Agile roles.
Index Terms— Agile Software Development, National Culture, Agile Roles, South Africa.
TBC
TBC
All poster presenters must be at their posters to answer questions during this session. Drinks, nibbles and refreshments will be served.
With the emergence of ExaFLOPS Top500 systems such as the ORNL Frontier HPC cluster in June 2022, we plan to apply our innovative fine-grained, topology-aware software-hardware ATMapper to move benchmark performance closer to an ExaFLOPS system's peak. Owing to application challenges in data movement, limited degrees of parallelism, sparse matrices and/or irregular workflows, sustained benchmark performance such as HPCG reaches only ~1% of system peak performance (14 PF / 1685 PF), compared with the world's best HPCG result of ~3% of peak (16 PF / 537 PF) achieved by Riken's Fugaku cluster in November 2022. Comparing two software-hardware graph-mapping approaches for workflow partitioning/assignment/scheduling in our previous 2021 DoE VFP project, we tested Dr. Butko's load-balanced LBNL TIGER mapper using D-Wave's Quantum/Simulated Annealer, and Dr. Shih's self-organizing load-imbalance ATMapper using AI A* search. We are optimistic about designing a better future Q/AI TIGER/ATMapper hybrid to help almost any complex, irregular HPC application find the best topology-aware processor assignment (or application-custom network topology synthesis) given its computation workflow dependence constraints. Dr. Shih's ATMapper is a self-organizing, load-imbalanced static workload assignment/scheduler, capable of an average of 0.5 data hops on 90% of data movement (0 hops: reusing the same processor node where possible; 1 hop: transferring data, if necessary, to an immediate neighbour node), compared to the typical 3 hops of data movement among switches on the ORNL Frontier Dragonfly topology enhanced by the dynamic HPE Cray Slingshot Interconnect. We hope that our static, algorithm-specific, topology-aware ATMapper workload scheduling will complement the Slingshot Interconnect's dynamic run-time load-balanced traffic routing optimization to move HPCG benchmark performance (currently <3%) closer to full system peak performance. With QAs' negligible cost/power/space requirements, QA^HPC software-hardware co-design optimization is a green game-changer for computational cost efficiency and sustainability, for both HPC application users and data centre providers.
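The following toy sketch (not ATMapper or TIGER code) illustrates the quantity a topology-aware mapper tries to minimise: the number of network hops incurred by each data transfer for a given task-to-node placement. The small mesh, task graph and mapping are entirely made up for illustration.

```python
# Toy illustration: score a task-to-node mapping by average hop count on a 2-D mesh.
import networkx as nx

mesh = nx.grid_2d_graph(4, 4)                      # 16-node mesh "machine"
placement = {"A": (0, 0), "B": (0, 1), "C": (3, 3)}  # task -> node placement
transfers = [("A", "B"), ("B", "C"), ("A", "C")]     # task-graph data dependencies

hops = [nx.shortest_path_length(mesh, placement[s], placement[d]) for s, d in transfers]
print("hops per transfer:", hops)
print("average hops:", sum(hops) / len(hops))
# A lower average means more nearest-neighbour traffic and data reuse,
# which is what a topology-aware mapper seeks.
```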
The vast amounts of data generated by scientific research hold immense potential for advancing knowledge and discovery. However, the complexity and sheer volume of this data often pose significant challenges in terms of accessibility, analysis, and interpretation. Scientific data democratization aims to address these challenges by enabling researchers to easily access, analyze, and share scientific data, regardless of their technical expertise or the location of the data. My group, along with our close collaborators, has worked with hundreds of scientific communities over the past 20 years, in disciplines ranging from astronomy, fusion, combustion, seismology, weather, climate, accelerator science and materials science to clinical pathology. In these partnerships we have created sustainable software components which help address the following needs: 1) creating a self-describing I/O framework which allows data to be read/written at terabytes per second; 2) the ability to query PBs of data efficiently, even for derived quantities which are NOT contained in the data; 3) the ability to subscribe to data (in memory) without modifying the codes, so that I/O is abstracted from data-at-rest to data-in-motion; 4) new mathematical formulations which allow data to be reduced both in size and in degrees of freedom to allow for faster access; and 5) the ability to work with federated data as if it were local. In this presentation, we will explore the challenges and opportunities associated with scientific data democratization, along with some of the work we have done in these fields.
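The abstract does not name a specific framework, so the sketch below uses h5py/HDF5 purely to illustrate the "self-describing" idea from point 1): the metadata needed to interpret the data travels inside the file itself. File and attribute names are made up.

```python
# Illustration of self-describing output: data plus its metadata in one file.
import h5py
import numpy as np

temperature = np.random.default_rng(0).normal(300.0, 5.0, size=(64, 64))

with h5py.File("simulation_step_0001.h5", "w") as f:   # hypothetical file name
    dset = f.create_dataset("temperature", data=temperature)
    dset.attrs["units"] = "K"                 # metadata attached to the dataset
    dset.attrs["mesh"] = "uniform 64x64"
    f.attrs["code"] = "demo-solver"           # metadata attached to the file
    f.attrs["step"] = 1
```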
Masters and Doctoral students must be at their posters to answer questions during this break.
The proliferation of digital content has highlighted the disparity in language translation resources, especially for low-resource languages. This research addresses the critical gap in translation and detection technologies for such languages, which is vital for preserving linguistic diversity and ensuring equitable access to information. Our objective is to enhance the accuracy and efficiency of language translation and detection using advanced machine learning models.
We have implemented and compared two different architectures: one serving as a benchmark and the other based on transformer models, which offer parallel processing capabilities. These models promise improvements in both translation quality and computational efficiency. Preliminary results indicate that transformer models show significant promise in handling the nuanced structures of low-resource languages.
The implications of this research are profound, offering the potential to democratise information across linguistic barriers and to protect the cultural heritage embedded in language. This study is a step toward bridging the digital divide and fostering inclusivity in the global information ecosystem.
Our presentation at the CHPC National Conference will delve into the methodologies employed, provide a comparative analysis of the models, and discuss the ongoing evaluation of our results. We aim to contribute to the development of more robust and accessible language technologies, particularly for languages that are at risk of digital extinction, including many of South Africa's official languages.
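As a minimal sketch of the transformer-based approach described above, the snippet below loads a translation pipeline with the Hugging Face `transformers` library. The model identifier is illustrative only and should be replaced with whichever low-resource checkpoint the project actually uses.

```python
# Hedged sketch: translation with a pretrained transformer via transformers.pipeline.
from transformers import pipeline

# The model id below is an assumption/example, not necessarily the study's model.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-xh")

result = translator("Access to information should not depend on language.")
print(result[0]["translation_text"])
```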
Higher Education Institutions (HEIs) capture and store a lot of data. This data relates to employees and students and resides in different systems. Access to the data is restricted mainly to the departments and units to which employees belong. The challenge is that each time employees would like to perform further data analysis, a specialist or consultant is needed. At the same time, HEIs spend a lot of money on software licences and modern Business Intelligence systems. One would expect such software and systems to be fully utilized by employees, yet HEIs still spend more on consultants. Most consultants are brought in to perform different data analyses and to interpret data. HEIs also spend on training employees in data skills, yet many still rely on consultants and data experts to understand their data. It is therefore clear that HEI employees, both administrative and academic, lack data democratization.
The study aims to answer the following question:
How can Higher Education employees be engaged to develop a Data Democratisation Framework in South Africa?
The study is based on a systematic review using secondary data available on HEIs in South Africa. Data were gathered on the systems used by HEIs and the data management issues faced by employees. Results are presented on data illiteracy among employees and on data democratization issues. Based on the findings, and through engagement with selected HEI employees, key components of a data democratisation framework are presented. The study shows that participants, or data custodians, want to be empowered and to have more access to the data that they deal with. Employees require skills and training that enable them to self-manage data and perform data analysis.
Key Words: Data democratisation, data literacy, data access, Higher Education Institutions, data custodians
South Africa faces various environmental threats, among them the altering nature of extreme events due to climate change. To build the nation's resilience, comprehensive policy development and informed decision-making are imperative. The presentation will focus on the South African Risk and Vulnerability Atlas (SARVA), a platform funded by the Department of Science and Innovation (DSI) and hosted by the National Research Foundation - South African Environmental Observation Network (NRF-SAEON). In 2008, the DSI introduced SARVA as a centralized repository for climate and environmental data in South Africa. Initially comprising static maps and paper publications, SARVA has since evolved, now offering an assortment of interactive, digital decision support tools such as dashboards, infographics, and a searchable atlas. Its diverse datasets encompass environmental, economic, social, and settlements subsets, all aligned with the 17 Sustainable Development Goals (SDGs). The platform incorporates crucial information on recorded disasters, sourced from the Emergency Events Database (EM-DAT) of the Centre for Research on the Epidemiology of Disasters (CRED), the National Climate Change Information System of the Department of Forestry, Fisheries and the Environment, and the climate risk profiler based on contributions from various sources. Leveraging a Geographic Information System (GIS) based platform, SARVA empowers users to visualize pertinent data layers, facilitating informed decision-making. Moreover, SARVA welcomes data contributions from all users, not solely those affiliated with the NRF-SAEON, with each dataset being assigned a unique digital object identifier (DOI). Furthermore, SARVA is integrated with SAEON's Open Data Platform, which is CoreTrustSeal accredited.
Climate data are vital for climate research and applications. The South African Weather Service (SAWS), as the national custodian of climate data in the country, must ensure that quality-controlled climate data are available in the national climate database for use by researchers and students, and to safeguard life and property. Furthermore, via the Global Telecommunication System, climate data are assimilated into the global climate models used worldwide for climate change projections. However, in recent years the SAWS observation network has been declining, compromising data availability in the national climate database. Initiatives to overcome these challenges are underway; data filing, modernization and the inclusion of artificial intelligence are being explored.
Discussion
Computational fluid dynamics (CFD) has proven to be a powerful tool for elucidating flow features in a range of disease cases. The technique can be used in combination with other approaches to capture key features of a specific disease. In this talk, we consider two disease cases that have benefitted from high performance computing (HPC). The first case, Coarctation of the Aorta (CoA), is a congenital heart defect which is present at birth and alters the distribution of blood in the body. The effects of the disease tend to be present in childhood and beyond, and medical intervention aims to manage the condition throughout the lifetime of the patient. Haemodynamic simulations, based on CFD, can give insight into how different treatment interventions are likely to impact local fluid dynamics. In order to compute these flow solutions, patient-specific boundary conditions can be obtained using echocardiography and, where datasets are incomplete, augmented with machine learning approaches. The outputs from the CFD model can be used as inputs for an agent-based model (ABM), which shows great potential for capturing growth. The second case, thrombosis, or blood clotting, is a condition which is present in a number of diseases. CFD is coupled to biochemistry to capture local haemodynamics and chemical reactions. The growing clot is captured as a porous medium which affects blood flow and the transport of chemical species. In some instances, devices used for treatment can also be modelled in the flow domain. For both disease cases, the modelling processes described take place over different timescales and require careful consideration of computing resources.
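One common way (not necessarily the authors' exact formulation) to represent a growing clot as a porous medium is to add a Darcy-type momentum sink, S = -(mu / K) * u, whose permeability K drops as the local clot fraction rises. The sketch below is a hedged, illustrative calculation with made-up values, not part of the speakers' solver.

```python
# Hedged sketch of a Darcy-type momentum sink for a clot treated as a porous medium.
import numpy as np

def darcy_sink(u, phi, mu=3.5e-3, K_open=1e-8, K_clot=1e-14):
    """Momentum sink (N/m^3) for velocity u (m/s), clot fraction phi in [0, 1]."""
    K = K_open * (1.0 - phi) + K_clot * phi   # simple linear blend of permeability
    return -(mu / K) * u

u = np.array([0.1, 0.0, 0.0])        # local blood velocity, m/s
print(darcy_sink(u, phi=0.0))        # open lumen: negligible resistance
print(darcy_sink(u, phi=0.9))        # mostly clotted: very large resistance
```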
High-performance computing (HPC) is an increasingly influential field with the potential to transform various industries, with nanomedicine being one of the key beneficiaries. Nanomedicine is the integration of nanotechnology into medical practices, encompassing the diagnosis, treatment, and prevention of diseases. Among the various nanoparticles used in this field, lipid-based nanocarriers stand out as versatile tools. Comprised of lipids, the fundamental components of cell membranes, lipid-based nanocarriers are employed to transport drugs, genes, and therapeutic agents to target cells and tissues within the human body. HPC plays a pivotal role in advancing this field by aiding in the development of novel lipid-based nanocarriers and optimizing their drug delivery mechanisms. Biogenic nanoparticles, which are naturally produced by living organisms ranging from microbes to animals, have immense potential within nanomedicine. Their applications span drug delivery, imaging, and tissue engineering. To delve deeper into the behaviour of lipid membranes and their interactions with nanocarriers and biogenic nanoparticles, computational tools such as coarse-grained molecular dynamics (CGMD) are indispensable. CGMD simulations model groups of atoms as single beads, enabling the analysis of large biological systems, which would be impractical with traditional all-atom molecular dynamics (MD) simulations. To facilitate these simulations, the CHARMM-GUI platform provides a user-friendly interface for setting up and executing MD simulations using the CHARMM force field. Specifically, CHARMM-GUI Martini Maker allows researchers to configure a variety of lipid membrane systems, including planar bilayers, micelles, and vesicles. By harnessing HPC resources for the design of new lipid-based nanocarriers and biogenic nanoparticles, optimizing therapeutic agent delivery, and investigating the interaction between these nanomaterials and biological entities, the potential for breakthroughs in nanomedicine is vast. This synergy of HPC and nanomedicine holds promise for enhancing healthcare solutions and revolutionizing disease treatment and prevention strategies.
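As a minimal post-processing sketch (not a full workflow), the snippet below estimates Martini bilayer thickness as the distance between the mean z-positions of upper- and lower-leaflet phosphate beads using MDAnalysis. The file names and the "PO4" bead selection are assumptions about a typical CHARMM-GUI Martini Maker output.

```python
# Hedged sketch: coarse-grained bilayer thickness from Martini PO4 bead positions.
import MDAnalysis as mda
import numpy as np

u = mda.Universe("membrane.tpr", "membrane.xtc")   # hypothetical topology/trajectory
po4 = u.select_atoms("name PO4")                   # Martini phosphate beads (assumed name)

thicknesses = []
for ts in u.trajectory:
    z = po4.positions[:, 2]                        # bead z-coordinates (Angstrom)
    mid = z.mean()                                 # split leaflets at the mid-plane
    thicknesses.append(z[z > mid].mean() - z[z < mid].mean())

print(f"mean bilayer thickness: {np.mean(thicknesses):.1f} Angstrom")
```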
Pyrometallurgical furnaces are integral for extracting valuable metals from ores, operating at temperatures exceeding 1600 °C. These furnaces represent complex multiphase systems, posing significant challenges for direct industrial-scale study.
Multiphysics models provide critical insights into these complex behaviors, assisting furnace designers and operators in making informed decisions regarding design and operation.
In most furnaces, materials are charged, smelted, and accumulated, followed by a tapping process. The furnace features a 'tap-hole', a channel through the steel and brickwork, which is periodically opened and closed. The opening process involves lancing to remove refractory clay, akin to using a cutting torch. High temperatures are achieved by oxygen reacting with the steel lance. Once the lance penetrates the clay, unburned oxygen gas can enter the furnace, potentially impacting the molten material inside.
A multiphase fluid flow model was employed to study bulk flow inside the furnace, assessing the significance of lancing. Given the high-temperature processes and the compressibility of oxygen gas compared to other process materials (metal and slag), an evaluation of compressibility effects on the study’s outcome is essential.
Using the compressible solver, which requires solving the energy equation, introduces the need for additional material properties as a function of temperature, leading to potential uncertainties. In this study, two solvers, multiphaseInterFoam and compressibleMultiphaseInterFoam from OpenFOAM v2212, were tested on a large-scale process. Typical meshes of around 3.2 million elements were used, necessitating high-performance computing hardware.
This paper presents a comparative analysis of the results and performance of the two solvers under various conditions.
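A hedged sketch of how the two solver runs might be compared in practice is shown below: pulling the last reported execution time from each solver log. The log file names are assumptions, and the regular expression relies on the "ExecutionTime = <seconds> s" lines that OpenFOAM solvers typically print each time step.

```python
# Hedged sketch: compare two OpenFOAM runs by their final reported ExecutionTime.
import re

def last_execution_time(log_path):
    pattern = re.compile(r"ExecutionTime\s*=\s*([\d.]+)\s*s")
    last = None
    with open(log_path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                last = float(m.group(1))      # keep the most recent match
    return last

for log in ["log.multiphaseInterFoam", "log.compressibleMultiphaseInterFoam"]:
    print(log, "->", last_execution_time(log), "s")
```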
Projection of Droughts in Africa under different extents of global warming, as performed on the CHPC's Lengau Cluster
African temperatures are projected to rise rapidly under low-mitigation climate change futures, at 1.5 to 2 times the global rate of temperature increase. This high regional climate sensitivity, in combination with relatively low adaptive capacity, implies that the global climate change mitigation effort is of crucial importance to Africa. Against this background, the Standardized Precipitation Index (SPI) with a 36-month accumulation time (relevant to agricultural and hydrological drought) was considered for six regional downscalings over Africa under a low-mitigation scenario (RCP8.5) and for 1.5, 2 and 3 °C of global warming. The associated projected changes in maximum temperatures and very hot days were also considered. Using the Coordinated Regional Downscaling Experiment-Africa (CORDEX) regional climate models, we downscale six global climate models of the Coupled Model Intercomparison Project Phase 5 (CMIP5) to high resolution with the aid of computing power from the South African Centre for High Performance Computing's (CHPC) Lengau Cluster. The analysis reveals that southern Africa is already experiencing increased conditions of dryness and is likely heading towards a regional climate system that may well be associated with more frequently occurring droughts. Under 3 °C these increased conditions of drought are projected to occur in the presence of a drastic increase in maximum temperatures and very hot days. Such a change, of a hot and dry climate system becoming even hotter and drier, would offer very few options for climate change adaptation. It is likely that under 2 °C of global warming this general pattern of increased dryness will already be manifested over southern Africa, but this region is not projected to be significantly drier at 1.5 °C of warming compared to its present-day climate (indicating a benefit for southern Africa should the 1.5 °C global goal be achieved). For East Africa, increased wetness and potentially more floods are projected under 3 °C of warming, a pattern that may well be manifested (although with reduced amplitude) under 2 °C and 1.5 °C of warming. Associated increases in wetness are also projected across the Sahel, which under 1.5 °C may be a benefit, given that the detrimental effects of rising temperatures will be reduced.
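The core SPI calculation referred to above can be sketched as follows: accumulate monthly precipitation over 36 months, fit a gamma distribution to the accumulations, and map the fitted CDF through the inverse standard normal. This is a minimal illustration of the index only; the synthetic rainfall series and parameters are made up and do not reproduce the study's operational workflow.

```python
# Hedged sketch of a 36-month SPI calculation on a synthetic rainfall series.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)
monthly_rain = pd.Series(rng.gamma(shape=2.0, scale=30.0, size=600))  # mm/month, synthetic

acc = monthly_rain.rolling(window=36).sum().dropna()        # 36-month accumulations
shape, loc, scale = stats.gamma.fit(acc, floc=0)            # fit gamma with location fixed at 0
spi36 = stats.norm.ppf(stats.gamma.cdf(acc, shape, loc=loc, scale=scale))

print("SPI-36 of latest month: %.2f" % spi36[-1])
# Values around -1 or below indicate increasingly severe drought conditions.
```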
Questions
A comprehensive view of Dell's AI strategy, why Dell for AI, and our individual products, solutions, partnerships, and services for multiple use cases in any location.
The NVIDIA Accelerated Compute Platform offers a complete end-to-end stack and suite of optimized products, infrastructure, and services to deliver unmatched performance, efficiency, ease of adoption, and responsiveness for scientific workloads. NVIDIA's full-stack architectural approach ensures scientific applications execute with optimal performance on fewer servers and with less energy, resulting in faster insights at dramatically lower costs for high-performance computing (HPC) and AI workflows.
TBC
Higher education institutions have been under pressure to do more with less. As a result, many institutions are laden with traditional IT departments which are either at the point of requiring a refresh or no longer fit for purpose in the modern university environment. Currently, the majority of universities have a large number of different applications and systems in use across the organisation that students may need to access. As many of these systems need to interact with each other, this presents a significant challenge which digital transformation must embrace. In this talk, we identify and classify these challenges based on a systematic literature review. The findings reveal several barriers that inhibit digital transformation in higher education. These were organised into six broad categories, namely environmental, strategic, organisational, technological, people-related and cultural. The talk provides a comprehensive understanding of the barriers faced, facilitating the development of effective strategies and interventions. Our analysis provides valuable information for higher education institutions, policymakers and stakeholders involved in digital transformation initiatives.
The digital age has ushered in a labyrinth of data, where navigating it and drawing meaningful insights can be as challenging as it is critical. This presentation seeks to address this challenge head-on by focusing on the enhancement of digital literacy in the context of research data management (RDM), within the framework of library services.
This presentation will offer a strategic roadmap for libraries to evolve from traditional gatekeepers of knowledge to dynamic enablers of data proficiency. Practical methodologies for integrating digital literacy into RDM services will be shared, ensuring that libraries remain at the forefront of the information age. By fostering an environment of open access and user-centric data education, communities can be equipped with the necessary skills to not only access but also critically analyse and ethically use data.
Insights will be shared on the University of Pretoria's Digital Transformation journey and its process of evaluating and enhancing digital literacy skills. Anticipating the future, the talk will also explore innovative ways to engage with and support the diverse needs of our patrons, making libraries the cornerstone of a data-empowered society. Join in envisioning a world where every individual has the literacy to not just consume data but to harness it, transforming libraries into catalysts for knowledge and growth in the digital era.
Data Democratization (DD) has been defined and conceptualized in many ways, all pointing towards making data more accessible to a wider range of people within an organization or a society (Zeng et al., 2018; Shamim et al., 2021; Awasthi et al., 2020). Data Democratization allows data to transition from the hands of a selected few so that it can be used by all. Due to the changing landscape brought about by digitalization, data has emerged as a resource of prime importance to our daily lives, and as such it must be made more accessible to businesses, employees, citizens, and the public sector. Hence, modern organizations need to consider new, digitally relevant measures to adopt policies, structures, values, and assumptions in the context of Data Democratization.
Botswana has made provisions to address data through instruments that provide various levels of guidance regarding the management of data in the country. However, a perusal of existing legal instruments indicates an insufficiency of provisions directly supporting open data, owing to relatively low institutional capacity and adoption of open data. A situation analysis on data management established that Botswana's ratings on all the indicators for Open Data readiness were below acceptable levels (Open Data Readiness Assessment, World Bank Group, 2015). The Open Data Inventory (2020/2021) further reveals that Botswana had not yet adopted the Open Data Charter (ODC, 2015). Another report (Open Data Inventory, 2022/2023) ranks the country 93rd out of 193, with an overall score of 51%.
The research will discuss the efforts the country is making in response to these findings, showcasing potential enabling factors, opportunities and challenges towards Data Democratization in the context of a developing country.
References
1. World Bank Group (2015). Open Data Readiness Assessment: Prepared for the Government of Botswana. World Bank Group.
2. The Open Data Inventory (ODIN) (2020). Botswana country profile. Open Data Watch. https://odin.opendatawatch.com/Report/countryProfileUpdated/BWA?year=2020
3. The Open Data Charter (2015). https://www.data4sdgs.org/partner/open-data-charter#:~:text=The%20ODC's%20goal%20is%20to,%2C%20economic%2C%20and%20environmental%20challenges
4. Open Data Watch (2022). 2022/23 Open Data Inventory: The ODIN Biennial Report. https://odin.opendatawatch.com/Report/biennialReport2022
5. Zeng, J., and Glaister, K. W. (2018). "Value Creation from Big Data: Looking inside the Black Box," Strategic Organization (16:2), SAGE Publications, pp. 105–140.
6. Awasthi, P., and George, J. J. (2020). "A Case for Data Democratization," in: Americas Conference on Information Systems. Virtual: AIS eLibrary.
7. Shamim, S., Yang, Y., Zia, N. U., and Shah, M. H. (2021). "Big Data Management Capabilities in the Hospitality Sector: Service Innovation and Customer Generated Online Quality Ratings," Computers in Human Behavior (121), 106777.
DIRISA's services include the Data Management Plan (DMP) tool, the Data Deposit Tool (DDT) and storage requests. All of these services are accessible through a single point of access using Single Sign-On (SSO). A DMP is a formal document that outlines how a researcher will handle data before, during and after the project is completed. It outlines the practices for collecting, organizing, backing up and storing the data that will be generated. A few of the benefits of a DMP are more visible and citable research data, and the encouragement of data sharing and collaboration. Once data planning is complete, researchers can start collecting data and depositing it in the DDT. Furthermore, DDT users get 100 GB for free, and storage can be increased once the requirements for an increase are met. The storage request service is where users submit requests when they need more storage.
Discussion
The CHPC Users Forum Birds-of-a-Feather (BoF) session brings all users of CHPC resources together with the following aims:
All users are kindly invited and encouraged to attend; this includes users of the CPU, GPU and Cloud resources at the CHPC.
Short overview presentations will be presented, but the bulk of the time will be focused on informal engagement with the user community.
Looking forward to seeing you at the BoF!
TBC
TBC
Dawn has been created through a highly innovative, long-term co-design partnership between the University of Cambridge, UK Research & Innovation, the UK Atomic Energy Authority and global technology leaders Intel and Dell Technologies. This partnership brings highly valuable technology first-mover status and inward investment into the UK technology sector. Dawn, supported by UK Research and Innovation (UKRI), will vastly increase the country's AI and simulation compute capacity for both fundamental research and industrial use, accelerating research discovery and driving growth within the UK knowledge economy. It is expected to drive significant advancements in healthcare, green fusion energy development and climate modelling. In this talk, opportunities will be discussed for South African scientists to develop a framework for access to Dawn and to run benchmarks for some of the applications relevant to South Africa. So do attend, you might just win yourself part of the Director's Discretionary time on Dawn.
TBC
Description:
The WHPC Birds-of-a-Feather (BoF) session for 2023 will be a unique platform for participants to actively build teamwork and get to know each other better through fun outdoor activities like never before. Participants will have a chance to team up with others in a quest to “change the landscape of HPC” in South Africa.
As such, we are delighted to invite conference participants (male and female) to pick up where we left off at the last session, held during the 2022 annual conference. The initiative's main goal was to create a network of Women in HPC in South Africa by bringing them together during the meeting. The workshop was sponsored and attended by both men and women, and strongly supported by the CHPC management team.
Anticipated Goals:
• Build strong professional relationships
• Address women's underrepresentation in HPC (contribute to increasing the participation of women and girls in HPC through training and networking)
• Share information and resources that foster growth for women in HPC (institutionally and across community)
• Raise our professional profiles
• Encourage young girls at school level to consider HPC as a career of choice
Target audience: Women and Men
In this talk, we will uncover different approaches to implementing, delivering, and democratizing AI solutions. Each approach fits different use cases: we can use data-centre CPUs, GPUs, specialized AI accelerators, or the Edge. This way, AI developers have the flexibility and agility to implement the most efficient solution. We will also touch on how AI and HPC are merging and how we see the future unfolding.
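As a hedged illustration of the "pick the right hardware" point above, the same PyTorch model can be dispatched to a data-centre GPU when one is available and fall back to a CPU (or an edge device) otherwise. The tiny model below is purely illustrative and not tied to any specific product.

```python
# Hedged sketch: device-agnostic inference with PyTorch.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"   # GPU if present, else CPU
model = torch.nn.Sequential(
    torch.nn.Linear(16, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
).to(device).eval()

with torch.no_grad():
    batch = torch.randn(4, 16, device=device)   # synthetic input batch
    print(model(batch).argmax(dim=1))            # predicted class per sample
```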