REGISTRATIONS ARE NOW CLOSED AS CAPACITY LIMIT REACHED.
Please direct queries to: chpc@csir.co.za
The aim of the conference is to bring together our users so that their work can be communicated, to include world-renowned experts, and to offer a rich programme for students in the fields of high-performance computing, big data, and high-speed networking. The CHPC National Conference is co-organised by the CHPC, DIRISA and SANReN.
The CHPC 2025 Conference will be an in-person event with a physical programme hosted at the Century City Conference Centre, Cape Town.
For more information please see the main conference site.
This year's theme is the utility of cyber-infrastructure in processing, storing and moving the large data sets underpinning today's complex world.
Online registration will close on Friday 27 November 2025. Thereafter only onsite registration (at full fees) will be available at the venue.
SADC Cyber-Infrastructure Meeting
Full programme:
https://events.chpc.ac.za/event/155/attachments/260/500/15_SADC_CI_Experts_Meeting_Draft_Agenda__30_Nov_2025.pdf
This hands-on workshop introduces students to building a basic Retrieval-Augmented Generation (RAG) system. Participants will learn how to index a document corpus using embeddings, implement a vector search retriever, and connect it to a language model for context-aware responses. The session covers key components like vector store, prompt design, and system evaluation. By the end, students will have built a simple, working RAG pipeline. Basic Python knowledge is recommended.
Requirements: Each attendee should bring their laptop along and have Jupyter Notebook with Python 3.10 pre-installed.
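The workshop's core loop can be sketched in miniature with standard-library Python. The bag-of-words "embedding", the two-document corpus, and the queries below are toy stand-ins invented for illustration; a real RAG system would use a learned embedding model, a proper vector store, and an actual LLM call in place of the returned prompt:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words term-frequency vector.
    # A real RAG system would call an embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

corpus = [
    "Lengau is the CHPC cluster used for large simulations.",
    "RAG combines vector search retrieval with a language model.",
]
index = [(doc, embed(doc)) for doc in corpus]  # the "vector store"

def retrieve(query, k=1):
    # Rank documents by similarity to the query and keep the top k.
    scored = sorted(index, key=lambda d: cosine(embed(query), d[1]), reverse=True)
    return [doc for doc, _ in scored[:k]]

def build_prompt(query):
    # Prompt design: stuff the retrieved context above the question.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What does RAG combine?"))
```

The same three pieces (index, retriever, prompt builder) are what the session assembles with real components.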
Linking atomistic energetics to macroscopic rates remains a core challenge in heterogeneous catalysis. This full-day, hands-on workshop takes participants from surface slab setup and adsorption energies (Python/ASE) to working microkinetic models. We use machine-learned interatomic potentials to approximate energetics, then build reaction networks, derive transition-state-theory rate expressions, and solve mass-balance ODEs to obtain coverages, turnover frequencies, and degree-of-rate-control sensitivities. Practical HPC on CHPC is woven throughout. Attendees leave with a functional model for a reaction of their choice and a clear roadmap to extend it.
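As a flavour of the modelling step, the sketch below (all rate constants and conditions invented; the ASE and machine-learned-potential stages are not shown) pairs a transition-state-theory rate constant with an explicit-Euler integration of a one-site adsorption mass balance:

```python
import math

kB = 1.380649e-23    # Boltzmann constant, J/K
h  = 6.62607015e-34  # Planck constant, J*s
R  = 8.314462618     # gas constant, J/(mol*K)

def k_tst(Ea, T):
    """Transition-state-theory rate constant (1/s): (kB*T/h) * exp(-Ea/RT),
    with the activation energy Ea in J/mol."""
    return (kB * T / h) * math.exp(-Ea / (R * T))

def coverage(p, k_ads, k_des, dt=1e-3, steps=200_000):
    """Steady-state coverage from d(theta)/dt = k_ads*p*(1-theta) - k_des*theta,
    integrated by explicit Euler from an empty surface."""
    theta = 0.0
    for _ in range(steps):
        theta += dt * (k_ads * p * (1.0 - theta) - k_des * theta)
    return theta

# Illustrative numbers: adsorption twice as fast as desorption gives the
# Langmuir result theta = K*p / (1 + K*p) = 2/3.
theta_ss = coverage(p=1.0, k_ads=2.0, k_des=1.0)
print(theta_ss)
```

The same Euler loop generalises to the coupled mass-balance ODEs of a full reaction network; in practice a stiff solver would replace it.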
SADC Cyber-Infrastructure Meeting
Detailed programme:
https://events.chpc.ac.za/event/155/attachments/260/500/15_SADC_CI_Experts_Meeting_Draft_Agenda__30_Nov_2025.pdf
continues
SADC Cyber-Infrastructure Meeting
Detailed programme:
https://events.chpc.ac.za/event/155/attachments/260/500/15_SADC_CI_Experts_Meeting_Draft_Agenda__30_Nov_2025.pdf
The Materials Modelling Workshop using Density Functional Theory (DFT) provides postgraduate students, early-career researchers, and interdisciplinary scientists with a solid foundation in computational materials science. DFT is a powerful quantum mechanical method for predicting the structural, electronic, optical, and magnetic properties of materials, enabling the rational design of novel systems for energy, catalysis, and optoelectronic applications. Through a combination of lectures and hands-on sessions, participants will gain practical experience with leading DFT software tools, learning how to perform structure optimization, electronic structure calculations, and property analysis. The workshop will highlight applications in sustainable energy materials, topological systems, and low-dimensional materials. Participants will also be introduced to emerging approaches that combine DFT with machine learning for accelerated materials discovery and inverse design. By the end of the workshop, attendees will possess both theoretical insight and computational skills to apply DFT (CASTEP in Materials Studio) techniques effectively in their own research and to contribute to innovation in materials design and development.
This hands-on workshop introduces participants to both using and managing virtual resources within an OpenStack Cloud environment. Participants will be introduced to the Horizon dashboard and how to create resources via that web-based interface. Then, participants will be taken through the steps to alternatively define their resources as code, using Terraform and GitLab to create and manage the latter. By the end, participants will have built some virtual resources within OpenStack containing the applications that they decided to deploy, using one of the methods indicated above.
Requirements: Each participant should bring their laptop along, have access to a Unix-friendly terminal/command-line interface, and have git and Terraform pre-installed.
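Since the workshop contrasts dashboard clicks with resources defined as code, the following standard-library sketch illustrates the declarative model that Terraform implements: you state the desired resources, and the tool diffs them against what exists to produce a plan. The resource names and attributes are invented, and real Terraform is of course written in HCL, not Python:

```python
# Desired state (what your code declares) versus actual state (what the
# OpenStack project currently contains). All names here are invented.
desired = {"web-vm": {"flavor": "m1.small"}, "db-vm": {"flavor": "m1.large"}}
actual  = {"web-vm": {"flavor": "m1.small"}, "old-vm": {"flavor": "m1.tiny"}}

def plan(desired, actual):
    # The "plan" step: compute which resources must be created or destroyed
    # to reconcile reality with the declaration.
    to_create = sorted(set(desired) - set(actual))
    to_delete = sorted(set(actual) - set(desired))
    return {"create": to_create, "delete": to_delete}

print(plan(desired, actual))
```

Running the plan repeatedly is idempotent: once actual matches desired, it proposes no changes, which is the property that makes code-defined infrastructure manageable in GitLab.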
There are limited places in this workshop. Please sign up by 28 November at
Quantum machine learning (QML) is an exciting new field of study which harnesses the laws of quantum mechanics and applies them to classical machine learning models. QML has a wide variety of potential applications spanning many fields, including healthcare and life sciences, climate and sustainability, and finance and optimization. Two parallel activities have emerged to pioneer the field of quantum discoveries: one is the technological advancement required (hardware, software and algorithms), and the other is the rapid exploration of domain-specific problems towards identifying quantum advantage over classical methods. In this workshop, we will provide participants with an introduction to QML from a theoretical perspective, as well as a practical implementation using the Qiskit programming SDK.
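As a taste of the practical session, the standard-library sketch below simulates the smallest possible variational circuit, a single RY rotation, and trains its parameter with the parameter-shift rule; a real implementation would use the Qiskit SDK covered in the workshop, and the cost function here is a toy choice:

```python
import math

def ry_expectation_z(theta):
    """Expectation value <Z> after applying RY(theta) to |0>.
    RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>, so
    <Z> = cos^2(theta/2) - sin^2(theta/2) = cos(theta)."""
    a = math.cos(theta / 2)  # amplitude of |0>
    b = math.sin(theta / 2)  # amplitude of |1>
    return a * a - b * b

def parameter_shift_grad(theta):
    # Parameter-shift rule: the exact gradient from two shifted evaluations,
    # the same recipe used to train variational circuits on hardware.
    return 0.5 * (ry_expectation_z(theta + math.pi / 2)
                  - ry_expectation_z(theta - math.pi / 2))

# "Training": gradient descent on the toy cost <Z>, driving it to -1.
theta = 0.3
for _ in range(100):
    theta -= 0.5 * parameter_shift_grad(theta)
print(theta)  # approaches pi, where <Z> = -1
```

Swapping the analytic simulator for a Qiskit backend changes only how the two shifted expectation values are evaluated.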
SADC Cyber-Infrastructure Meeting
Detailed programme:
https://events.chpc.ac.za/event/155/attachments/260/500/15_SADC_CI_Experts_Meeting_Draft_Agenda__30_Nov_2025.pdf
part 2
See the description for part 1 to sign up.
Places are limited.
Scientific computing has undergone a fundamental transformation over the past decade. GPU-accelerated computing, which in 2015 was still at the fringes of supercomputing, has become the norm today. The recent advances — indeed, the revolution — in AI, together with the broader digital transformation of science, have placed HPC at the very center of scientific discovery. No AI-for-Science project can succeed without it.
This tremendous opportunity, however, also poses a threat to HPC centers that fail to adapt to the needs of scientists. The success of AI in science depends not only on access to sufficient compute power but equally on data — and data requires curation and domain-centric workflows. Traditional HPC operational models are ill-suited for this.
In a recent publication [1], Hoefler et al. introduced the concept of Acceleration as a Service (XaaS), a step in the right direction that leverages container-based deployment of HPC workloads. In this presentation, I will go a step further and demonstrate how supercomputing infrastructures can evolve into service-oriented architectures [2]. Drawing on principles from cloud computing and exploiting features of modern high-performance networks, we can simultaneously serve operational weather prediction, climate simulation, AI, traditional HPC, and large-scale experimental and observational instruments — all through specialized, elastic platforms that do not compromise scalability or performance.
[1] T. Hoefler et al., “XaaS: Acceleration as a Service,” arXiv:2401.04552, Jan. 2024.
[2] M. Martinasso, M. Klein, and T. C. Schulthess, “Alps, a versatile research infrastructure,” arXiv:2507.02404, Jul. 2025.
The presentation will discuss the use of traditional computational methods, machine and deep learning, as well as quantum computing and quantum machine learning (as a new frontier) in addressing challenges in fluid dynamics, dynamical systems, and high-energy physics research. The talk will also highlight the role of the CHPC in democratising access to critical resources and in enabling such research.
For over two decades I have been developing new interatomic potentials: for example, I implemented the AOM within GULP so that non-spherical Jahn–Teller Mn(III) ions can be modelled, successfully refined potential parameters for numerous systems including the Peierls phase transition of VO2, and authored a published interatomic potential parameter database. My interest is driven by the ability to control what physics is included (or not) through the introduction of new terms in the Hamiltonian (or potential energy), and it is an approach many will follow because, compared to DFT, it allows modelling of larger systems (more atoms), longer time periods (in MD), and more sampling (global optimisation and/or calculation of the partition function).
Now ML potentials, which have many more parameters to refine and a minefield of differing functional forms to choose from, have become very topical as the data required to fit them, as well as the computer resources, have become more readily available. My first real experience came when one of my earlier PhD students discovered that it was not straightforward to develop a suitable model (fit parameters); for example, the GAP ML potentials we refined suffered from erroneous oscillations.
I lead the UK's Materials Chemistry Consortium (MCC), and one of our current aims is to make the use of ML potentials more accessible to our community. Simultaneously, other groups have begun refining ML-potential models for the entire periodic table based on reproducing DFT results. In my presentation I will present results from three of my PGT students, who worked on energy materials using the JANUS-core code to calculate energies and forces based on pre-refined MACE ML potentials. Moreover, I will include recently published results on dense and microporous silica materials, where these potentials performed particularly well, and further results from ongoing MCC research.
The global pandemic, initiated by the SARS-CoV-2 virus and emerging in 2020, has profoundly influenced humanity, resulting in 772.4 million confirmed cases and approximately 7 million fatalities as of December 2023. The resultant negative impacts of travel restrictions and lockdowns have highlighted the critical need for enhanced preparedness for future pandemics. This study primarily addresses this need by traversing chemical space to design inhibitors targeting the SARS-CoV-2 papain-like protease (PLpro). Pathfinder-based retrosynthesis analysis was employed to synthesize analogues of the hit GRL-0617, using commercially available building blocks through the substitution of the naphthalene moiety. A total of 10 models were developed using active learning QSAR methods, which demonstrated robust statistical performance, including an R2 > 0.70, Q2 > 0.64, standard deviation < 0.30, and RMSE < 0.31 on average across all models. Subsequently, 35 potential compounds were prioritized for FEP+ calculations. The FEP+ results indicated that compound 45 was the most active in this series, with a ∆G of -7.28 ± 0.96 kcal/mol. Compound 5 exhibited a ∆G of -6.78 ± 1.30 kcal/mol. The inactive compounds in this series were compound 91 and compound 23, with a ∆G of -5.74 ± 1.06 and -3.11 ± 1.45 kcal/mol, respectively. The integrated strategy implemented in this study is anticipated to provide significant advantages in multiparameter lead optimization efforts, thereby facilitating the exploration of chemical space while conserving and/or enhancing the efficacy and property space of synthetically aware design concepts. Consequently, the outcomes of this research are expected to substantially contribute to preparedness for future pandemics and their associated variants of SARS-CoV-2 and related viruses, primarily by delivering affordable therapeutic interventions to patient populations in resource-limited and underserved settings.
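To relate the quoted free energies to binding affinities, one can use the standard relation Kd = exp(ΔG/RT). The sketch below applies it to two of the abstract's FEP+ values; the temperature of 298.15 K is an assumption not stated in the abstract:

```python
import math

R = 0.001987  # gas constant, kcal/(mol*K)
T = 298.15    # assumed temperature, K

def kd_from_dg(dg_kcal):
    """Dissociation constant (M) implied by a binding free energy in
    kcal/mol, via Kd = exp(dG / RT); more negative dG = tighter binding."""
    return math.exp(dg_kcal / (R * T))

# Free energies quoted in the abstract (kcal/mol)
dg_45, dg_23 = -7.28, -3.11
fold = kd_from_dg(dg_23) / kd_from_dg(dg_45)
print(f"compound 45 binds roughly {fold:.0f}x tighter than compound 23")
```

The roughly thousand-fold spread in implied affinity is why a ~4 kcal/mol difference in ΔG separates the actives from the inactives in this series.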
The spotted hyena (Crocuta crocuta) is a highly social carnivore with complex behavioural and ecological functions, making it an important model for studying genetic diversity, adaptation, and evolution. However, previous draft genomes for C. crocuta have been incomplete and derived from captive individuals, limiting insights into natural genetic variation. Here, we present a high-quality de novo genome assembly and the first pangenome of wild spotted hyenas sampled from the Kruger National Park, South Africa, alongside population-level analysis.
Using Oxford Nanopore Technologies (ONT) long-read sequencing, we assembled a 2.39 Gb reference genome with a scaffold N50 of 19.6 Mb and >98% completeness. We further performed short-read resequencing at 10-32X depth per individual, revealing >4 million single nucleotide variations and ~1 million insertions and deletions per individual. To capture genomic variation beyond a single reference, we constructed a draft pangenome using Progressive Genome Graph Builder (PGGB). The resulting pangenome comprises ~2.47 Gb, with 35.2 million nodes, 48.4 million edges, and 159,060 paths, incorporating sequences from all individuals. Its graph structure revealed substantial topological differences, which may correspond to biologically relevant variations.
The breadth of these analyses required extensive use of the CHPC’s computing resources. Long-read genome assembly and polishing were executed on high-memory nodes to accommodate the error-correction and scaffolding steps. Repeat and gene annotation pipelines (RepeatModeler, BRAKER3) as well as variant discovery with GATK and BCFtools were parallelised to accelerate execution. Pangenome graph construction was particularly computationally intensive, requiring large-scale parallelisation and significant memory and storage capacity to manage multi-genome alignments and graph building.
This study provides the most contiguous wild-derived genome to date for the species, the first draft pangenome for C. crocuta, and establishes a foundation for future conservation and comparative genomics. Importantly, it demonstrates the critical role of HPC resources in enabling large-scale bioinformatics pipelines - from genome assembly to pangenome construction and population-level analysis - in non-model organisms.
Q&A
This talk presents an overview of Morocco's emerging HPC ecosystem, highlighting national initiatives, key infrastructure developments, and the growing collaborations that drive computational research and innovation. A particular focus will be given to Toubkal, Morocco's flagship supercomputer, which represents a major step forward in national computing capacity and supports applications in scientific research, artificial intelligence, and industry. The presentation outlines the architecture and capabilities of Moroccan HPC centers and the broader vision for positioning Morocco as a regional hub for advanced computing.
Imad Kissami, College of Computing, Mohammed VI Polytechnic University
The Square Kilometre Array Observatory (SKAO) is currently building two world-class complementary arrays of telescopes that operate in the radio segment of the electromagnetic spectrum. These telescopes will form a next-generation radio astronomy-driven Big Data facility that will revolutionise our understanding of the universe and fundamental laws of physics. South Africa hosts the SKA-Mid telescope, which is one of the two SKAO arrays. This presentation will highlight the intensity of cyberinfrastructure involved in pursuing the science ambitions of the SKAO as well as the envisaged impact thereof, as pioneered by the MeerKAT telescope (a precursor of SKA) which is operated by the South African Radio Astronomy Observatory (SARAO).
The newly established High-Performance Computing and Big Data Analytics (HPC-BDA) Centre of Excellence at Addis Ababa Science and Technology University (AASTU) represents Ethiopia’s bold entry into the continental cyber-infrastructure landscape, complementing South Africa’s CHPC and NICIS. Anchored in state-of-the-art laboratories spanning Business Analytics, HPC & Cloud Systems, Bioinformatics, Agro-Informatics, Computational Science, Cybersecurity, and Meteorological Modelling, the Centre employs a dual-layer strategy that couples foundational infrastructure with high-impact applications in agriculture, healthcare, climate resilience, and the digital economy. By embedding bioinformatics and genomics with secure, INSA-supported data governance frameworks, the Centre uniquely integrates life sciences, policy alignment, and advanced computation into actionable decision systems. Positioned as a “Research Gravity Zone,” it aspires to attract partnerships, catalyze funding, and advance Ethiopia’s Digital Ethiopia 2025 and STI policy priorities, while fostering regional collaboration toward a pan-African HPC-BDA ecosystem that translates data to decisions.
High-Performance Computing (HPC) and Big Data Analytics (BDA) are rapidly transforming the global research and innovation landscape, enabling nations to turn massive data streams into actionable insights.
While South Africa’s Centre for High Performance Computing (CHPC) has demonstrated continental leadership, emerging ecosystems across Africa now have the opportunity to complement and expand this capacity. This presentation introduces the newly established HPC-BDA Centre at Addis Ababa Science and Technology University (AASTU), Ethiopia, as a strategic initiative designed to position Ethiopia as a regional knowledge hub.
The Center integrates state-of-the-art laboratories in Business Analytics, Cloud & HPC Systems, Bioinformatics, Computational Science, Agro-Informatics, Network & Cybersecurity, and Meteorological Modelling. Its dual-layer strategy links advanced cyber-infrastructure with thematic domains of national priority: agriculture, healthcare, climate resilience, and the digital economy. By embedding bioinformatics and genomics, the Center uniquely connects life sciences with data-driven decision systems, strengthening Africa’s capacity for health security and food sustainability. Supported by the Information Network Security Administration (INSA), the Center also incorporates advanced cybersecurity and governance frameworks, ensuring ethical, secure, and policy-aligned use of HPC-BDA resources for both national and international collaboration.
The paper will highlight how this ecosystem fuels data-to-decision pipelines through advanced HPC workflows, robust partnerships, and alignment with Digital Ethiopia 2025 and the national Science, Technology, and Innovation (STI) policy framework. Furthermore, it will discuss the Center’s regional role in fostering collaboration with continental cyber-infrastructure leaders, including CHPC and NICIS, towards a pan-African HPC-BDA network.
By demonstrating Ethiopia’s novel model of integrating cyber-infrastructure, applied research, and innovation ecosystems, the AASTU HPC-BDA Center aspires to create a “Research Gravity Zone” in Africa, an engine attracting partnerships, funding, and global recognition while directly advancing the CHPC 2025 theme of From Data to Decisions.
Q & A
Ransomware remains a significant cyber threat, yet research is often hampered by a lack of modern, balanced datasets. This study proposes CerebRAN, a new dataset built from dynamic analysis of ransomware (400 samples) and goodware (399 samples). We provide a detailed methodology, from sample collection to feature extraction using Cuckoo Sandbox on a Windows 7 operating system. To validate the usability of CerebRAN, we performed machine learning experiments with Random Forest and Logistic Regression classifiers using the Recursive Feature Elimination with Cross-Validation (RFECV) technique. The results show that Random Forest was the superior classifier on CerebRAN, scoring an accuracy of 0.9625, precision of 0.9628, recall of 0.9625, and F1-score of 0.9625. Logistic Regression scored an accuracy of 0.9562, precision of 0.9563, recall of 0.9563, and F1-score of 0.9562. Random Forest achieved this using an optimum of 48 features, while Logistic Regression used 174 features. These experiments highlight how effective and valuable CerebRAN is for the development of robust detection tools. The dataset and sample metadata are publicly available on GitHub.
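The elimination loop at the heart of RFECV can be illustrated in a few lines. The correlation-based importance score and the tiny feature table below are invented stand-ins; the study itself used scikit-learn's RFECV with Random Forest and Logistic Regression importances and cross-validated scoring:

```python
# Toy sketch of recursive feature elimination (RFE): repeatedly drop the
# least important feature and re-score. Feature names and values invented.
def correlation_importance(xs, ys):
    # A stand-in importance score: absolute Pearson correlation with the label.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (vx * vy)) if vx and vy else 0.0

def rfe(X, y, n_keep):
    """X: dict of feature name -> values; keep the n_keep strongest features."""
    features = dict(X)
    while len(features) > n_keep:
        scores = {f: correlation_importance(v, y) for f, v in features.items()}
        weakest = min(scores, key=scores.get)
        del features[weakest]  # eliminate, then (conceptually) re-fit
    return sorted(features)

X = {"api_calls": [1, 2, 3, 4], "noise": [5, 5, 5, 5], "entropy": [2, 4, 6, 8]}
y = [0, 0, 1, 1]
print(rfe(X, y, 2))
```

RFECV adds a cross-validation loop over n_keep, which is how the study arrived at 48 features for Random Forest versus 174 for Logistic Regression.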
Peer-to-peer energy markets rely on trust to enable secure participation; however, existing trust models often address only isolated trust concerns. This fragmented approach leaves significant gaps in ensuring holistic trust across the peer-to-peer energy market, exposing participants to market-related threats. To address this, the paper proposes a trust framework grounded in the Trust over IP (ToIP) model, which integrates technical mechanisms and governance policies to sustain trust in decentralised environments. Using the STRIDE threat model, key threats in the peer-to-peer energy market are identified, while also analysing how existing research mitigates these risks. The corresponding trust and security mechanisms are then mapped to the ToIP architecture, offering a comprehensive approach to trust establishment that unifies social-behavioural and security dimensions of trust. By leveraging ToIP as a formal foundation for trust establishment in this work, the proposed framework provides a holistic approach to building and maintaining trust in the market, thereby fostering greater user confidence and encouraging broader market participation.
This is an exploratory study to understand the role of gender, home area, and high school area in Willingness to Disclose Information, Benefits, Privacy Risks, Subjective Norms, and Perceived Behavioural Control. The study used Privacy Calculus and the Theory of Planned Behaviour to formulate a five-construct questionnaire. An empirical data sample of 133 first-year IT students at Nelson Mandela University was collected. The results indicate that males are more likely to disclose information on Facebook compared to females, and they perceive more benefits. Additionally, males are more likely to be influenced by others to disclose information compared to females. Surprisingly, Privacy Risks and Perceived Behavioural Control show no significant relationship with gender. Secondly, students from a rural area are more willing to disclose their personal information on Facebook compared to those from an urban or suburban area, and they perceive more benefits; conversely, students from an urban or suburban area are more aware of privacy risks than students from a rural area. Subjective Norms and Perceived Behavioural Control show no significant relationship with home area. Lastly, students from rural high schools perceive more benefits and are more likely to be influenced by others compared to those from urban or suburban high schools. Willingness to Disclose Information, Privacy Risks and Perceived Behavioural Control show no significant relationship with high school area.
Engaging with International Cyber Forums
As digital economies scale, the hidden environmental cost of data processing—especially from AI and search engines—has become a growing concern. Search engines like Google and AI models such as Meta AI consume hundreds of kilowatt-hours (kWh) daily; in 2024, Google disclosed that each AI search can consume up to 3 watt-hours, which, at scale, parallels the energy of running a home microwave for 20–30 seconds per query. These figures point to a pressing need to rethink our computational architectures.
We propose a novel hybrid model that combines Quantum Reservoir Computing (QRC) with Principal Component Analysis (PCA) as a means to reduce computational load while maintaining high-performance intelligence. This approach leverages quantum dynamics for memory-rich processing while applying PCA to filter and compress high-dimensional outputs, minimizing redundancy and noise. The integration is particularly designed for High-Performance Computing (HPC) tasks such as indexing, ranking, and personalization within large-scale search engines.
Previous research in QRC has highlighted its potential for temporal processing, but it remains underutilized in real-world, energy-intensive infrastructures. Most prior work applies QRC in small-scale simulations without dimensionality reduction or power profiling. Our method introduces PCA post-processing as a compression lens—a missing piece in current quantum reservoir computing literature.
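The PCA "compression lens" can be sketched as a power iteration on the covariance matrix, here in standard-library Python on an invented two-dimensional dataset; in the proposed pipeline the input would be the high-dimensional quantum reservoir readout:

```python
import math
import random

def first_pc(data, iters=200):
    """Leading principal component of `data` (list of rows), found by
    mean-centring and power iteration on the covariance matrix."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - means[j] for j in range(d)] for row in data]
    # Covariance matrix C = X^T X / n
    C = [[sum(X[i][a] * X[i][b] for i in range(n)) / n for b in range(d)]
         for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        # Repeated multiplication by C converges to the top eigenvector.
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Points spread mostly along the x-axis: the first PC should be ~(+-1, 0),
# so projecting onto it compresses 2 dimensions to 1 with little loss.
random.seed(0)
data = [[x, 0.1 * random.random()] for x in (-3, -1, 0, 1, 3)]
pc = first_pc(data)
print(pc)
```

Projecting reservoir outputs onto the few leading components found this way is the dimensionality-reduction step the abstract proposes.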
Quantum circuits can be represented using complex unitary matrices. Certain algorithms, such as Shor's factoring algorithm, produce a matrix that needs to be converted into a quantum circuit and executed multiple times as part of a larger quantum circuit. Current approaches use the mathematical properties of matrices to factor arbitrary matrices into a different set of matrices that must then be converted into quantum circuits. As such, the available basis gates of a specific quantum computer are not considered during the process, so these quantum circuits cannot be directly implemented on quantum computers and require a transpilation step, in which the produced quantum circuit is converted into one that can run on a specific quantum computer. Transpilation greatly increases the number of gates used on the quantum computer, which increases the execution time needed on quantum hardware and increases the noise observed during experiments. This study proposes a novel approach to convert complex unitary matrices into quantum circuits while minimising the number of gates used by the quantum computer. The proposed approach utilises a game tree, in which the basis gates of a specific quantum computer are used to ensure that an optimal solution is found. The process of converting an arbitrary matrix to a quantum circuit can be modelled by storing the matrix representation of a quantum circuit, adding new gates one at a time, and recalculating the matrix representation. These matrices can be thought of as states in a game tree. At each state in the game tree, the valid moves are all the basis gates of a given quantum computer. The goal matrix can then be found by searching the generated game tree for a state with the same matrix representation as the goal matrix; the corresponding path through the tree gives the gates that produce the quantum circuit. This study investigates the generation and traversal of such quantum game trees, including efficient matrix storage in the game tree coupled with compression algorithms, as well as the accuracy functions necessary to search the game tree for the desired matrix.
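A toy version of the search is easy to state. The sketch below does a breadth-first traversal of the game tree for a single qubit with an invented basis-gate set {H, T}, matching states against a target matrix; the study's compressed matrix storage and accuracy functions are not reproduced here, and no deduplication of states is attempted:

```python
import cmath
from collections import deque

# Basis "moves": a tiny gate set, with 2x2 matrices as tuples of complex numbers.
H = ((2 ** -0.5, 2 ** -0.5), (2 ** -0.5, -(2 ** -0.5)))
T = ((1, 0), (0, cmath.exp(1j * cmath.pi / 4)))
I = ((1, 0), (0, 1))
GATES = {"H": H, "T": T}

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def close(A, B, tol=1e-9):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

def synthesise(target, max_depth=6):
    """Breadth-first search of the 'game tree' whose states are circuit
    matrices and whose moves are basis gates; returns a shortest gate list."""
    queue = deque([(I, [])])
    while queue:
        state, path = queue.popleft()
        if close(state, target):
            return path
        if len(path) < max_depth:
            for name, g in GATES.items():
                queue.append((matmul(g, state), path + [name]))
    return None

# The S gate equals T applied twice, so the search should find ["T", "T"].
S = ((1, 0), (0, 1j))
print(synthesise(S))
```

Because the moves are the device's own basis gates, any sequence found needs no further transpilation, which is the point of the proposed approach.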
Optimization problems appear widely in science and industry, yet their classical solutions often demand considerable computational resources. Quantum computing provides a promising framework for addressing such problems more efficiently by exploiting quantum superposition and entanglement [1]. In this work, we investigate several quantum gradient descent approaches [2] to find the minimum of a quadratic cost function. Implementing the algorithms through amplitude encoding, we begin with a quantum gradient descent algorithm based on phase estimation. To further enhance performance, we develop and test additional strategies, including linear combinations of unitaries (LCUs) [3], the Sz.-Nagy dilation method [4], and a so-called unitary selection method, where the cost function is explicitly defined as a quadratic function. These methods are evaluated in terms of circuit depth, number of iterations, and accuracy. Our results show that unitary selection outperforms phase estimation, LCUs provide a further improvement, and the Sz.-Nagy approach achieves the highest efficiency among all tested methods. This comparative study highlights the potential of pure quantum algorithms for solving real-world quadratic optimization problems.
[1] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, 10th Anniversary Edition, Cambridge University Press, 2010.
[2] P. Rebentrost, M. Schuld, L. Wossnig, F. Petruccione, and S. Lloyd, "Quantum gradient descent and Newton's method for constrained polynomial optimization," New J. Phys. 21(7):073023, 2019.
[3] S. Chakraborty, "Implementing any linear combination of unitaries on intermediate-term quantum computers," Quantum 8:1496, 2024.
[4] A. Gaikwad, Arvind, and K. Dorai, "Simulating open quantum dynamics on an NMR quantum processor using the Sz.-Nagy dilation algorithm," Phys. Rev. A 106(2):022424, 2022.
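For orientation, the classical baseline that the quantum approaches above aim to accelerate is plain gradient descent on the quadratic cost f(x) = 0.5 x^T A x - b^T x; the matrix, vector and learning rate below are invented for illustration:

```python
# Classical gradient descent on a quadratic cost. The minimiser of
# f(x) = 0.5 x^T A x - b^T x is the solution of A x = b.
A = [[3.0, 1.0], [1.0, 2.0]]  # symmetric positive definite (invented)
b = [1.0, 1.0]

def grad(x):
    # Gradient of f: A x - b
    return [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(2)]

x = [0.0, 0.0]
eta = 0.2  # learning rate, chosen below 2 / (largest eigenvalue of A)
for _ in range(200):
    g = grad(x)
    x = [xi - eta * gi for xi, gi in zip(x, g)]
print(x)  # converges to [0.2, 0.4], the solution of A x = b
```

The quantum variants in the abstract replace the vector x and the matrix-vector products with amplitude-encoded states and unitary operations, but iterate toward the same minimiser.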
Image reconstruction is a critical problem in industry, especially in certain areas of optics such as the Ghost Imaging experiment [1], [2]. The experiment has many beneficial practical applications, such as live-cell imaging and remote sensing. The key leverage here lies with its non-local imaging procedure, which allows one to view a quantum image without collapsing its state. The experimental approach requires twice the number of measurements of a classical image, due to the real and complex parts of the quantum image; it thus requires ≈ 2N² measurements to reconstruct an N × N image [3]. The experimental procedure faces challenges in the speed and fidelity of reconstruction. Commonly used classical reconstruction methods are effective but can be computationally intensive or struggle to leverage the inherent patterns in natural images.
We have designed classical and quantum algorithms to overcome this intensive computational task. The method we present reconstructs low-sampled images measured from the Ghost Imaging experiment using classical and quantum Convolutional Neural Networks (CNNs) [4]. Low-sampled images have a linear representation using the Hadamard transform, where a number of coefficients of the linear decomposition are unknown. The CNNs take the low-sampled coefficients as inputs and reconstruct the complete set of coefficients. Instead of directly processing pixel-domain images, our method focuses on reconstructing missing coefficients in the Hadamard transform domain.
The quantum CNN model architecture adapts the principles of a classic U-Net Convolutional Neural Network. With the use of variational circuits, we apply the convolutional and pooling layers. Due to quantum properties such as superposition and entanglement, the model may be able to exploit more intrinsic patterns and correlations within the Hadamard coefficient space. We have simulated the quantum CNN, and it seems to show possible improvements in reconstruction speed and higher fidelity, compared to its classical counterpart of similar size.
This paper will detail the proposed classical and quantum CNN architectures, the encoding scheme for Hadamard coefficients into quantum states, the variational quantum layers for feature extraction and upsampling, and the classical optimization loop. We will present simulation results on the MNIST data set and real experimental results from the Wits Structured Light Lab, demonstrating the CNN's ability to reconstruct full Hadamard coefficient sets from various levels of undersampling, followed by an inverse Convolutional Neural Network to generate high-fidelity pixel-domain images. The findings highlight the potential of quantum machine learning to significantly advance computational imaging techniques like Ghost Imaging, paving the way for faster, more accurate, and quantum-enhanced imaging solutions.
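The Hadamard-domain representation at the centre of the method can be illustrated with a fast Walsh–Hadamard transform on a length-4 toy signal: the full coefficient set inverts exactly, while zeroing a coefficient mimics the low-sampled measurements that the CNN must complete. The signal values are invented:

```python
def fwht(a):
    """Fast Walsh-Hadamard transform (unnormalised) of a sequence whose
    length is a power of two; applying it twice and dividing by N inverts it."""
    a = list(a)
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

signal = [4.0, 2.0, 1.0, 3.0]
coeffs = fwht(signal)

# Full coefficient set: the inverse (fwht again, scaled by 1/N) is exact.
recovered = [c / len(signal) for c in fwht(coeffs)]
print(recovered)

# "Low-sampled" measurement: discard a coefficient, as when only a subset of
# Hadamard patterns is projected in the experiment. Predicting the missing
# coefficients before this inversion is the CNN's job.
undersampled = coeffs[:-1] + [0.0]
approx = [c / len(signal) for c in fwht(undersampled)]
print(approx)
```

The gap between `recovered` and `approx` is exactly the reconstruction error the classical and quantum networks are trained to remove.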
Intermolecular interactions play a fundamentally important role in the properties of solid materials. For instance, molecules ("guests") are taken up into porous materials ("hosts") as a result of the interactions between these species, while the manner in which they interact influences the sorption ability of the porous material. Several examples from our work will be used to show that calculations performed using the CHPC's computational facility allow us to explain the role that intermolecular interactions play in the unusual sorption properties of various porous compounds. For instance, the formation of hydrogen bonds between a porous material and guest H2O molecules can allow the water to behave like a liquid down to temperatures as low as −70 °C.[1] Molecular dynamics calculations, in combination with simulation of sorption isotherms using the BIOVIA Materials Studio suite available through the CHPC, aided in this understanding and can hence be used to identify materials that will yield superior water-harvesting materials.
In this presentation, several examples from our work will be used to show that computational methods allow us to explain the role that intermolecular interactions play in stabilising various chemical systems. In particular, I will focus on how interactions between water, CO2 and other solvents influence the properties of porous materials, and how we have used the resources made available through the CHPC to undertake these studies.
[1] Eaby, A.C., Myburgh, D.C., Kosimov, A., Kwit, M., Esterhuysen C., Janiak, A. M., Barbour, L. J. (2023) Nature, 616, 288–295.
Heavy rainfall events are among the most damaging weather hazards worldwide, yet they remain difficult to simulate accurately. One key source of uncertainty is the choice of input data used to initialize weather and climate models. In this study, we tested how sensitive the Conformal Cubic Atmospheric Model (CCAM) is to different initialization datasets, including ERA5, GFS, GDAS, and JRA-3Q. Using the CHPC Lengau cluster, we ran high-resolution (3 km) convection-permitting simulations, which allowed us to capture the fine-scale features of a 3-4 June 2024 heavy rainfall event over the eastern parts of South Africa.
We evaluated the simulations against radar and IMERG satellite precipitation estimates. While all runs reproduced the evening peak in rainfall timing, they generally underestimated intensity. Among the datasets, ERA5 produced the most reliable simulations, showing the closest match to IMERG with the lowest errors and highest correlation. In contrast, JRA-3Q and GFS-FNL performed less well. These results show that the choice of initialization dataset has a clear impact on rainfall prediction skill, and highlight the value of HPC-enabled sensitivity studies for improving extreme weather forecasting in the region.
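The dataset ranking above relies on standard verification statistics; a minimal sketch of the computation, using invented hourly rainfall values rather than the study's data, might look like this:

```python
import math

def rmse(sim, obs):
    """Root-mean-square error between simulated and observed values."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

def pearson(sim, obs):
    """Pearson correlation coefficient of the two series."""
    n = len(obs)
    ms, mo = sum(sim) / n, sum(obs) / n
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    vs = math.sqrt(sum((s - ms) ** 2 for s in sim))
    vo = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (vs * vo)

obs = [0.0, 2.1, 8.4, 15.0, 9.2, 3.3]    # reference hourly rain (mm), invented
era5 = [0.1, 1.8, 7.0, 12.5, 8.0, 2.9]   # a run that underestimates the peak
print(f"RMSE = {rmse(era5, obs):.2f} mm, r = {pearson(era5, obs):.2f}")
```

In practice these scores are computed over the full space-time grid against radar and IMERG fields, but the ranking logic is the same.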
Running large-scale bioinformatics analyses on high-performance computing (HPC) infrastructure like the CHPC can significantly accelerate research, but comes with technical challenges—especially for researchers aiming to deploy complex workflows such as those built with Nextflow. In this talk, I present practical recommendations and lessons learned from testing and running various bioinformatics applications on the CHPC, with a particular focus on containerised workflows and resource optimisation.
Drawing from real-world use cases and performance benchmarks, I highlight key considerations such as managing limited walltime, dealing with module and environment setup, optimising Singularity containers for reproducibility, and handling input/output bottlenecks. I also reflect on common pitfalls and how to overcome them—especially for researchers with limited systems administration experience.
This presentation aims to equip bioinformatics users with actionable guidance on how to run workflows more efficiently, reproducibly, and with fewer frustrations on the CHPC infrastructure. It is also a call for continued collaboration between HPC support teams and domain researchers to bridge the gap between computational capacity and research usability.
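As a concrete illustration of the walltime and container points above, a minimal PBS job script for a containerised Nextflow run might look as follows. This is a sketch only: the queue name, project code, module names, and resource figures are placeholders that must be replaced with site-specific values (check `module avail` and the local scheduler documentation).

```shell
#!/bin/bash
#PBS -N nf-pipeline
#PBS -l select=1:ncpus=24:mem=64gb   # resources: adjust to the workflow
#PBS -l walltime=12:00:00            # stay within the queue's walltime cap
#PBS -q normal                       # placeholder queue name
#PBS -P MYPROJ0000                   # placeholder project/allocation code

cd "$PBS_O_WORKDIR"

# Placeholder module names; confirm on the target system.
module load chpc/singularity
module load chpc/nextflow

# -resume lets a job that hits the walltime limit continue from cached
# tasks on resubmission; -profile singularity keeps the run containerised.
nextflow run nf-core/rnaseq -profile singularity,test -resume \
    --outdir "$PBS_O_WORKDIR/results"
```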
Room-temperature ionic liquids (ILs) are molten salts with negligible vapour pressure and wide electrochemical windows, making them attractive electrolytes for beyond-lithium batteries [1]. Optimising transport properties—such as conductivity, self-diffusion, and the working-ion transference number (the fraction of the total ionic current carried by Li⁺/Na⁺/K⁺ from the added salt)—requires further quantitative, molecular-scale insight into how charge and mass move. Equilibrium molecular dynamics (MD) provides this insight by enabling transport coefficients and mechanistic signatures to be extracted from atomistic simulations. The rate capability of a battery is tightly coupled to the transport properties of the electrolyte; formulations that raise the working-ion transference number while maintaining adequate conductivity are preferred [2].
In this work, MD simulations were used to probe 1-butyl-1-methylpyrrolidinium bis(fluorosulfonyl)imide ([C₄C₁pyr][FSI]) mixed with MFSI (M = Li, Na, K) at salt mole fractions of 0.10, 0.20, 0.40 and T = 348.15 K. A non-polarisable model based on the well-established CL&P force field was employed [3]; however, non-bonded interaction parameters were adjusted to better reflect symmetry-adapted perturbation theory (SAPT) decomposition of pairwise interactions, including cation–anion and metal-salt pairs. Equilibrium trajectories of ≥250 ns per state point were generated with LAMMPS [4]. Self-diffusion coefficients were obtained from Einstein mean-squared displacements, and ionic conductivity was computed using the Green–Kubo/Einstein–Helfand formulation. The analysis includes Nernst–Einstein estimates of conductivity ($\sigma_\text{NE}$), Haven ratios ($\sigma_\text{NE}/\sigma$) and its inverse (ionicity, $\sigma/\sigma_\text{NE}$), and both apparent transference numbers (from self-diffusion coefficients) and real/collective transference numbers from conductivity decomposition in an Onsager framework. Mechanisms of ion transport are examined via Van Hove correlation functions (self and distinct), the non-Gaussian parameter, ion–anion residence times, and coordination numbers. Hole (free-volume) theory is evaluated as a compact model for conductivity across composition.
HPC Content: Strong scaling was assessed for fixed-size systems of 256, 512, and 1024 ion pairs on 1–64 CPU cores (MPI ranks); wall-time per ns and ns/day were recorded to determine speedup and parallel efficiency. For one representative state point, transport properties are compared across these system sizes to illustrate finite-size effects.
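For context, the Nernst–Einstein estimate referred to above follows directly from the self-diffusion coefficients. The sketch below uses invented ion counts, box volume, and diffusivities purely for illustration; none of these numbers are results from the study.

```python
# Nernst–Einstein conductivity: sigma_NE = e^2 / (V kB T) * sum_i N_i z_i^2 D_i.
E = 1.602176634e-19      # elementary charge, C
KB = 1.380649e-23        # Boltzmann constant, J/K

def sigma_nernst_einstein(species, volume_m3, T):
    """species: list of (N_ions, charge_number, D in m^2/s); returns S/m."""
    return E**2 / (volume_m3 * KB * T) * sum(n * z**2 * d for n, z, d in species)

# Hypothetical state point: Li salt at x = 0.2, T = 348.15 K (placeholder values).
species = [
    (205, +1, 2.0e-11),   # [C4C1pyr]+ cations
    (256, -1, 1.8e-11),   # [FSI]- anions
    (51,  +1, 0.9e-11),   # Li+
]
sigma_ne = sigma_nernst_einstein(species, volume_m3=1.0e-25, T=348.15)

# Ionicity = sigma / sigma_NE; the Haven ratio is its inverse.
sigma_gk = 0.45 * sigma_ne   # placeholder "collective" Green-Kubo conductivity
print(f"sigma_NE = {sigma_ne:.3f} S/m, Haven ratio = {sigma_ne / sigma_gk:.2f}")
```

A Haven ratio above one, as in this placeholder example, signals correlated cation-anion motion that reduces the collective conductivity below the ideal uncorrelated estimate.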
[1] Yang, Q.; Zhang, Z.; Sun, X.-G.; Hu, Y.-S.; Xing, H.; Dai, S., Ionic liquids and derived materials for lithium and sodium batteries. Chem. Soc. Rev. 2018, 47, 2020-2064.
[2] Chen, Z.; Danilov, D. L.; Eichel, R.-A.; Notten, P. H. L., Porous Electrode Modeling and its Applications to Li-Ion Batteries. Adv. Energy Mater. 2022, 12, 2201506.
[3] Canongia Lopes, J. N.; Pádua, A. A. H., CL&P: A generic and systematic force field for ionic liquids modeling. Theor. Chem. Acc. 2012, 131, 1-11.
[4] Thompson, A. P.; Aktulga, H. M.; Berger, R.; Bolintineanu, D. S.; Brown, W. M.; Crozier, P. S.; in 't Veld, P. J.; Kohlmeyer, A.; Moore, S. G.; Nguyen, T. D.; Shan, R.; Stevens, M. J.; Tranchida, J.; Trott, C.; Plimpton, S. J., LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales. Comput. Phys. Commun. 2022, 271, 108171.
Q&A
Discover how servers with Intel Xeon 6 processors revolutionize AI inference across your entire infrastructure. This executive briefing explores the compelling business case for modernizing your compute environment, featuring breakthrough TCO improvements, enhanced sustainability metrics, and seamless AI workload deployment from edge to cloud. Learn how forward-thinking organizations are leveraging this powerful combination to accelerate datacentre refresh cycles while reducing operational costs and environmental impact. Join us for insights on strategic planning, investment optimization, and competitive advantages that drive digital transformation success.
Traditional HPC systems offer various levels of job isolation, including secure RDMA enclaves, but often assume a ‘friendly neighbor’ shared batch system environment outside of that; creating end-to-end attested workflows on them is still a novel development. Trusted research environments, on the other hand, are often built as Kubernetes clusters and offer more isolated execution environments, but their network separation typically ends at the VLAN level.
We present a method to execute workloads in an attested environment using RDMA and IP network separation at the Linux namespace level on HPE Slingshot, in a setup where K8s is used to run elastic inference workloads.
The current paradigm in HPC for AI has shifted from simple "data parallelism" to multi-dimensional (3D) parallelism and heterogeneous co-execution. In this talk, we will discuss optimization and placement patterns for training and inference workloads. For training, we will focus on topology-aware placement that minimizes inter-node communication latency using 3D parallelism. For inference, we will focus on maximizing throughput per watt and utilizing "stranded" capacity via hybrid CPU-GPU pipelining and dynamic model partitioning (e.g., Multi-Instance GPU, or MIG). We will then demonstrate how these placement strategies harness the power of HPC for AI workloads through 3D parallelism and heterogeneous co-execution.
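As a minimal illustration of topology-aware placement, the sketch below factors a hypothetical 64-rank job into data- (DP), pipeline- (PP) and tensor-parallel (TP) axes so that each tensor-parallel group, the most communication-intensive axis, maps onto consecutive ranks (e.g., one 8-GPU node). The sizes are illustrative, not a recommendation.

```python
def build_mesh(world=64, tp=8, pp=2):
    """Map each rank to (dp_idx, pp_idx, tp_idx) coordinates in a 3D mesh."""
    assert world % (tp * pp) == 0
    # TP is the fastest-varying axis, so a TP group occupies consecutive
    # ranks and typically stays inside a single node's fast interconnect.
    return {r: (r // (tp * pp), (r // tp) % pp, r % tp) for r in range(world)}

mesh = build_mesh()
tp_group0 = [r for r, (d, p, t) in mesh.items() if (d, p) == (0, 0)]
print(tp_group0)  # first TP group: ranks 0..7
```

Real schedulers additionally weight the mapping by measured link bandwidths, but the rank-ordering trick above captures the core idea: put the chattiest axis on the cheapest links.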
This talk explores how VAST enables modern HPC environments to operate as a true shared services centre of excellence. By combining multi-tenancy with predictable QoS, VAST enables research groups and diverse workloads to coexist on a single platform without performance conflicts. The result is a simplified, scalable HPC architecture that delivers fairness, consistency, and cloud-like service delivery for HPC and AI users alike.
The increasing reliance on the Internet has exposed the triad of cybersecurity, namely, people, processes, and technology, to various cyberattacks. Moreover, factors such as the Coronavirus Disease 2019 (COVID-19) pandemic have contributed to the rise in these attacks. Students, who are heavy Internet users, are not immune to these cyberattacks. Several studies have identified students as primary targets of cyberattacks at Higher Education Institutions (HEIs), largely due to their limited knowledge of how to protect themselves online. As a result, this study investigated the Cybersecurity Awareness (CSA) of students at a South African public HEI. The objective was to gauge students' knowledge of cybersecurity. The study employed a quantitative research design, using a structured questionnaire administered through QuestionPro. The sample consisted of students from seven faculties. The instrument assessed students' awareness across several cybersecurity topics: phishing, antivirus, identity theft, cyberbullying, piracy, password security, and malware. A total of 381 responses were collected and subjected to statistical analysis. The findings indicated that while students demonstrated strong awareness in identity theft and cyberbullying, significant deficiencies were observed in phishing, password management, and antivirus usage.
South Africa faces a critical shortage of cybersecurity professionals, with both industry and academia recognising the gap as a driver of national cyber risk. This paper examines the nature of the cybersecurity skills gap by reviewing recent studies and presenting the findings of a recent survey. Findings confirm that employers expect graduates to transition seamlessly into the workforce, highlighting the importance of practical, hands-on training, embedded certification pathways, and curricula aligned with market demands. Students similarly seek a relevant qualification that enhances their careers, offers hands-on learning, embeds certifications, and provides flexible and affordable delivery. To address these needs, the study evaluates two international cybersecurity knowledge frameworks, the ACM Cybersecurity Curricula Guidelines 2017 (CSEC2017) and the Cyber Security Body of Knowledge (CyBOK). Adopting and localising these frameworks can guide the development of a South African cybersecurity postgraduate qualification that is globally benchmarked yet tailored to local threats, legislation, and workforce requirements. By bridging the gap between academic preparation and industry expectations, the study aims to enhance the nation's capacity to respond to evolving cybersecurity challenges.
Organisations in developing countries face uneven cybersecurity readiness shaped by national laws, institutional capacity and market conditions. This paper proposes a Holistic Cybersecurity Readiness Assessment (CSRA) that links an External Cybersecurity Environment Assessment (Tier 1) with an Internal Readiness Self-Assessment (Tier 2) across governance, people, process and technology. The tool integrates a structured review of standards and national initiatives with practitioner input to derive profiling dimensions and factors of relevance. The paper presents the instrument design and scoring approach, and reports an initial pilot to examine face and content validity. The contribution is a practicable method that aligns organisational control priorities with the external context so that improvement plans are defensible and sequenced. Early findings suggest the two-tier linkage clarifies dependencies between regulation, capacity and internal practices. The paper concludes with limitations and next steps for broader validation across multiple national settings.
Ramsey Numbers are a computationally difficult problem to solve. The expected runtime of any algorithm to find a Ramsey Number is in the computational complexity class $\Pi_2^P$, or $\text{co-NP}^\text{NP}$ (Burr, 1987). Here we present preliminary results from an optimized tree-search algorithm to find the next Ramsey Number $R(4,6)$ (Radziszowski, 2024) and to verify the result $R(5,5)=45$ (Tamburini, 2025) using modern parallelisation techniques and improved hardware. We provide an analysis of the efficiency of this parallel algorithm compared to other implementations, and present current progress on generalising the algorithm to find Ramsey Numbers for general associated structures.
The computation of Ramsey Numbers in graph theory concerns the guaranteed appearance of ordered substructures in a graph of a given size. Mathematically, the calculation of a Ramsey Number R(k, l) = n is a two-colouring problem: it finds the smallest n such that every two-colouring of the edges of the complete graph on n vertices contains either a monochromatic clique of size k in the first colour or one of size l in the second [1]. This is a formidable computational challenge: classical algorithms face a search space that grows super-exponentially with the number of vertices, rendering the problem intractable. This abstract presents an approach utilizing Quantum Optimisation Algorithms to address this complexity, with an experimental implementation targeting IBMQ quantum hardware.
The paper [2] reformulates the problem of determining whether R(k, l) > n (i.e., whether an n-vertex graph exists with no k-clique or l-independent set) as a Quadratic Unconstrained Binary Optimisation (QUBO) problem. The associated Problem Hamiltonian, H_P, is constructed such that its ground state corresponds to a solution that satisfies the decision problem.
We employ a variational algorithm, a leading hybrid quantum-classical method. The circuit is implemented using the Qiskit framework and executed on accessible IBMQ systems. A key aspect of our work is the introduction of quantum approaches in this field and execution on utility-scale IBMQ architecture.
To our knowledge, the paper [3] solves R(5, 5) = 45 with Majorana-based algebra on a photonic quantum computer, using only 5 qubits.
We have verified classical results for the computation of small yet non-trivial Ramsey numbers, such as R(3, 3), by benchmarking the performance of classical optimization. We would like to investigate the scale-up performance and quality of results on utility-scale quantum computers. Our findings will contribute to the knowledge of solving problems beyond the reach of conventional High Performance Computing (HPC) resources.
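The QUBO cost function in [2] can be read as a count of forbidden monochromatic substructures: a colouring certifies R(k, l) > n exactly when that count, the Hamiltonian's energy, is zero. The pure-Python sketch below (our illustration, not the paper's implementation) brute-forces this check to verify the classical result R(3, 3) = 6.

```python
from itertools import combinations, product

def mono_triangles(n, colour):
    """Count monochromatic triangles; colour maps each edge (i, j) to 0 or 1."""
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if colour[(a, b)] == colour[(a, c)] == colour[(b, c)]
    )

def ramsey_holds(n):
    """True iff every 2-colouring of K_n contains a monochromatic triangle."""
    edges = list(combinations(range(n), 2))
    return all(
        mono_triangles(n, dict(zip(edges, bits))) > 0
        for bits in product((0, 1), repeat=len(edges))
    )

# R(3,3) = 6: some colouring of K5 has zero "energy" (no mono triangle),
# but every colouring of K6 has positive energy.
print(ramsey_holds(5))  # False
print(ramsey_holds(6))  # True
```

Already at this toy scale the search space is 2^15 colourings for K6; the super-exponential growth noted above is why heuristic and quantum formulations become attractive.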
[1] Bondy, J.A., and P. Erdős. “Ramsey numbers for cycles in graphs.” Journal of Combinatorial Theory, Series B, vol. 14, no. 1, 1973, pp. 46–54. https://doi.org/10.1016/S0095-8956(73)80005-X
[2] Wang, Hefeng. “Determining Ramsey Numbers on a Quantum Computer.” Physical Review A, vol. 93, no. 3, Mar. 2016. https://doi.org/10.1103/PhysRevA.93.032301
[3] Tamburini, Fabrizio. “Random-projector quantum diagnostics of Ramsey numbers and a prime-factor heuristic for R(5,5)=45.” arXiv, 2025. arXiv:2508.16699
This project focuses on identifying quantum hardware based on its unique "quantum noise fingerprint" using machine learning. Each quantum computer exhibits a distinct noise signature due to physical imperfections, and recognizing these patterns can aid in hardware development, calibration, and security. We utilized basic machine learning algorithms (SVM, KNN) to analyse noise characteristics and predict which IBM quantum machine executed a given circuit.
Methodology and Observations
Data was gathered from IBM's Qiskit platform, including actual hardware runs (facilitated by a CSIR educational license) and refreshed software simulations. An HPC cluster was essential for processing and simulating the extensive datasets due to the computational demands, allowing for efficient parallel data transformation. The SVM and KNN machine learning models were then trained on this data after feature engineering and parameter tuning were completed. Initial findings showed high accuracy (over 96%) when models were trained and tested on data within the same category (e.g., training on hardware data and testing on hardware data). However, a significant drop in accuracy was observed when attempting to identify machines across different data types (e.g., training on software simulations and testing on actual hardware). Furthermore, we noted that IBM's refreshed simulation noise models are not static and evolve over time.
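To make the classification step concrete, here is a toy k-nearest-neighbour sketch over invented "noise fingerprint" features. The backend names, feature choices (readout error, gate error), and numeric values are all illustrative assumptions; the project itself applied SVM and KNN to Qiskit-derived data.

```python
import random
from collections import Counter

random.seed(0)

def make_samples(centre, label, n=30, spread=0.005):
    """Synthetic feature vectors scattered around a backend's 'fingerprint'."""
    return [([c + random.gauss(0, spread) for c in centre], label)
            for _ in range(n)]

# Pretend each backend has a characteristic (readout-error, gate-error) pair.
train = (make_samples([0.02, 0.004], "backend_a")
         + make_samples([0.05, 0.009], "backend_b")
         + make_samples([0.08, 0.003], "backend_c"))

def knn_predict(train, x, k=5):
    """Vote among the k training points nearest (squared Euclidean) to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(feats, x)), label)
        for feats, label in train
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

print(knn_predict(train, [0.049, 0.0088]))  # almost surely votes "backend_b"
```

The cross-domain accuracy drop reported above corresponds, in this picture, to the cluster centres shifting between simulation and hardware so that test points land near the wrong fingerprint.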
The rise of quantum computing poses a serious threat to password-based security systems and could break the methods we currently use to keep data safe, putting sensitive information at risk. For example, Grover's algorithm, a well-known quantum algorithm, can make brute-force password attacks much faster by reducing the number of guesses needed to roughly the square root of the total number of possible passwords, which could make attacks thousands of times faster for large key spaces.
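The square-root speed-up can be made concrete with a back-of-envelope count: a search over N = 2^n candidate passwords needs on the order of N classical guesses in the worst case, but only about (π/4)·√N Grover iterations. The bit-widths below are illustrative.

```python
import math

def grover_iterations(n_bits):
    """Worst-case classical guesses N vs ~(pi/4)*sqrt(N) Grover iterations."""
    n = 2 ** n_bits
    return n, math.ceil((math.pi / 4) * math.sqrt(n))

for bits in (20, 40, 64):
    n, q = grover_iterations(bits)
    print(f"{bits}-bit space: {n:.3e} classical vs ~{q:.3e} quantum queries")
```

This is why the common mitigation is simply to double effective key or hash lengths: squaring the search space restores the original classical security margin against Grover-style attacks.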
This research proposes a Password Security Quantum Readiness Framework to help IT professionals maintain business continuity in the face of sudden quantum-driven password security shifts. The study aims to assess the risk that quantum computing poses to password security and to evaluate countermeasures, including quantum-resistant hashing, multi-factor or password-less authentication, upgrading hashing protocols to post-quantum standards, and other protections that mitigate these risks.
A qualitative-methods design supports the study. First, a thorough literature review will be conducted to investigate the password security risk posed by quantum computing. Second, a systematic literature review will be conducted to investigate possible countermeasures for mitigating password security risks related to quantum computing. Third, critical reasoning will be used to identify and extract key constructs for formulating the framework.
Businesses can protect sensitive information from emerging quantum technologies by developing a quantum readiness framework for password security. This framework will help IT professionals understand the risks posed by quantum computing and equip them to address password cybersecurity challenges, creating a business-continuity architecture to safeguard password infrastructure and ensure operational resilience in the evolving quantum landscape.
Supercomputing for Sustainability: Balancing Performance and Energy
High-performance computing and AI are at the heart of modern cyber-infrastructure, enabling the transformation of massive data sets into knowledge and decisions. Yet, as system scale and complexity grow, so do the challenges of energy consumption, sustainability, and efficient data movement. This BOF will explore strategies to balance performance with energy efficiency in large-scale systems while ensuring that scientific computing remains productive and impactful.
Key discussion points include how future HPC and AI infrastructures can be designed and operated to reduce energy demand, how infrastructure choices affect sustainability, and how new approaches in scheduling, data management, architectures, and workflow design can align scientific progress with environmental responsibility. By bringing together several perspectives, the session aims to identify practical directions for sustainable supercomputing that can meet the dual challenge of handling ever-larger data sets while supporting informed decisions for science and society.
Welcome and Moderation
Maximilian Höb, Leibniz Supercomputing Centre
Lightning Talks
Ian Foster, University of Chicago
Utz-Uwe Haus, Hewlett Packard Enterprise
Dieter Kranzlmüller, Leibniz Supercomputing Centre
Dan Stanzione, Texas Advanced Computing Center
Panel Discussion with all Speakers
Industry analyst Dan Olds will discuss this issue and lay out his views on what he sees as an inane question. Expect some sarcasm, overblown rhetoric, and derision in this mercifully short presentation.
This session has become a signature event at CHPC conferences. The rules are brutally simple. Vendors have five minutes and only three slides to put their best foot forward to the audience and the inquisitors. The panel includes industry analyst Dan Olds along with two standout students from the cluster competition who have been briefed on the vendors and their slides.
After their five-minute presentations, the presenters will be asked three questions, two of which they know are coming followed by a final, secret, question. Frank and tough questions will be asked. Answers will be given. Punches will not be pulled. The audience will be the ultimate judge of which vendor did the best job. It’s fun, brisk, and informative.
Datacentres have evolved to maximize the GenAI wave, and now Agentic AI is driving an even bigger compute shift, with a massive heterogeneity of compute capabilities in the datacentres (CPU, GPU, accelerators, …). Data storage is key. Security of data is paramount. Processing capabilities still have to step up. But data storage and security at rest are not enough: the I/O pressure of heterogeneous workflows, multi-tenant usage of resources, and the integration of highly differentiated operational compute services into previously dedicated systems become the targets for optimizing TCO. Learn how Intel's architecture and technologies, together with HPE's solutions and AI expertise, will capture this next wave of Agentic AI and deliver customer value.
Research software underpins nearly every aspect of modern science, from data processing and simulation to modelling, visualisation, and workflow automation. In South Africa, as around the world, HPC users and administrators play a substantial role in developing and maintaining the research software that drives the models, simulations, and workflows behind scientific discoveries. Unfortunately, few mechanisms exist to incentivise and recognise these contributions, leaving much of the foundational work behind research software undervalued and overlooked.
To address this gap, the UCT eResearch Centre initiated and designed the framework for a new National Science and Technology Forum (NSTF) award category, working in partnership with the University of the Western Cape, South African Centre for Digital Language Resources (SADiLaR) and the NSTF. The introduction of this category, launching in 2025/2026, has been made possible through financial support from SADiLaR and will recognise exceptional contributions to science, engineering, technology and innovation enabled by research software.
This Keynote presentation will be focused on the following:
The readiness of SADC NRENs for the challenges of petascale and exascale data to be generated by key projects such as the SKA and the bioinformatics genome-sequencing projects.
Strategic infrastructure investments and partnerships, and how to maximise private-sector investments.
Data sovereignty and developing the right skills for our research communities to be able to handle and run these increasing scientific data flows.
In today’s rapidly evolving digital landscape, robust national cyber infrastructure is essential for driving innovation, securing critical systems, and empowering research across all sectors. This keynote explores how the strategic integration of advanced compute power and big data capabilities forms the backbone of modern cyber infrastructure, enabling nations to tackle complex challenges in science, engineering, and industry. We will highlight MathWorks’ pivotal role in supporting these efforts by delivering state-of-the-art technical tools, such as MATLAB and Simulink, that accelerate data analysis, modeling, and simulation at scale. Beyond technology, MathWorks is committed to capacity building—offering comprehensive training programs for staff and students to cultivate the next generation of cyber professionals. Furthermore, we foster collaboration by connecting academia, government, and industry, ensuring a vibrant ecosystem where innovative ideas flourish. Join us to discover inspiring case studies and practical strategies that demonstrate how a unified approach to compute, data, and community can unlock the full potential of national cyber infrastructure and drive transformative outcomes.
Climate-change challenges and the pressure on the Earth's crustal mineral resources require the discovery of novel, frontier-type compounds. These should exhibit diverse multifunctionalities combined with minimal energy consumption, both during their conception and their usage. Because the number of compounds that can be created from the elements of Mendeleev's periodic table is practically endless, and not restricted to the 63 elements initially classified, it is impossible to give an exact count: a single element such as carbon can be used to construct molecules of varied lengths, while elements can be combined in almost infinite ways to form numerous distinct compounds.
By employing algorithms to evaluate enormous datasets, spot trends, and forecast novel material properties, artificial intelligence (AI) and machine learning (ML) are revolutionising materials discovery and speeding up the process beyond conventional trial-and-error techniques. These technologies speed up the process of finding new materials for a variety of uses by enabling high-throughput screening of possible materials, computational design of materials with specific features, and autonomous experimentation.
Using High Performance Computational Capabilities (HPCC), this contribution reports on a set of examples related to the Energy-Water-Health-Food security nexus, in line with the U.N. SDGs landscape. These include (i) the conversion of CO2 to multi-functional nano-scaled carbonates [1], (ii) a new generation of nanofluid coolants for heat management [2-3], (iii) smart nanocoatings for green air conditioning [4], and (iv) bio-inspired nanomaterials for water decontamination [5].
[1] “Room temperature bio-engineered multifunctional carbonates for CO2 sequestration and valorization”, M. Maaza et al., https://www.nature.com/articles/s41598-023-42905-5
[2] “Remarkable thermal conductivity enhancement in Ag-decorated graphene nanocomposites based nanofluid by laser liquid solid interaction in ethylene glycol”, https://www.nature.com/articles/s41598-020-67418-3
[3] “A novel approach for engineering efficient nanofluids by radiolysis”, https://www.nature.com/articles/s41598-022-14540-z
[4] “Towards Room Temperature Thermochromic Coatings with controllable NIR-IR modulation for solar heat management & smart windows applications”, M. Maaza et al., https://www.nature.com/articles/s41598-024-52021-7
The evolution and progress of humanity are closely linked to our ways of using energy. Reliable energy sources are vital for driving economic growth, especially as society's demand for energy keeps rising. The rapid development of zinc-air batteries (ZABs) makes them an appealing alternative to standard lithium-ion batteries for energy storage needs. However, the slow kinetics of the air cathode lead to a short lifespan and low energy efficiency in zinc-air batteries. First-principles calculations help develop catalysts that promote the formation of the most stable discharge products in Zn-air batteries. Density functional theory (DFT) is used to examine the adsorption (Γ = +1, +2) and vacancy formation (Γ = -1, -2) energies of oxygen atoms on the (001) surface of VCo2O4. Bader charge analysis reveals how the atoms interact within the system: when oxygen atoms are reduced and adsorbed, the V and Co atoms show minimal charge differences compared to the original phase, whether reduced or oxidized. Interplanar distances show that adding or removing oxygen causes the system to expand or contract, respectively. The work function helps assess the system's reactivity: adsorbing oxygen atoms decreases reactivity, while removing oxygen increases it. The calculations were executed concurrently on 24 of the 2400 available cores, leveraging the CHPC with 2048 MB of memory. These findings provide insights into identifying catalysts that can enhance the oxygen reduction reaction (ORR) and oxygen evolution reaction (OER), thereby improving the performance of Zn-air batteries.
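For readers less familiar with the quantities involved, the adsorption and vacancy-formation energies reported above are simple differences of DFT total energies. The sketch below shows the bookkeeping with hypothetical numbers: the totals and the oxygen reference energy are placeholders, not the study's results.

```python
def adsorption_energy(e_slab_ads, e_slab, e_o_ref, n_o=1):
    """E_ads = [E(slab + n O) - E(slab) - n * E_ref(O)] / n, in eV per O atom."""
    return (e_slab_ads - e_slab - n_o * e_o_ref) / n_o

def vacancy_formation_energy(e_defective, e_pristine, e_o_ref, n_vac=1):
    """E_vac = [E(defective) + n * E_ref(O) - E(pristine)] / n, in eV per vacancy."""
    return (e_defective + n_vac * e_o_ref - e_pristine) / n_vac

# Hypothetical totals (eV); E_ref(O) taken as half of an O2 total energy.
e_o_ref = -4.93
print(adsorption_energy(-612.40, -605.90, e_o_ref))        # ~ -1.57 eV: exothermic
print(vacancy_formation_energy(-598.10, -605.90, e_o_ref))  # ~ 2.87 eV
```

A negative adsorption energy indicates favourable oxygen binding, while a low vacancy-formation energy signals easy oxygen removal; together they bracket a surface's ORR/OER activity.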
This paper explores the benefits of integrating Earth Observation (EO) techniques with Artificial Intelligence (AI) to enhance the capabilities of Numerical Weather Prediction (NWP) models, particularly in the context of severe weather and environmental hazards over South Africa. NWP models often employ EO data that lack real-time resolution, which may lead to increased uncertainty in short-term forecasts and reduced reliability during high-impact weather events. EO systems provide high-resolution, near-real-time observations, while AI techniques perform post-processing tasks such as bias correction, anomaly detection, and pattern recognition. AI also excels at capturing non-linear relationships and fine-scale phenomena that are often poorly resolved in NWP models. This integrated EO-AI approach should improve model forecast accuracy and the detection of localized hazards. We demonstrate the benefits and shortcomings of our approach in detecting hazards such as heat waves, wildfires and wetland degradation.
Three-dimensional (3D) computational fluid dynamics (CFD) has emerged as a powerful tool for studying cardiovascular haemodynamics and informing the treatment of cardiovascular diseases. Patient-specific CFD models rely on boundary conditions derived from medical imaging, yet uncertainties in imaging measurements can propagate through the model and affect clinically relevant outputs such as pressure and velocity fields. To ensure that CFD-based clinical decisions are both reliable and repeatable, it is essential to quantify these uncertainties and assess the sensitivity of the outputs to boundary condition variability.
Uncertainty quantification and sensitivity analysis (UQ/SA) typically require large numbers of simulations, which makes their application challenging in 3D CFD due to high computational costs. While Monte Carlo approaches may require hundreds of evaluations, alternative methods such as generalized polynomial chaos expansion reduce the number of runs but remain computationally demanding.
In this study, we present a global UQ/SA framework implemented on the Lengau Cluster for coarctation of the aorta, a common form of congenital heart disease. The uncertain inputs are the lumped parameters of the 3-Element Windkessel Model, prescribed at the outlets to represent distal vasculature. We evaluate how variability in these parameters impacts pressure and velocity fields, with the objective of improving the robustness and clinical utility of patient-specific CFD simulations.
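For context, the 3-Element Windkessel model mentioned above lumps the distal vasculature into a proximal resistance Z, a distal resistance R, and a compliance C. The sketch below integrates the model for a single outlet; the parameter values and the sinusoidal flow waveform are illustrative assumptions, not patient-specific data.

```python
import math

def windkessel_pressure(Z, R, C, flow, dt):
    """Integrate C dPc/dt = Q - Pc/R; outlet pressure P = Z*Q + Pc (Pa)."""
    pc, pressures = 0.0, []
    for q in flow:
        pc += dt * (q - pc / R) / C   # explicit Euler step for the compliance
        pressures.append(Z * q + pc)
    return pressures

dt = 1e-3                                # s
t = [i * dt for i in range(1000)]        # one 1 s cardiac cycle
flow = [max(0.0, 8e-5 * math.sin(2 * math.pi * ti)) for ti in t]  # m^3/s
p = windkessel_pressure(Z=1.0e7, R=1.5e8, C=1.0e-8, flow=flow, dt=dt)
print(f"peak outlet pressure: {max(p) / 133.32:.1f} mmHg")  # Pa -> mmHg
```

In the UQ/SA framework, (Z, R, C) at each outlet are the uncertain inputs: the model is re-evaluated over sampled parameter sets and the resulting spread in pressure and velocity fields is quantified.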
Q & A
Water scarcity and non-revenue water remain pressing issues in South Africa, exacerbated by environmental changes. The CSIR's Smart Water and Waste Water Institute (SWWWI), in collaboration with Spectrum Access and Management Innovation (SAMI), proposes an innovative Smart Water Network Management solution to address these challenges. The solution integrates a Dynamic Hydraulic Model (DHM) program with a real-time smart dashboard, leveraging expertise from telecommunications, civil, and software engineering. This enables proactive remote monitoring and control of distributed water networks, optimizing network performance, reducing water loss, and enhancing operational efficiency.
The DHM simulates various network scenarios, enabling predictive maintenance, pressure management, and optimized control valve scheduling. The smart dashboard provides a unified interface for operators to monitor and control the network, facilitating swift response to anomalies and informed decision-making. Testing and validation are underway on the CSIR's Pretoria campus water network, with collaborative support from Magalies Water. Ongoing research and development focus on refining the solution, incorporating advanced analytics, machine learning, and IoT technologies. The Smart Water Network Management System has the potential to revolutionize water management practices in South Africa and reduce non-revenue water through conservation measures and relevant technological capabilities.
Urban areas across South Africa face increasing pressure to plan for sustainable growth and service delivery amid rapid change. The Urban and Regional Dynamics research group at the CSIR develops city- and provincial-level simulation and decision-support tools to assist planners and policymakers in exploring long-term urban development scenarios. These models integrate spatial, economic, and demographic data, requiring significant computational capacity for processing and visualisation. Access to the CHPC infrastructure has been critical in enabling the scalability of these systems and providing reliable, shared access for multiple users across municipalities and provinces. This talk outlines the computing challenges and successes encountered, from handling large geospatial datasets to deploying interactive web-based interfaces, and presents key research outcomes that demonstrate how cloud-based systems can strengthen data-driven urban planning and decision-making.
The National Policy Data Observatory (NPDO) is a government-led initiative housed at the CSIR to strengthen evidence-based policymaking through advanced data analytics and national cyber-infrastructure. Established during the Covid-19 pandemic, the NPDO has expanded into a multidisciplinary hub delivering insights on socio-economic trends, mobility, disaster management, energy use, sentiment analysis and development indicators. Its growing portfolio now supports emerging priorities such as crime prediction, water quality monitoring, digital census innovation and service-delivery improvement. Through strategic collaborations across government and research partners, the NPDO is becoming a central platform for integrated, data-driven decision support in South Africa.
This talk is focused on providing an overview of the Artificial Intelligence, Data Science and Natural Language Processing fields. The audience will be given a view of use cases and solutions that are being worked on at CSIR. This will be followed by an overview of the impact that CHPC infrastructure has contributed to our NLP initiatives. Lastly, we will briefly share perspectives to enable future success at the intersection of infrastructure and AI innovation.
This paper explores the ethical boundaries of plagiarism detection in the age of artificial intelligence (AI), focusing on the rise of AI-generated text and its implications for academic integrity. While plagiarism detection has traditionally relied on string-matching and authorship attribution, the emergence of generative models like GPT-4 challenges these methods. Institutions now face a dual imperative: uphold fairness and accountability while respecting privacy, transparency, and due process. This article reviews the evolution of detection systems, contextualizes them within cybersecurity frameworks, and analyzes ethical tensions through global policy comparisons and case scenarios. A conceptual model of AI-enabled detection is presented, alongside a comparative table of international data protection laws. The paper argues for a balanced governance approach that integrates human judgment, safeguards student rights, and acknowledges cultural diversity in plagiarism norms. Recommendations include hybrid detection-prevention strategies, transparent algorithms, and ethics-informed policy design. Ultimately, the goal is to ensure that detection systems serve education rather than undermine it.
Generative Artificial Intelligence has rapidly evolved, transforming numerous sectors. While its potential benefits are widely acknowledged, there are growing concerns about its ethical and societal implications. This paper presents a semi-systematic literature review aimed at identifying ethical principles and the social impacts of generative AI. This article synthesises key themes related to fairness, accountability, transparency, privacy, and plagiarism, as well as broader societal concerns including job displacement, misinformation, overreliance, and the digital divide. The findings contribute to ongoing debates in AI governance, policy formulation, and ethical AI design, and offer a foundation for future research in managing the risks and reducing the societal impact of generative AI.
Cybercrime poses a significant threat that presents a unique challenge to law enforcement agencies. The transnational nature and technical complexity of cybercrime create difficulties in investigating it, despite advancements in legislation. Interpol's 2025 African Cyberthreat Assessment Report highlights that cyber-related offences continue to increase, while countries report critical gaps in investigative capacity. In response to these challenges, this paper proposes a conceptual model based on seven Critical Success Factors (CSFs) identified from a review of the academic literature using a snowballing technique. The resulting model provides a structured framework to support law enforcement agencies in improving their cybercrime investigative capabilities.
Strategic infrastructure investments and partnerships and how to maximise private sector investments
Strategic Infrastructure and Partnerships for the Challenges of Petabyte/Exa-Scale Data and High-Speed Transfers in the SADC Region and Beyond
Empowering Research Communities - Data Sovereignty and Skills Development for Managing Data
Strategic Infrastructure Investments and Partnerships, and How to Maximise on Private Sector Investments
Digital transformation in the public sector needs more than just technology; it requires a new understanding of leadership. This presentation examines how leadership is practised within South Africa’s Centre for High-Performance Computing (CHPC), a national facility driving the country’s digital research agenda.
Building on the Leadership-as-Practice (L-A-P) framework and expanded through the Practice Nexus and Contextual Modulators, this study explores leadership as a collective, relational, and materially mediated activity. Using a qualitative phenomenological case study, it examines how leadership arises through dialogue, improvisation, and the interaction between human and AI actors.
The research introduces the concept of bricolage leadership, developed within the context of institutional constraints and technological complexity.
Findings show how AI tools serve as co-constitutive agents, shaping coordination and sensemaking within national digital infrastructures.
Digital leadership is crucial for driving innovation and transformation in health systems across Africa. This case study examines the implementation of drone technology for delivering medical supplies in Rwanda, demonstrating how strategic digital leadership facilitated the successful integration of this technology into the country's health logistics system. Through interviews and document analysis, the study explores how leaders at the Ministry of Health, Rwanda in collaboration with private partners (Zipline, Rwanda), fostered a culture of digital readiness, agility and data-driven decision-making. The findings reveal that visionary leadership, cross-sectoral collaboration and adaptive governance were key to scaling up drone-based delivery networks that now supply remote hospitals with life-saving medical products. The study concludes that Rwanda’s experience demonstrates how digital leadership can overcome infrastructure limitations, promote health equity, and accelerate digital transformation in areas with limited resources.
Sovereign cloud is an increasingly important topic. Nations and businesses are realising that cloud solutions provided by foreign companies, even when deployed in South Africa, are subject to foreign laws. These laws enable states to compel these companies to provide data from users of cloud platforms regardless of where they are deployed. In addition, geopolitics has become erratic and vindictive. Trade barriers are erected and removed at a whim, creating substantial uncertainty. This is unstable ground on which to build infrastructure of national importance. NICIS, as a provider of cyberinfrastructure, has experience localising technology and building services for the research community. This talk explores NICIS’ efforts to address the lack of a sovereign cloud platform in South Africa and the progress that has been made.
Digital leadership is emerging as the decisive competence of our time — the ability to align human insight, computational capacity, and organisational purpose in a world shaped by artificial intelligence. This session explores what it means to lead when cognition, creativity, and computation are increasingly interwoven.
Far from being a technical role, digital leadership represents a new epistemic orientation — one that combines systems intelligence, ethical discernment, and strategic agility to navigate the accelerating feedback loops between human decision-making and machine learning. It requires leaders to cultivate literacies that span from data ethics and digital inclusion to the responsible deployment of AI and HPC infrastructures.
Drawing from applied research and practice at the UWC CoLab for e-Inclusion and Social Innovation and the Samsung-funded Future-Innovation Lab, the presentation will examine how digital leadership is being developed within South Africa’s higher-education and innovation ecosystems. It will illustrate how next-generation leaders are being prepared to operate at the interface of human capability development, institutional transformation, and computational scale — where digital foresight becomes a form of national competence.
Ultimately, the session argues that digital leadership is not about mastering technology but about shaping the conditions under which technology serves human and societal flourishing in the era of artificial intelligence.
Q & A
This presentation chronicles the journey of an older-generation computational chemist and a young HPC expert, mediated via AI assistance, culminating in the successful deployment of a multi-node research computing cluster. The cluster supports molecular modeling and drug design, enabling large-scale molecular dynamics and quantum chemistry calculations. The senior researcher's four-decade arc—from 1983 punch cards to 2025 AI-collaborative infrastructure—illuminates artificial intelligence's role in transforming scientific knowledge transfer.
An initial Gaussian software request expanded into a comprehensive cluster setup: Rocky Linux 9, Slurm management, parallel filesystems, and seven key packages (Gaussian, ORCA, GAMESS-US, Psi4, NWChem, CP2K and AMBER). The setup was optimized for mixed GPU architectures (RTX A4000/RTX 4060) — a common reality in most laboratories (perhaps fortunately for the average researcher), though uniform hardware is preferable if affordable. Benchmarks yielded 85% parallel efficiency, affirming production readiness.
The AI approach thrived despite hands-off administration, via an iterative model of problem-solving, explanation, and reasoning. Complementary tools — Claude AI for documentation, Grok for perspectives, DeepSeek for verification — fostered rapid consensus, with human-led execution, validation, and adaptation essential. This erodes barriers to retraining or consultancy, enabling expertise assimilation for resource-limited institutions and heralding a paradigm shift in scientific knowledge application.
Keywords: High-Performance Computing, Computational Chemistry, AI-Assisted Infrastructure, Cluster Computing, Knowledge Transfer, Slurm Workload Manager, Scientific Computing, Human-AI Collaboration, HPC Democratization, Intergenerational Learning.
Africa has traditionally lagged behind in life sciences research due to limited funding, infrastructure, and human capacity. Yet the growth of genomics and other large-scale data-driven projects now demands robust cyber-infrastructure for data storage, processing, and sharing. H3ABioNet made a significant contribution to building bioinformatics capacity across Africa over 12 years, but its funding has ended. In 2024, the community received a major boost with support from the Wellcome Trust and Chan Zuckerberg Initiative to establish the African Bioinformatics Institute (ABI).
The ABI is being developed as a distributed network of African institutions, with a mandate to coordinate bioinformatics infrastructure, research, and training. A central focus is on enabling African scientists and public health institutes to manage and analyse large, complex datasets generated by initiatives such as national genome projects, pathogen genomics surveillance, and the African Population Cohorts Consortium. To meet these needs, the ABI is working with global partners, including the GA4GH, to promote adoption of international standards and tools that enable secure, responsible data sharing.
The Institute will coordinate the development of a federated network of trusted research environments (TREs), ensuring data governance frameworks are locally appropriate while interoperable with global systems. By hosting African databases and resources, and fostering collaborations across institutions, the ABI will both drive demand for advanced compute and storage solutions and contribute to shaping how cyber-infrastructure supports genomics on the continent. In doing so, it will bridge local and global research ecosystems and advance the responsible use of genomic data for health impact.
Direct-current (DC) electric arc furnaces are used extensively in the recycling of steel as well as primary production of many industrial commodities such as ferrochromium, titanium dioxide, cobalt, and platinum group metals. This typically involves a process called carbothermic smelting, in which raw materials are reacted with a carbon-based reductant such as metallurgical coke to make the desired product. Although it is one of humanity’s oldest and most established technologies, carbothermic metal production is becoming increasingly unattractive due to its significant scope-1 emissions of carbon dioxide and other environmental pollutants. Because of this many alternatives to fossil carbon reductants are currently being researched, and in the context of broad initiatives to establish a sustainable hydrogen economy both in South Africa and internationally, the possibility of directly replacing coke with hydrogen as a metallurgical reductant is of particular interest. A DC arc furnace fed with hydrogen has the potential to reduce or eliminate carbon emissions provided renewable resources are used for both electrical power and hydrogen production.
Key to the operation of a DC arc furnace is the electric arc itself – a high-velocity, high-temperature jet of gas which has been heated until it splits into a mixture of ions and electrons (a plasma) and becomes electrically conductive. The plasma arc acts as the principal heating and stirring element inside the furnace, and understanding its behaviour is an important part of operating an arc furnace efficiently and productively. However, due to the extreme conditions under which arcs operate, studying them experimentally can be difficult, expensive, and hazardous. Coupled multiphysics models which simulate arcs from first principles of fluid flow, heat transfer and electromagnetics are therefore of great value in conducting in silico numerical experiments and building an understanding of how they behave under different process conditions. This presentation will discuss the development of an arc modelling workflow incorporating aspects of process thermochemistry, plasma property calculation from fundamental physics, and computational mechanics models of the arc itself. This workflow is then used to explore the impact of introducing hydrogen gas as an alternative reductant in metallurgical alloy smelting processes.
In keeping with the theme of this year’s CHPC National Conference, the critical role of HPC in plasma arc modelling will be discussed in terms of the data life cycle in plasma arc modelling – from input parameters through to raw simulation data, and finally to key insights which will help guide the next generation of clean metal production technologies.
In an age where AI is permeating every field of engineering and science, it is essential for researchers to quickly embrace technologies that can drive breakthroughs and accelerate innovation. Tools like MATLAB are specifically designed to lower the barriers to entry into the world of AI, making advanced capabilities more accessible to engineers and scientists.
Integrating truly AI-enabled systems into real-world applications presents significant challenges – data fragmentation, legacy system integration, and scaling advanced computations are common hurdles. This talk presents practical approaches to overcoming these obstacles, emphasizing how high-performance computing with MATLAB and Simulink can accelerate AI model development and deployment. After a quick introduction on how to access and use MATLAB at your university or institute, the session will focus on effective strategies and best practices for leveraging HPC capabilities such as parallelization and workflow optimization to achieve faster prototyping and scalable AI solutions.
The availability of MATLAB on the CHPC cluster through your university or institute license ensures that the presented workflows are accessible and reproducible for researchers across the Southern Africa region. The Academia Teams at MathWorks and Opti-Num Solutions (local partner company) are here to support your research by helping you quickly adopt AI and scale your work with parallel computing.
Q & A
The session will provide an opportunity for HPC Educators to share their plans, challenges, and ideas for HPC education curricula, tools, and resources with the broader HPC Educator community.
Due to the exponential growth of the internet, cyber fraud has become an increasingly prevalent issue globally, and South Africa is no exception. While several studies address cyber fraud victims, limited research has specifically examined students, particularly those from disadvantaged backgrounds receiving the National Student Financial Aid Scheme (NSFAS), as victims of cyber fraud. Financial aid is critical to enabling higher education for many South African students, and it is therefore crucial to understand the effects of cyber fraud on these students. This study has three key objectives: first, to understand how South African higher education financial aid students perceive cyber fraud; second, to identify the perceived events that led to cyber fraud; and finally, to understand the effects of cyber fraud on this group of students. The study used a qualitative research design with purposive and snowball sampling to select participants. Semi-structured interviews were used to elicit data from 30 participants. Thematic analysis was employed for data analysis. The findings underscore the urgent need for training and awareness programs tailored explicitly for financial aid students, particularly those receiving financial aid for the first time. Beyond the immediate financial losses, the study also highlights the adverse effects experienced by affected students. It underscores the crucial role of support systems in determining students' academic success.
In recent years, cybercrimes have become more prevalent and impactful for all users of modern technology. Consequently, various artificial-intelligence-driven intrusion detection tools have been implemented to detect and prevent such cyberattacks. Some well-known tools include Microsoft's Security Copilot and SentinelOne's Singularity. However, such AI tools are often difficult to train and maintain, primarily because of the lack of available cybersecurity datasets. Furthermore, even when real cybersecurity datasets are collected, they may lack balance, reliability, and variety, making them inefficient for training AI intrusion detection tools. A trending solution to this predicament is synthetic data generation, particularly for meeting commercial cybersecurity dataset requirements. However, synthetic data generation simply mimics the structure and content of real datasets and often reproduces the poor characteristics of the real datasets. Therefore, this study proposes the inclusion of Data Quality Metrics and data optimisation techniques during the synthetic data generation process to improve the quality of synthetic cybersecurity datasets. Ultimately, an optimal process for producing synthetic datasets for cybersecurity research is proposed.
In the modern digital era, cybersecurity has emerged as a critical domain, shaping the security landscape of organisations worldwide. As technological advancements redefine how businesses and individuals operate and interact, the need for robust cybersecurity measures becomes increasingly apparent. The purpose of this research study is to explain the effect of cybersecurity fatigue on employees' compliance with cybersecurity measures. Cybersecurity fatigue causes employees to become disengaged from cybersecurity activities, resulting in non-compliance. Empirical studies explaining the effect of cybersecurity fatigue on compliance with cybersecurity measures are relatively scarce. The study employed a qualitative case study approach. We purposely sampled 11 employees from a single organisation to collect data. The data was analysed using the NVivo software tool. The findings of this study indicate that cybersecurity fatigue leads to frustration and irritation, which results in negative perceptions of cybersecurity and non-compliant actions, such as ignoring cybersecurity requirements. Additionally, organisational culture and individual factors influence these effects. This research seeks to explain the effect of cybersecurity fatigue and encourage employees' compliance with cybersecurity measures.
Scalable Network Observability
SANReN Connect Proof of Concept (POC)
All CHPC Users, Principal Investigators and anyone interested in practical use of CHPC computational resources are invited to attend this informal Birds-of-a-Feather Session.
Short talks will be presented during the session followed by Q&A:
HPC Users and Usage (Lengau + New HPC Cluster) - Werner Jv Rensburg and Eric Mbele
Cloud Users and Usage (SEBOWA) - Dorah Thobye
Quantum Computing Access and Usage - Nyameko Lisa
NICIS 4-Year Business Plan Overview - Mervyn Christoffels
The session provides an excellent opportunity to meet up in-person with CHPC employees and to meet and engage with colleagues benefiting from CHPC services.
The FAIR principles have become the best practice for sharing artifacts, such as data, with the public. Findable, Accessible, Interoperable, and Reusable each seem straightforward, but as implementation details are worked through, many difficult decisions and unclear meanings are revealed. This talk will look at common practices and challenges with implementing these principles when sharing digital artifacts.
Object storage technologies potentially provide an alternative to filesystems for data storage and operations. In this talk I will discuss the potential for object stores, how they differ from filesystems and other common storage technologies, what performance they can provide, and how to adapt your programs to use them.
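The core interface contrast can be sketched with a toy in-memory class (illustrative only, not a real object store API; real systems such as S3 or Ceph add metadata, versioning, and consistency guarantees on top of this basic model):

```python
# A toy object store: flat key space, whole-object put/get, and
# list-by-prefix instead of directories, seek(), or in-place partial writes.
class ToyObjectStore:
    def __init__(self):
        self._objects = {}  # key -> immutable bytes blob

    def put(self, key, data: bytes):
        # Objects are written atomically as a whole; no append or seek.
        self._objects[key] = bytes(data)

    def get(self, key) -> bytes:
        return self._objects[key]

    def list(self, prefix=""):
        # "Directories" are just a naming convention over a flat namespace.
        return sorted(k for k in self._objects if k.startswith(prefix))

store = ToyObjectStore()
store.put("results/run1/data.bin", b"\x00" * 16)
store.put("results/run2/data.bin", b"\x01" * 16)
keys = store.list("results/")
```

Adapting a program to this model typically means replacing open/seek/write patterns with whole-object (or ranged) reads and writes keyed by name.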
Title: Training ML algorithms on resource-constrained devices - a memory/storage perspective
Summary:
Deploying ML/AI algorithms at the edge is essential for applications such as security and surveillance, industrial IoT, autonomous vehicles, and healthcare, which require low latency, data privacy, or reduced costs. However, most edge devices lack powerful memory systems capable of handling the memory- and computation-intensive nature of such applications.
The objective of this presentation is to highlight some optimization strategies that help overcome the memory and storage bottlenecks of ML/AI algorithms—mainly from a training perspective—to enable their deployment on low-resource devices. These optimizations can also be applied to any resource-constrained environment used for training, including low-cost virtual machines in cloud infrastructures, standard personal computers, or small-scale micro data centers.
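One such optimization can be sketched as streaming mini-batch training, where only the current batch ever resides in memory, keeping the peak footprint O(batch) instead of O(dataset). The linear model and data generator below are illustrative placeholders, not the talk's actual workloads:

```python
import random

def batch_stream(n_batches, batch_size, seed=0):
    """Yield mini-batches one at a time; the full dataset never exists."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        xs = [rng.uniform(-1, 1) for _ in range(batch_size)]
        ys = [3.0 * x + 1.0 for x in xs]   # ground truth: y = 3x + 1
        yield xs, ys                        # only one batch in memory

def train_streaming(stream, lr=0.1):
    """Mini-batch SGD on mean squared error for y = w*x + b."""
    w, b = 0.0, 0.0
    for xs, ys in stream:
        n = len(xs)
        gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

w, b = train_streaming(batch_stream(n_batches=500, batch_size=16))
```

The same pattern applies whether batches come from a sensor, a memory-mapped file, or cheap cloud storage: the trainer's memory requirement is decoupled from dataset size.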
Title: Scalable Data Management Techniques for AI workloads
Abstract: The advent of complex AI workflows that involve large learning models (training using data/pipeline/tensor parallelism, retrieval augmented generation, chaining) has prompted the need for scalable system-level building blocks that enable running them efficiently at large scale on high end machines. Of particular interest in this context are data management techniques and their implementation that bridge the gap between high-level required capabilities (fine-grain tensor access, support for transfer learning and versioning, streaming and transformation of training samples, transparent augmentation, vector databases, etc.) and the existing storage hierarchy (parallel file systems, node-local memories, etc.). This talk discusses the challenges and opportunities in the design and development of such techniques and presents several results based on VELOC and DataStates, two efforts at ANL aimed at leveraging checkpointing to capture the evolution of datasets (including AI models and their training data).
This paper presents a qualitative review on the integration of Zero-Knowledge Proofs (ZKPs) and biometrics in Decentralized Identity (DID) systems. It explores how these technologies address key challenges in digital identity management, including privacy preservation, security enhancement, and regulatory compliance. Guided by three research questions, the study systematically reviews recent literature to identify the problems these technologies solve, the sectors where they are applied, and the standards that govern their implementation. The review further reveals that ZKP-DID is the most widely adopted method, dominating finance and governance applications, while Bio-DID focuses on healthcare and education under GDPR, and BioZK-DID combines biometrics with ZKPs for enhanced security but with limited regulatory guidance. The findings reveal that ZKPs enable privacy-preserving verification, while biometrics offer robust user-specific authentication. Their integration within DID systems is particularly relevant in sectors such as finance, healthcare, governance, and education. However, challenges remain in scalability, interoperability, and regulatory alignment. This paper contributes new insights by proposing technical guidelines, policy recommendations, and future research directions to support the ethical and effective deployment of ZKP-biometric-enabled DID systems.
In dynamic network environments, intrusion detection systems (IDS) must adapt to evolving network traffic patterns despite the challenge of concept drift. Traditional drift detection methods, such as ADWIN and DDM, face a trade-off between sensitivity and stability, resulting in both delayed attack detection and excessive false alarms. To address this issue, we propose a novel framework, Adaptive-Delta ADWIN, which adjusts the ADWIN detector's delta parameter using two lightweight online controllers: a Volatility Controller (VC), which adapts to fluctuations in prediction error, and an Alert-rate Controller (ARC), which controls the frequency of drift alarms. We merge the adaptive detector into a streaming ensemble of Hoeffding Adaptive Trees and evaluate its performance against a fixed-delta baseline. Accuracy, ROC-AUC, and F1-score are monitored in real time. The experimental results demonstrate the effectiveness and responsiveness of the Adaptive-Delta ADWIN framework in handling concept drift while reducing false alarms and balancing sensitivity with stability in streaming IDS environments.
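The core idea of tuning a detector's confidence parameter online can be sketched in a few lines. This is not the authors' implementation: the windowed mean-shift test, the alert-rate update rule, and all constants below are illustrative assumptions standing in for ADWIN's adaptive windows and the VC/ARC controllers:

```python
import math
from collections import deque

class AdaptiveDeltaDetector:
    """Toy drift detector: a Hoeffding-style mean-shift test over a sliding
    window, with delta tuned online by an alert-rate controller (ARC)."""

    def __init__(self, window=64, delta=0.002, target_alert_rate=0.01):
        self.window = deque(maxlen=window)
        self.delta = delta
        self.target = target_alert_rate
        self.alerts = 0
        self.seen = 0

    def update(self, x):
        self.window.append(x)
        self.seen += 1
        drift = False
        n = len(self.window)
        if n == self.window.maxlen:
            half = n // 2
            vals = list(self.window)
            old, new = vals[:half], vals[half:]
            # Hoeffding-style cut threshold for values in [0, 1]:
            # larger delta => smaller epsilon => more sensitive detector.
            eps = math.sqrt(math.log(2.0 / self.delta) / (2.0 * half))
            if abs(sum(new) / half - sum(old) / half) > eps:
                drift = True
                self.alerts += 1
                self.window.clear()   # restart after a detected change
        # ARC: alarming more often than the target rate demands more
        # evidence (shrink delta); alarming less often relaxes it.
        rate = self.alerts / self.seen
        self.delta *= 0.99 if rate > self.target else 1.01
        self.delta = min(max(self.delta, 1e-6), 0.5)
        return drift

det = AdaptiveDeltaDetector()
stream = [0.05] * 200 + [0.95] * 200   # abrupt mean shift at t = 200
drift_points = [t for t, x in enumerate(stream) if det.update(x)]
```

On this synthetic stream the detector flags a single drift shortly after the shift at t = 200; the proposed framework applies the same feedback principle to ADWIN's delta inside a Hoeffding Adaptive Tree ensemble.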
Databases are an important source of digital evidence, but most forensic methods and tools focus on relational database systems. In-memory NoSQL databases such as Redis are harder to investigate because persistence files and logs record only part of the activity, while volatile evidence exists in memory. This paper presents a technique and parser that bring multiple Redis sources together: memory snapshots, RDB, AOF, MONITOR, ACL logs, and SLOWLOG. Three experiments were carried out. The first tested recovery of short and long values from memory, showing that command arguments can be extracted from an offset even when not preserved in persistence. The second measured coverage across individual sources and demonstrated that combining them gives a broader view of the investigation. The third examined a master-replica scenario, in which the parser recovers missing operations by matching memory with monitor logs. Our findings show that cross-source artifact correlation improves completeness in Redis forensic analysis.
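The cross-source correlation idea can be sketched as merging per-source event lists into a single per-key timeline (source names, record fields, and timestamps below are illustrative; this is not the paper's parser):

```python
# Toy cross-source artifact correlation: a key seen only in one source
# (e.g. memory) still appears in the merged evidence timeline.
def correlate(sources):
    """sources: {source_name: [(timestamp, key, operation), ...]}"""
    timeline = {}
    for name, events in sources.items():
        for ts, key, op in events:
            timeline.setdefault(key, []).append((ts, name, op))
    for events in timeline.values():
        events.sort()                      # chronological order per key
    return timeline

evidence = correlate({
    "aof":     [(100, "user:1", "SET")],
    "monitor": [(100, "user:1", "SET"), (105, "user:1", "DEL")],
    "memory":  [(110, "session:9", "SET")],   # present only in memory
})
```

Here the DEL on "user:1" and the volatile "session:9" key would both be missed by examining the AOF alone, which mirrors the coverage gains measured in the second experiment.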
Building Trust for Research: Federations and the Blueprint Architecture
TENET Update
The State of SA NREN in 2025
Cocktails Poster Session
An AI agent is a computational entity that can interact with the world and adapt its actions based on learnings from these interactions. I discuss the potential for such agents to serve as next-generation scientific assistants, for example by acting as cognitive partners and laboratory assistants. In the former case, agents, with their machine learning and data-processing capabilities, complement the cognitive processes of human scientists by offering real-time data analysis, hypothesis generation, and experimental design suggestions; in the latter, they engage directly with the scientific environment on the scientist's behalf, for example by performing experiments in bio-labs or running simulations on supercomputers. I invite participants to envision a future in which human scientists and agents collaborate seamlessly, fostering an era of accelerated scientific discoveries, new horizons of understanding, and--we may hope--broader access to the benefits of science.
Since 2016, the South African Medical Research Council (SAMRC) and the Department of Science and Innovation (DSI) have partnered to advance genomic and precision medicine research and innovation in South Africa. South Africa is embarking on a pioneering journey to transform its healthcare landscape through the South African 110K Human Genome Program. This ambitious initiative aims to leverage the existing talent pool and infrastructure to sequence 110,000 human genomes, creating a comprehensive database that will revolutionize personalized medicine in the country.
By understanding the genetic diversity of the South African population, scientists and healthcare professionals can develop targeted treatments and interventions, leading to improved health outcomes and a more equitable healthcare system. This presentation will delve into the program's approach, highlighting the commencement of a pilot phase that aims to sequence 10,000 genomes from existing cohorts, drawing on benchmarked large-scale genome programs globally. We will discuss the program's alignment with the National Decadal Plan, its potential to address health disparities, and its critical role in establishing a robust big-data-for-health environment. Furthermore, we will explore the program's contribution to economic growth, social development, and the creation of a capable state equipped to manage and support advanced scientific research.
Join us to discover how the South African 110K Human Genome Program is paving the way for a future where personalized medicine could be a reality for all South Africans.
DIRISA: Update on Data Infrastructure for Research Data Management and Collaborations
Dr More Manda.
The transition from raw research data to impactful national decisions relies fundamentally on robust, accessible, and strategically managed data and data infrastructure. This presentation provides a high-level overview of the foundations of open science and frames the urgency within the unique South African context. It addresses critical systemic challenges, including data fragmentation and the complex dynamics of data ownership and governance that shape the research landscape.
The core focus of the discussion is DIRISA's strategic mandate as the key national enabler. The presentation illustrates how DIRISA provides national data platforms, research data management support, and services that govern the full data lifecycle—from ingestion to long-term preservation and sharing.
The presentation aims to demonstrate the national value of data stewardship: how it effectively bridges the gap between theoretical frameworks and practical, evidence-based decision-making for national benefit. The presentation concludes by exploring emerging trends and future requirements necessary to fully realize data’s potential for South Africa's sustainable development.
The Altron AI Factory provides South Africa’s academic community with secure, locally hosted access to enterprise-grade AI infrastructure and services. Built in partnership with NVIDIA and hosted in Teraco’s AI-ready data centre, it offers GPU-as-a-Service and AI-as-a-Service to accelerate research without heavy infrastructure costs.
This presentation highlights how universities and research institutions can leverage the AI Factory to advance data-driven studies, maintain data sovereignty, and collaborate across disciplines. Through high-performance computing, curated models, and managed services, the Altron AI Factory bridges the gap between academic research and industrial innovation, partnering with South African academia to lead in the AI era.
CHPC Conference 2025, Cape Town
The transfer of large-scale scientific datasets between South African research facilities represents a critical bottleneck in computational research workflows. Climate modelling datasets at the Global Change Institute, Wits University, total just over 540 TB across three users as of August 2025, particularly from the Conformal-Cubic Atmospheric Model (CCAM). Optimized transfer strategies between the Centre for High Performance Computing (CHPC) and the Data Intensive Research Initiative of South Africa (DIRISA) storage systems are thus necessary for resilient data flows between HPC, storage, and local analysis compute facilities. Current data transfer tools such as Globus Connect can identify bottlenecks within data flow circuits; however, manual command-line iRODS interfaces present significant challenges for reliable data transfer, which we address through AI-assisted optimization.
This work presents a systematic application of artificial intelligence tools (Claude Code) to develop filesystem-aware transfer optimization solutions. The AI-assisted development process generated three complementary tools in under four hours of development time.
The core innovation lies in automated Lustre striping analysis using lfs getstripe commands, coupled with dynamic parameter optimization. The system automatically detects:
- Stripe counts and sizes for optimal thread allocation
- Object Storage Target (OST) distributions for concurrency planning
- File size patterns for buffer optimization
- Directory structures for efficient batch processing
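The parameter-selection logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' released tool: the `lfs getstripe` output shown is a hard-coded example string (real output varies by Lustre version), and the thread/buffer heuristics are assumptions chosen to reproduce the 8-thread, 64 MB configuration reported later in the abstract.

```python
import re

# Example text in the style of `lfs getstripe -d <dir>` output (assumed
# format, embedded here so the sketch runs without a Lustre client).
SAMPLE_GETSTRIPE = "stripe_count:  8 stripe_size:   67108864 pattern: raid0 stripe_offset: -1"

def parse_striping(text):
    """Extract stripe count and stripe size (bytes) from getstripe-style output."""
    count = int(re.search(r"stripe_count:\s*(\d+)", text).group(1))
    size = int(re.search(r"stripe_size:\s*(\d+)", text).group(1))
    return count, size

def choose_transfer_params(stripe_count, stripe_size, max_threads=16):
    """Heuristic: one transfer thread per stripe (capped), and a buffer
    matching the stripe size so each write aligns with one OST object."""
    threads = max(1, min(stripe_count, max_threads))
    buffer_mb = max(1, stripe_size // (1024 * 1024))
    return threads, buffer_mb

count, size = parse_striping(SAMPLE_GETSTRIPE)
threads, buffer_mb = choose_transfer_params(count, size)
print(threads, buffer_mb)  # → 8 64
```

With the sample striping above, the sketch selects 8 threads and a 64 MB buffer, matching the automatic configuration reported in the validation results.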
Test Dataset: 189TB CCAM climate modeling installation (ccam_install_20240215)
- Source: CHPC Lustre filesystem (/home/jpadavatan/lustre/)
- Destination: DIRISA iRODS storage (/dirisa.ac.za/home/jonathan.padavatan@wits.ac.za/)
Validation Testing (67MB, 590 files):
- Success Rate: 100.0% (590/590 files transferred successfully)
- Transfer Performance: 0.95 GB/hour sustained throughput
- Reliability: Zero failed transfers with comprehensive verification
- Peak Performance: 10.41 MB/s maximum transfer rate
- Optimization: Automatic 8-thread, 64MB buffer configuration
Scalability Analysis:
- Small datasets (41-67MB): 100% success rate, 4-11 MB/s
- Medium datasets (17GB): Structure-preserving transfers completed
- Large datasets (20-34TB per directory): Systematic optimization applied
The AI-assisted approach delivered significant advantages:
- Development Speed: Complete toolchain developed in <4 hours vs. estimated weeks for traditional development
- Code Quality: Production-ready tools with comprehensive error handling and logging
- Documentation: Auto-generated usage examples and architectural documentation
- Iterative Improvement: Real-time debugging and enhancement based on performance feedback
Immediate Benefits:
- Enables systematic transfer of 189TB CCAM climate datasets to long-term DIRISA storage
- Provides reusable toolchain for other large-scale data transfers in SA research community
- Demonstrates AI-assisted development methodology for research computing infrastructure
Broader Applications:
- Astronomical data transfers (MeerKAT, SKA precursor datasets)
- Genomics datasets from National Health Laboratory Service
- Earth observation data from SANSA and international collaborations
- General HPC-to-archive workflows across CHPC user community
https://github.com/padavatan/chpc-irods-transfer-tools
Planned extensions include:
- Integration with CHPC job scheduling systems for automated large-scale transfers
- AI-assisted optimization for emerging storage technologies
- Performance modeling for petabyte-scale climate datasets
Initially, the twin challenges of troubleshooting the Globus bottleneck and manually setting up iRODS script workflows presented technical complexities requiring considerable troubleshooting effort. This work demonstrates that AI-assisted development can dramatically accelerate research infrastructure optimization while maintaining production-grade reliability. The 100% success rate achieved in validation testing, combined with comprehensive filesystem-aware optimization, provides a foundation for systematic large-scale data management in South African research computing.
The methodology offers promise to the local HPC research community and other institutions facing similar data transfer challenges and represents a paradigm shift toward AI-augmented research infrastructure development.
AI Development Platform:
- Claude Code (claude.ai/code): Primary AI development assistant for code generation, debugging, and optimization
- Development Time: <4 hours vs. estimated 2-3 weeks traditional approach (10-15x productivity improvement)
Core Technologies:
- Languages: Python 3.9.6, Bash scripting
- HPC Infrastructure: CHPC 24-core DTN systems, Lustre parallel filesystem, DIRISA iRODS storage
- Specialized Tools: iRODS iCommands (iput, ils), Lustre client tools (lfs getstripe), GNU Parallel
- Version Control: Git, GitHub CLI, collaborative development workflow
Performance Analysis Framework:
- Custom benchmarking: Transfer rate analysis with variance tracking
- Comprehensive auditing: Source/destination validation, file integrity verification
- Real-time monitoring: Speed measurements, efficiency metrics, optimization recommendations
AI-Human Collaboration Model:
- AI Contributions: 2,597+ lines of production code, comprehensive error handling, filesystem-aware algorithms, automated documentation
- Human Contributions: Domain expertise, performance validation, requirements specification, production integration
- Result: Production-grade reliability with rapid development cycles
This technical stack demonstrates the practical implementation of AI-assisted research infrastructure development, providing a replicable methodology for other HPC environments.
Keywords: Artificial Intelligence, High-Performance Computing, Data Transfer Optimization, Lustre Filesystem, iRODS, South African Research Infrastructure, Climate Modeling, CHPC, DIRISA
Authors: Jonathan Padavatan¹, Mthetho Sovara², Claude (AI Assistant)³
¹ University of the Witwatersrand, Global Change Institute
² CHPC
³ Anthropic AI
Contact: jonathan.padavatan@wits.ac.za
Repository: https://github.com/padavatan/chpc-irods-transfer-tools
Modern mineral exploration increasingly depends on the ability to process and integrate large, multi-source geoscientific datasets. At Integrated Geoscience Solutions (IGS), we use High-Performance Computing (HPC) infrastructure provided by the Centre for High Performance Computing (CHPC) to advance regional-scale geophysical modelling and predictive mineral prospectivity mapping across Southern Africa.
Access to significant compute resources is essential to manage terabyte-scale datasets from magnetotelluric (MT), gravity, magnetic, and hyperspectral surveys that demand intensive 3D inversion, data fusion, and machine learning routines.
By leveraging CHPC’s multi-core architecture, parallelised inversion codes, and high-speed storage systems, we have reduced complex inversion runtimes from several days on standard workstations to under ten hours. These computational gains have directly enhanced exploration targeting, improved model resolution, and reduced project risk and cost.
The talk will discuss both the challenges (scalability, data I/O, and software optimisation) and successes (workflow automation, reproducibility, and improved accuracy) of running large geophysical models on CHPC clusters. Finally, it will highlight the broader scientific and economic impact of HPC-enabled exploration, from accelerating discovery to supporting Africa’s transition to a low-carbon, resource-resilient economy.
High-performance computing is critical for understanding the ocean’s role in Earth’s climate system. The Southern Ocean, in particular, plays a vital role in regulating global climate through the transport and exchange of heat and carbon, yet it is one of the most computationally demanding regions to model.
The Southern Ocean Carbon-Climate Observatory (SOCCO), CSIR, has developed a modelling framework that leverages NICIS-Centre for High Performance Computing (CHPC) resources to simulate the Southern Ocean at high spatial and temporal resolution. Through a hierarchy of coupled ocean-ice-biogeochemical model configurations, we investigate physical ocean and carbon-cycle processes driving air–sea CO₂ exchange, storm-driven variability, and ecosystem responses in a changing climate.
The flagship configuration, BIOPERIANT12, is a high-resolution model domain that spans the Southern Hemisphere south of 30°S. It represents a complex and scalable computational challenge requiring significant HPC resources for multi-year simulations. Beyond the computational cost of the model experiments, key challenges include the efficient analysis and management of the multi-terabyte outputs and the subsequent derived analysis datasets.
SOCCO demonstrates how use of national HPC infrastructure enables cutting-edge Southern Ocean and climate research, while providing transferable insights for earth system modelling and supporting the training of future earth system modellers and researchers.
The spinel LiMn2O4 cathode material is an attractive candidate for the design and engineering of cost-effective and thermally sustainable lithium-ion batteries for optimal utilisation in electric vehicles and smart grid technologies. Despite its electrochemical qualities, its commercialization is delayed by the widely reported capacity loss during battery operation. The capacity attenuation is linked to structural degradation caused by the Jahn-Teller activity and disproportionation of Mn3+ ions. In several studies, the structural stability of spinel LiMn2O4 was improved by single- or dual-doping the Mn sites to curtail the number of Mn3+ ions. However, this results in a loss of active ions, which ultimately limits the amount of energy that can be obtained from the battery. Herein, a high-entropy (HE) doping strategy is used to enhance the structural stability and electrochemical performance of the LiMn2O4 spinel. The unique interactions of various dopants in HE doping yield enhanced structural stability and redox coupling, which can improve the concentration of the active material in the system. An HE-doped LiMn2O4 (LiMn1.92Mg0.02Cr0.02Al0.02Co0.02Ni0.02O4) spinel structure was successfully optimized using the Vienna Ab initio Simulation Package (VASP) code. The lattice parameter of the optimized (ground-state) structure was determined to be 8.270 Å, which is less than the value of 8.274 Å for the pristine LiMn2O4 spinel structure. The resulting lattice contraction suggests a stronger M-O bond, beneficial for increased resistance to phase changes and degradation. Moreover, the concentration of Mn3+ was decreased by 5.3% to defer the onset of the Jahn-Teller distortion and enhance capacity retention. This retention is one of the significant benefits of dopants such as Cr3+, which can participate in storing electric charge during the charging process by forming Cr4+, thus compensating for the capacity loss incurred by the reduction in Mn3+ concentration.
Consequently, this work paves a path for the exploration of several other fundamental properties linked to the electrochemical performance of the spinel.
M Ramoshaba, T E Mosuang
Department of Physics, University of Limpopo, Private Bag x1106, Sovenga, 0727, South Africa
E-mail: moshibudi.ramoshaba@ul.ac.za
Thermoelectric chalcogenide materials exhibit promising properties, making them suitable for energy conversion and cooling applications. Thermoelectric (TE) materials have attracted significant interest due to their potential for energy harvesting and conservation. For a material to be considered an efficient thermoelectric material, it must possess low thermal conductivity, high electrical conductivity, a high Seebeck coefficient, and a high power factor. These characteristics contribute to strong thermoelectric performance, leading to a favorable figure of merit (ZT). Although several promising bulk semiconductors have been reported by researchers, no satisfactorily high ZT values have yet been achieved. Chalcogenide semiconductors may provide a solution to this challenge. Using density functional theory (DFT) and Boltzmann transport theory, the thermoelectric properties of selected chalcogenide materials (Cu₂S, Cu₂Se, InS, and InSe) were analyzed. These studies revealed strong thermoelectric performance, as the predicted maximum ZT values indicated high efficiency in these materials.
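For reference, the quantities named above (Seebeck coefficient S, electrical conductivity σ, thermal conductivity κ, power factor, and absolute temperature T) combine into the standard dimensionless figure of merit:

```latex
ZT = \frac{S^{2}\sigma T}{\kappa}, \qquad
\kappa = \kappa_{e} + \kappa_{l}, \qquad
\mathrm{PF} = S^{2}\sigma
```

where κ_e and κ_l are the electronic and lattice contributions to the thermal conductivity; this is why low κ together with a high power factor yields a high ZT.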
Q & A
Accelerating your innovation with the Dell AI Factory
Lenovo Presentation
Around the world, the Research Software Engineering (RSE) movement has shown how professionalising research software practices and building RSE communities can strengthen the sustainability of HPC-enabled research. Many HPC users are writing their own code, often without formal training or long-term support, which raises challenges for efficiency, portability, reproducibility, and maintenance, all of which are foundational to sustainable research software. This workshop, the first of its kind in Africa, creates a space to showcase local software projects, share sustainability challenges, opportunities and practices, and strengthen our collective capacity for impactful computational research. Similar events have been held at computational conferences, such as ISC 2023 (where MS was an invited speaker) and SC, for the last five years.
Participants deliver 3–4 minute lightning talks about their research software projects, following a structured template, with a focus on software sustainability in an HPC context.
Template prompts (HPC-focused):
We will provide a slide template in advance with these fields for participants to populate with their content. The slide template is attached to this submission for reference.
Facilitated conversation drawing out common themes:
The rapid growth of Africa’s data-intensive research, artificial intelligence (AI), and high-performance computing (HPC) workloads is driving unprecedented demand for resilient and sustainable data infrastructure. Data centres are emerging as critical enablers of scientific discovery, cloud adoption, and digital innovation, yet the region continues to face significant barriers: limited local hosting capacity, reliance on international facilities, high latency, data sovereignty concerns, and a shortage of investment-ready projects.
The Digital Investment Facility (DIF)—a Team Europe initiative co-funded by the European Commission, Germany, and Finland, and implemented jointly by GIZ and HAUS—addresses these gaps by boosting investment in green and secure digital infrastructure, with a focus on data centres and Internet Exchange Points (IXPs). Operating as a project preparation and advisory facility, DIF supports projects from early design to contract closing, enhancing bankability through technical and financial advisory services, pre-feasibility studies, ESG integration, and investor matchmaking.
Crucially, DIF embeds a climate nexus at the core of its work. By promoting energy-efficient, renewable-powered data centres and aligning with ISO 50001 energy management standards, DIF ensures digital infrastructure projects contribute directly to climate action and the implementation of Nationally Determined Contributions (NDCs). Greener data centres reduce emissions from digital growth, enhance resilience through disaster recovery capacity, and enable the digital tools required for climate adaptation (e.g., climate modelling, earth observation, and early warning systems).
At CHPC, DIF will showcase how its approach enables data centres to meet the demanding requirements of HPC and advanced research—providing low-latency access, high-availability colocation, and sustainable cloud platforms that can host scientific datasets and AI workloads. The presentation will highlight the emerging pipeline of African digital infrastructure projects, the application of international standards, and the opportunities for researchers, policymakers, and investors to collaborate in building a digitally sovereign and climate-aligned HPC ecosystem in Africa.
The mechanical properties of materials change when subjected to dynamic conditions of high pressure and temperature. Such materials include those used in cutting and shaping, where they experience twisting and tensile forces. Results for selected MAX phases are presented to show variations in elastic constants as a function of dynamic pressure and temperature. Another situation where materials are subjected to such conditions is in the core of the earth. The stishovite, CaCl2, and seifertite phases of silica, occurring in the core of the earth, are investigated, with outcomes of phase transitions and related changes in seismic velocities that are compared with experimentally determined values.
The presentation showcases recent developments and applications of the ChemShell software in the field of energy materials by the Materials Chemistry HPC Consortium (UK), focusing on defect properties. This work capitalizes on the software engineering and methodological advances of recent years (including the UK Excalibur PAX project highlighted at last year's CHPC conference) by the groups of Prof. Thomas W. Keal at STFC Daresbury Laboratory (UK) and Prof. C. Richard A. Catlow at UCL and Cardiff University, with several collaborators. Materials of interest include wide-gap semiconductors used in electronic and optoelectronic devices as well as in catalysis, and solid electrolytes. The method allows one to explore both defect thermodynamics and spectroscopic properties. Further examples show how a classical rock-salt-structured insulator, MgO, can be usefully employed as a platform in studies of exotic states of matter of fundamental interest, in particular the unconventional cuprate superconductors with high critical temperatures and the recently discovered phenomena in isostructural nickelate systems.
The cheetah is a pinnacle of adaptation in the context of the natural world. It is the fastest land mammal and has multiple morphological specialisations for prey-tracking during high-speed manoeuvres, such as vestibular adaptations to facilitate gaze and head stabilisation [1]. Understanding the cheetah's head stabilisation techniques is useful in fields such as biomechanics, conservation, and artificial and robotic systems; however, the dynamics of wild and endangered animals are difficult to study from a distance. This challenge necessitated a non-invasive Computer Vision (CV) technique to collect and analyse 3D points of interest. We collected a new data set to emulate a perturbed platform and isolate head stabilisation. Using MATLAB®, we built upon a method pioneered by AcinoSet [2] to produce a 3D reconstruction through CV and a dynamic model-informed optimisation, which was used to quantitatively analyse the cheetah's head stabilisation. Using our new dataset and leveraging optimal control methods, this work identifies and quantifies passive head stabilisation and, in conjunction with AcinoSet data, quantifies the active stabilisation during locomotion. Since this work includes computationally heavy methods, the processing of these data using optimisations and computer vision rendering can be benchmarked and compared to parallel computing methods, to further support the viability of the 3D reconstruction methods for other animal or human models and applications of high-performance, low-cost markerless motion capture.
[1] Grohé, C et al, Sci Rep, 8:2301, 2018.
[2] Joska, D et al, ICRA, 13901-13908, 2021.
High-performance computing (HPC) provides the means to translate complex multiphase flow data into insight that can inform industrial decision-making. This research applies advanced computational fluid dynamics, executed on the CHPC Lengau cluster, to model reacting gas–liquid systems relevant to oxygen lancing in pyrometallurgical tap-holes. The approach couples open-source CFD solvers with thermochemical data to capture flow behaviour, heat transfer, and reaction-driven gas evolution in molten metal–slag systems. Ferrochrome smelting serves as a representative case study, enabling validation against plant data and illustrating the broader relevance of the modelling framework to other high-temperature processes. By integrating computational models, large-scale data handling, and parallel analysis workflows, the study demonstrates how national cyber infrastructure can transform high-fidelity simulations into actionable understanding for safer, more efficient, metallurgical operations.
Q & A
This session is an opportunity for members of the HPC Ecosystems Community and those that identify as associates / affiliates to convene in-person. The session will allow time for members to discuss matters relating to HPC Ecosystems Project as well as broader African HPC and emerging HPC community topics. The 90-minute session will include 60-minutes of prepared talks from members of the community, followed by a further 30-minutes of open time for discussion and meaningful community engagement. Alas, muffins are not guaranteed.
Hybrid Stacking and Embedded Regression with Multi-Phase Feature Selection for Explainable Crop Yield Prediction in Botswana
Abstract
In Sub-Saharan Africa, climate instability, inaccurate data, and a lack of precision agricultural tools make it extremely difficult to predict crop yields with any degree of accuracy. These restrictions are especially critical in Botswana, where most agricultural activities are rain-fed and highly vulnerable to environmental changes. To provide accurate, comprehensible, and context-specific yield predictions for four staple crops (maize, millet, pulses, and sorghum), this study uses a hybrid machine learning approach. The approach integrates multiple regression algorithms (Random Forest, XGBoost, Support Vector Regression, and Multi-Layer Perceptron) within a stacked ensemble architecture tailored to Botswana's agricultural data context. To optimize predictive power and interpretability, a multi-phase feature selection strategy was applied, combining entropy filtering, mutual information, recursive feature elimination (RFE), and engineered temporal features through lag variables.
This process refined input variables for both the staging models and region-specific selection, ensuring robust model generalization. Model performance was evaluated using historical yield, meteorological, and soil datasets, with R², RMSE, and MAE employed as metrics. The stacking hybrid regression model performed exceptionally well in yield prediction for pulses and sorghum, achieving the best performance with R² = 0.94, RMSE = 0.60 t/ha, and MAE = 0.32 t/ha. The most significant predictors were rainfall, temperature fluctuation, and lagged yield values, according to a unified interpretability framework produced by combining SHapley Additive exPlanations (SHAP) with entropy analysis. Surprisingly, the entropy analysis showed that sorghum had greater predictor complexity and the ability to adjust to unpredictable weather. Time-horizon stability of the model was confirmed by forward simulations for 2025–2028.
These results confirm that interpretable hybrid ensembles can satisfy precision agriculture's accuracy and transparency requirements when reinforced by multi-phase feature selection. The suggested approach supports climate risk management tactics for Botswana's farmers by providing useful information for early-season production projection and input distribution. Additionally, other sub-Saharan regions with comparable environmental and data-related constraints may find the methodology applicable.
Keywords: predictive crop yield, precision agriculture, Botswana, XAI, multi-phase feature selection, hybrid ensemble models, SHAP.
The Sepitori language (also known as Pitori or Pretoria Sotho) is a dynamic and evolving creole language predominantly spoken in urban townships of Pretoria, South Africa. It blends Setswana, Sesotho, Afrikaans, and English, with frequent instances of code-switching and slang. Despite its widespread usage, Sepitori remains underrepresented in natural language processing (NLP) tasks, particularly in language identification and text processing.
This paper proposes the development of a Sepitori Language Identification (ID) Model, designed to classify and distinguish Sepitori text from other South African languages. The model addresses the unique challenges of multi-language mixing, informal vocabulary, and varying dialects within the Sepitori speech community. By leveraging machine learning techniques and deep learning models, including convolutional neural networks (CNN) and transformer-based models (e.g., BERT), the model utilizes a large-scale corpus of annotated Sepitori, Setswana, Sesotho, Afrikaans, and English samples. The model incorporates multiple linguistic features, such as n-grams, word embeddings, and syntactic patterns, to accurately identify Sepitori text, even when it involves heavy code-switching or slang.
This work contributes to the linguistic field by providing a novel computational tool for processing Sepitori, enabling the automatic detection of Sepitori in a variety of contexts, including social media, web scraping, and corpus development. It also lays the foundation for improving language resources for underserved African languages, with potential applications in speech recognition, machine translation, and sentiment analysis. The model is expected to improve the accessibility and representation of Sepitori in digital and computational platforms, fostering greater inclusivity for African language speakers in the digital age.
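The character n-gram features mentioned above can be illustrated with a minimal sketch. This is not the proposed CNN/BERT model: it is a toy nearest-profile classifier using only the Python standard library, and the two reference "corpora" are single hypothetical example sentences, not real training data.

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Character n-gram count profile of a text (a common language-ID feature)."""
    text = f" {text.lower()} "  # pad so word boundaries become n-grams too
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def identify(text, profiles):
    """Return the label whose reference profile is most similar to the text."""
    scores = {lang: cosine(char_ngrams(text), p) for lang, p in profiles.items()}
    return max(scores, key=scores.get)

# Toy reference profiles built from hypothetical example sentences.
profiles = {
    "english": char_ngrams("the quick brown fox jumps over the lazy dog"),
    "afrikaans": char_ngrams("die vinnige bruin jakkals spring oor die lui hond"),
}
print(identify("the dog jumps", profiles))  # → english
```

A production system for Sepitori would replace the toy profiles with large annotated corpora and the nearest-profile rule with the learned models described above, but the n-gram profile itself remains a standard building block for handling code-switched text.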
Computing processes typically require input data to perform actions that generate output data. While input data can sometimes be generated computationally, it often originates from external sources. In Natural Language Processing and Digital Humanities, this input is typically sourced from human activities, including spoken or written language and music.
In the current era of Large Language Models (LLMs) that provide practically usable tools, access to appropriate training data is essential. These models generally perform better when larger data collections are available for training, making data accessibility crucial.
For most South African languages, only limited amounts of digitally accessible data are available. Many existing data collections are sourced from government websites, providing texts in highly specific genres. Texts from diverse genres, including newspaper articles, literary works, and social media data, are not openly accessible in digital formats.
The SADiLaR (South African Centre for Digital Language Resources) repository hosts data collections that are as openly accessible as possible. The repository currently contains 357 directly downloadable items and 56 metadata-only items (indicating the existence of data collections).
The underlying principle of SADiLaR's repository is that providing a centralized space for data collections makes them more easily findable and accessible. (Additionally, submitted data collections are requested to be as interoperable and reusable as possible to ensure adherence to FAIR principles.)
This abstract serves as a call for action with two main objectives. First, we encourage researchers to submit their digital language data to the SADiLaR repository. Contributing to the repository increases the availability of South African language data, making it more easily findable and accessible. This data can then be used for training models, ultimately benefiting language users, for example, through the development of LLMs for these languages. (Note: copyright remains with the original copyright owner; contributing to the repository does not transfer copyright ownership.)
Each contribution to SADiLaR's repository receives a persistent identifier, enabling consistent referencing of data collections. These identifiers can be used as citations in publications, ultimately benefiting researchers associated with the data collections.
Second, we encourage researchers to utilize the data collections available in SADiLaR's repository. The repository contains a wealth of useful data collections, and searching there first can streamline research processes. SADiLaR's repository exists to facilitate work in Natural Language Processing and Digital Humanities; collectively, we can leverage this cyber-infrastructure to advance our fields.
South Africa’s rich linguistic diversity poses unique challenges for artificial intelligence systems, particularly in automatic speech recognition (ASR) where multilingual speakers frequently switch languages mid-conversation. This study proposes a robust ASR pipeline tailored for code-switched speech in health settings, addressing practical issues such as overlapping dialogue, background noise, and inconsistent language usage. The pipeline will integrate multilingual acoustic models and language-specific preprocessing techniques, trained on a standardised dataset comprising South African languages including isiZulu, Sepedi and English.
By focusing on pipeline design, dataset standardisation and multilingual integration, this work demonstrates how AI can be built to truly understand South African voices rather than ignoring them. Structured and reproducible approaches to code-switched data lay the foundation for inclusive, fair, and context-aware AI that represents local language communities, and they highlight the broader opportunities for leveraging multilingual data responsibly.
Sexually transmitted infections (STIs) remain a significant public health challenge in Sub-Saharan Africa (SSA), particularly among key populations such as men who have sex with men (MSM) and transgender individuals. This study aimed to assess the level of STI literacy within this population, identify its demographic, behavioral, and structural predictors, and explore its influence on knowledge, attitudes, behaviors, and healthcare-seeking. A retrospective observational mixed-methods approach was employed, combining logistic regression, structural equation modeling (SEM), and explainable machine learning (SHAP) to analyze data collected from 1,240 MSM and transgender individuals in Soweto, South Africa. The main outcome variable, STI literacy, was operationalized both as a composite score (binary: high and low) and as a categorical label (1 and 0), enabling both inferential and predictive modeling. Results revealed that 28.1% of participants demonstrated adequate STI literacy. Key positive predictors included younger age, prior STI testing, higher education, being single or married, female gender identity, and personal STI history. In contrast, older age, unemployment, lower education, substance use, and frequent sexual activity were associated with lower literacy. Structural equation modeling illuminated how STI testing experience acts as a cue to action, while stigma, cost, and fear serve as barriers. SHAP analysis confirmed these insights, highlighting modifiable predictors such as information-seeking, communication confidence, and testing accessibility. The study's findings were interpreted through Nutbeam’s Health Literacy Framework, the Health Belief Model (HBM), and the Theory of Planned Behavior (TPB). These frameworks helped contextualize the behavioral pathways linking sociodemographic factors to STI literacy and preventive actions. Notably, TPB constructs such as subjective norms and perceived behavioral control were particularly influential. 
This study contributes to the STI prevention literature by quantifying literacy gaps, modeling predictive pathways, and demonstrating how behavioral theory and machine learning can inform targeted interventions. It recommends multi-level approaches that go beyond awareness to address stigma, build self-efficacy, and enhance access to sexual health services. These insights are vital for designing inclusive, theory-driven public health strategies in SSA.
Q & A
Title: “Women in High Performance Computing South Africa (WHPC-South Africa)”
Duration: 90 minutes
Type of session: Advancement through the professional staircase
Organiser(s): Name Affiliation Email Address
1. Khomotso Maenetja University of Limpopo khomotso.maenetja@ul.ac.za
2. Raesibe Ledwaba University of Limpopo raesibe.ledwaba@ul.ac.za
3. Beauty Shibiri University of Limpopo beauty.shibiri@ul.ac.za
4. Tebogo Morukuladi University of Limpopo tebzamorukuladi@gmail.com
Description:
The WHPC BOF session for 2025 will be a reflection session in which women in HPC share how their careers have advanced over the past five years, and how the platform has encouraged women to take up leadership or management roles in their workplaces. This will provide feedback on the underrepresentation of women in HPC, especially in leadership.
We are therefore glad to invite both male and female conference attendees to continue where we left off at the last session of the 2024 annual conference. The major goal of bringing them together at the meeting was to develop a network of female HPC professionals in South Africa. The CHPC executive team gave major assistance to the workshop, which was sponsored and attended by both men and women.
Anticipated Goals
• Reduce women's underrepresentation in HPC (contribute to increasing the participation of women and girls in HPC through training and networking)
• Share information and resources that foster growth for women in HPC (institutionally and across community)
• Raise our professional profiles
• Encourage young girls at school level to consider HPC as a career of choice
Size: 80
Target audience: Women and Men
Prerequisites: Registered CHPC conference attendees
Chair(s):
Ms NN Zavala and Ms MG Mphahlele
Outline of programme (single 90-minute session):
1. Opening – Prof Khomotso Maenetja (3 min)
2. Presentations
a. Introduction of the guest speaker – Prof RR Maphanga
b. Guest Speaker – Keynote speaker – 30 Min
c. Dr Tebogo Morukuladi – Academic Journey (15 min)
d. Ms CS Mkhonto (University of Limpopo, Faculty of Science and Agriculture Student Ambassador) (15 min)
e. Ms Precious Makhubele and Keletso Monareng – Moving from Masters to PhD Journey (15 min)
3. Closure – Prof RS Ledwaba (2 min)
This presentation will focus on the collaborative, quantitative co-design approach to the deployment of large-scale computing services adopted by the STFC DiRAC HPC Facility in the UK (www.dirac.ac.uk). Over the past 15 years, successive generations of DiRAC services have demonstrated how workflow-centred co-design can maximise the scientific impact of computing investments. The co-design of DiRAC services has ranged from silicon-level to system-level, alongside extensive software development effort, and has delivered significantly increased system capabilities.
I will also discuss how federation can deliver additional research capabilities and optimise service exploitation, while lowering the bar for access to large-scale computing for new users.
Looking to the future, I will explore how co-design can be used to develop cost-effective and energy-efficient heterogeneous computing ecosystems for AI and simulation.
Focusing on building and scaling low-cost Large Language Models as a Service
There is a great need to develop computing infrastructure to support the increased application of data science and health informatics across Africa which includes robust data sharing and federated computing, whilst fostering research collaboration. The Global Alliance for Genomics and Health (GA4GH; https://www.ga4gh.org/) aims to promote responsible data sharing standards through the use of open, community derived standards and APIs such as Data Repository Service (DRS), Workflow Execution Service (WES), Data Connect, Passports and Tool Registry Service (TRS), amongst others. The DRS API provides a generic interface to access data in repositories. The data is discovered through the Data Connect API which supports federated search of different kinds of data. The WES API provides a standardized approach for accessing computing resources with use of reproducible workflows, usually housed in a tools registry service such as Dockstore (https://dockstore.org/). The eLwazi Open Data Science Platform (ODSP) has undertaken a pilot implementation of the GA4GH standards with the aim of delivering a federated framework for data discovery and analysis within Africa for the DS-I Africa consortium. The eLwazi GA4GH pilot project was started in June 2023 as an outcome of a training hackathon by the eLwazi ODSP Infrastructure work group in collaboration with the GA4GH. The main goal of the GA4GH pilot project is to enable the findable, accessible, interoperable and reusable (FAIR) principles for data discovery and analysis. Four sites within Africa (Ilifu - South Africa, ACE Lab - Mali, ACE Lab - Uganda and UVRI - Uganda) are currently hosting the different API endpoints for authorized data discovery and analysis. 
From within the project we can locate DRS datasets using the Data Connect API, use workflows from Dockstore via the TRS API for reproducible analysis, and submit them to the WES API for execution without the data leaving its actual location, which provides a technical solution for data analysis within legislative data protection constraints. We are now in the process of developing a federated approach for the imputation of African genomics data as a GA4GH implementation forum (GIF) project collaboration, based on the lessons from the pilot GA4GH implementation project.
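To illustrate how the DRS API abstracts data location, the GA4GH DRS specification defines a simple mapping from hostname-based drs:// identifiers to HTTPS request URLs. The sketch below shows that resolution step in Python; the hostname and object ID are illustrative placeholders, not actual eLwazi endpoints:

```python
from urllib.parse import urlparse

def drs_uri_to_https(drs_uri: str) -> str:
    """Resolve a hostname-based DRS URI to the HTTPS endpoint
    defined by the GA4GH DRS specification:
    drs://<host>/<id> -> https://<host>/ga4gh/drs/v1/objects/<id>."""
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs":
        raise ValueError(f"not a DRS URI: {drs_uri}")
    object_id = parsed.path.lstrip("/")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{object_id}"

# Example (placeholder host and ID):
# drs_uri_to_https("drs://drs.example.org/8073f1b3")
# -> "https://drs.example.org/ga4gh/drs/v1/objects/8073f1b3"
```

A client would then issue a GET request to the resolved URL to retrieve the object's metadata and access methods, keeping the caller independent of where the data physically resides.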
The proliferation of Artificial Intelligence (AI), data-driven research, and digital transformation has increased the global demand for powerful computing infrastructures capable of processing and analyzing enormous volumes of data. High-Performance Computing (HPC) has emerged as the cornerstone of this evolution, enabling researchers to perform complex simulations, accelerate model training, and analyze Big Data at unprecedented scales. Yet, across many African universities, access to such advanced computing capabilities remains severely limited, constraining the ability of scientists to participate meaningfully in global AI and data science innovation. This paper explores the strategic integration of HPC technologies with deep learning architectures to establish a sustainable, Big Data-driven cyberinfrastructure model tailored for African academic environments.
Drawing inspiration from the ongoing efforts at the University of Mpumalanga (UMP) and the Council for Scientific and Industrial Research (CSIR), the study proposes a framework that connects HPC systems with scalable AI workflows in areas such as agriculture, climate modelling, energy, and cybersecurity. The framework emphasizes distributed GPU-accelerated clusters, containerized computing environments, and job scheduling mechanisms that allow multiple research teams to run parallel deep learning experiments efficiently. Beyond the technical dimension, the paper highlights the importance of local capacity development, collaboration, and institutional investment as key drivers for long-term sustainability. By showcasing how HPC can shorten AI model training times, enhance predictive accuracy, and improve data management efficiency, this research demonstrates that advanced computation is not merely a luxury for developed nations but an attainable enabler of scientific independence for African universities.
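As a minimal illustration of the scheduling layer described above, a batch script along the following lines could queue a containerized, GPU-accelerated training job under a Slurm-style scheduler; the partition name, container image, and training script are placeholders, not UMP or CSIR specifics:

```shell
#!/bin/bash
#SBATCH --job-name=dl-train        # experiment name
#SBATCH --partition=gpu            # placeholder GPU partition name
#SBATCH --gres=gpu:2               # request two GPUs on one node
#SBATCH --time=12:00:00            # wall-clock limit
#SBATCH --output=%x-%j.log         # per-job log file

# Run the training script inside a Singularity/Apptainer container
# (image path and script name are illustrative placeholders).
singularity exec --nv pytorch.sif python train.py --epochs 50
```

Because each research team submits such scripts independently, the scheduler can pack many parallel deep learning experiments onto the shared cluster while containerization keeps their software environments reproducible and isolated.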
The findings underscore that the convergence of HPC and AI can transform research productivity, foster interdisciplinary collaboration, and support evidence-based policymaking in sectors critical to Africa’s development. Ultimately, the paper advocates for the creation of a federated HPC-AI ecosystem across African institutions, allowing shared access to computational resources, open datasets, and research expertise. Such an ecosystem would democratize access to cutting-edge technologies, close the digital divide, and position African researchers as active contributors to the global knowledge economy rather than passive consumers. Through this integrative perspective, the paper not only offers a technical blueprint for HPC-AI synergy but also presents a vision for empowering scientific innovation, data sovereignty, and technological resilience within the African higher education landscape.
The Conference Networking Session is designed to facilitate networking and informal interaction amongst delegates. Canapés will be served along with a selection of soft drinks, beers and wines.
This talk will discuss the fast evolving symbiosis between HPC and AI from both a market and scientific point of view, with a particular focus on how this manifests in the US National Science Foundation plans for the new Horizon system, part of the Leadership Class Computing Facility, and other developments around the world this year in HPC.
The Active Memory Architecture (AMA) is a memory-centric, non-von Neumann, graph-processing computer architecture designed to scale to Zettaflops performance capability, with first commercial production delivery scheduled for 2031. The AMA project is in its second year of design at the Texas Advanced Computing Center. AMA is a product of more than three decades of exploratory research in parallel computing, including parallel execution models, dynamic runtime systems, hardware architecture, and parallel programming. Strictly speaking, it is not conventional with respect to typical von Neumann processor cores. Memory and logic are merged, with small chunks of data (about 1 K wide words) and logic combined in message-driven units called “Fontons”. The messages are “Operons” that carry both data and work to any Fonton in the global system. The name space combines attributes of both virtual and physical addressing across the system. The distribution of work is dynamic and changes during the computation for optimal operation. This closing short presentation of CHPC25 will update the international HPC community on this revolutionary approach, immediately following the presentation of Dr. Dan Stanzione, Director of TACC.