THE CONFERENCE IS NOW LIVE!
The aim of the conference is to bring together our users so that their work can be communicated, to include world-renowned experts, and to offer a rich programme for students in the fields of high performance computing, big data, and high speed networking. The CHPC National Conference is co-organised by the CHPC, DIRISA and SANReN.
The CHPC 2022 Conference will be a hybrid event with a physical programme hosted at the CSIR International Conference Centre, along with full streaming to the virtual conference platform.
The theme of the conference is:
For more information please see the main conference site.
Department of Science and Innovation opening address by Deputy Director General Imraan Patel
National Integrated Cyber-Infrastructure System (NICIS) welcoming address by Centre Manager, Dr Happy Sithole
HPC, AI, and Analytics users ask more of their HPC-AI systems than ever before. High Performance Computing is the foundation of research and discovery. Artificial Intelligence is adding to it. Intel’s deep investments in developer ecosystems, tools, technology and open platforms are clearing the path forward to scale artificial intelligence and high performance computing everywhere. Intel has made HPC and AI more accessible and scalable for developers through the oneAPI development environment, and through extensive optimizations of popular libraries and frameworks on Intel® Xeon® Scalable processors. Let’s look at Intel’s HPC-AI strategy and new innovations, including the latest Intel® Xeon® Scalable processors, the latest data center GPUs and powerful software tools. Together, let's maximize the possibilities for the users in HPC-AI.
The Data Centre Optimisation Sessions will be a combination of presentations and open discussions focused on the following:
High Performance Computing for research has always been a capital-heavy expenditure, covering servers, the construction of large data centers, and other components that support this infrastructure, such as 24/7 power and cooling. Today's research involves running high performance server clusters in a grid to solve complicated problems, but access to the capital researchers need to build scalable high performance computing grids is always a challenge. This topic largely focuses on how high performance computing clusters can be scaled and built using Opex-based models, allowing a much more user-friendly charging model based on usage requirements. This allows infrastructure to be shared across multiple research organizations and leverages innovations in data centers such as cooling and quantum computing.
In this talk I'll discuss how molecular dynamics and multiscale modelling can be used to accelerate material discovery and optimisation. I'll give an introduction to the techniques and HPC requirements and show two examples where molecular modelling has helped to optimize fillers composition for polymer composites and electrolytes for energy storage applications.
Next Generation Sequencing (NGS) of DNA is now a common, almost routine, part of many labs in the molecular biology space. This inevitably means a greater burden in terms of data storage and compute requirements, which is the downside of more accessible NGS platforms. The upside, however, far outweighs the inherent challenges. With the changing landscape of how molecular biology and diagnostics are done, the need for reliable high-end compute resources and storage led us to the CHPC's infrastructure. The CHPC and its sterling support to inqaba biotec over the years have enabled us to improve our services and result delivery to our existing customers, and allowed us to realistically conceptualize even bigger projects, much of which will be geared toward human health and food security issues.
The advancement of scientific knowledge is driven by the ability to reproduce research findings. While there is agreement about the importance of reproducibility, the core challenge is that reproducible artifacts are usually put together only after the research has been completed, and standards and motivation for carrying out that task at that stage are lacking. There is a need to develop a new culture for scientists that fosters incorporating reproducibility from the beginning to the end of their research endeavor. The amount of effort required to put together reproducible artifacts for published results, together with the lack of incentives, has prevented the scientific community from sharing reproducible results, thereby hindering the trustworthiness of those results.
During this presentation we will summarize the key takeaways of the 2019 National Academies of Science Engineering and Medicine report on reproducibility. We will review the catalogue of existing provenance capture and replay tools, discuss the experience of existing reproducibility efforts and what we see as existing gaps. We will share the preliminary results of our project to develop a set of tools that aim to seamlessly capture reproducible artifacts and thus lower the barriers for capturing artifacts while doing research. We will also conclude by sharing our current efforts to build an ecosystem of tools and services to support reproducibility and a summary of recent discussions held at a BoF we organized at the Supercomputing conference-SC22.
According to studies in the UK and US, research software underpins ninety-five percent of research, and thirty-three percent of international research produces new code. However, research software has yet to be recognised as a first-class research output, and the researchers who develop it often find themselves in dead-end career paths.
In 2012, research software engineers in the United Kingdom embarked on a mission to change the academic system to (1) recognise the value of research software; (2) recognise the role research software engineers play as part of research groups; and (3) develop a career path for research software engineers. Research software engineers (RSEs) typically spend most of their time on software development. They often have formal training in a specific research discipline and understand nuances of the field in which they work, which aids in developing appropriate software.
Notable organisations that make up the global RSE community include the United Kingdom’s Software Sustainability Institute, various country- and region-specific RSE Associations, the Society of Research Software Engineering (SocRSE) and the Research Software Association (ReSA). The Research Software and Systems Engineering Africa (RSSE Africa) community was established in 2019 and offers an online forum and regular community meetups for African RSSEs to connect and learn.
In 2020 ReSA published an initial mapping of research software initiatives (including projects, communities, and funders) for the Global North. The exercise was repeated in 2022 for the Global South. The second round of mapping increased awareness of research software initiatives on the African continent, but significant gaps still exist. Talarify is building on the work done by ReSA to increase the visibility of research software stakeholders in Africa.
In this presentation, the authors will introduce various concepts related to RSEs, and share resources such as software sustainability evaluation guidelines, training opportunities, and information about joining the African and global RSE communities. Members of African research software communities, teams, and projects will be invited to add themselves to the global map.
The HPC community includes many nascent and established research software developers who may be unaware of existing and emerging opportunities and resources. The presentation will interest researchers who develop software, professional software developers in research environments, research and infrastructure managers, policymakers, funders, and more.
The Data Centre Optimisation Sessions will be a combination of presentations and open discussions focused on the following:
Access to the CHPC's Lengau cluster under project ERTH0859 provides our group with the means to carry out modelling activities that require high-performance computing. These activities are systematic and seamless dynamical and statistical downscaling tasks, ranging from short-range weather forecasting and seasonal climate forecasts to climate simulations and projections. Over the years, we have been using the Conformal-Cubic Atmospheric Model (CCAM) to generate high-resolution climate outputs, i.e. to dynamically downscale global climate models across the mentioned time spectrum over Africa, with a closer look at Southern Africa. The climate projections inform climate change impact studies, risk and vulnerability analysis, the formulation of adaptation strategies and climate change policy, while short-range weather forecasts and seasonal forecasts inform timely decision making.
Currently, weather forecasts are generated at 15 km resolution over South Africa, while seasonal forecasts and climate projections are generated at 8 km resolution over South Africa and 50 km over the whole of Africa. Other finer-resolution data are being generated for specific places within the country; for example, high-resolution downscaling of the urban climate over Tshwane has been completed. These urban runs use the TEB scheme within CCAM to represent the urban form, in order to simulate the urban heat island in the current and future climate. This is important in helping urban areas understand spatial differences in exposure to heat, and impacts on air quality and health. This is a capability unique to the CSIR modelling team, and due to its computational expense, this task, among other dynamical modelling tasks, would not be possible without the CHPC. In addition, improvement of the seasonal forecasting system has continued with the use of the CHPC. We have also started exploring other limited-area models, e.g. the non-hydrostatic Weather Research and Forecasting (WRF) model, to complement the CCAM-based runs. These model runs continue to underpin much of the research in the group.
The ever-increasing complexity of engineering problems and the demand for decisive insight are evident. As we push the boundaries of science for sustainable innovation, we need to be able to continuously enhance our computational approaches to numerical simulation. The resource requirements for simulation-based design and optimization of complex components and systems are often considered prohibitive. However, with High-Performance Computing (HPC) resources it is now possible to simulate problems with complex physics with faster turnaround times at affordable rates.
Cauchy Consult’s Computer-Aided Engineering (CAE) department specializes in simulation-based design and optimization across a wide range of industries. The Centre for High-Performance Computing (CHPC) is integral to Cauchy’s CAE and its mission to make the impossible possible. Performing a transient CFD simulation of the continuous recycling (re-heating) of Heavy Fuel Oil (HFO) for a 12-hour period was made possible by using the CHPC. As opposed to an estimated 210 days on our on-premises workstation, the simulation took 14 days to run on the CHPC using 120 cores. Solving the transient CFD simulation within the required time frame provided new insight and uncovered new possibilities for design and optimization.
This talk will highlight the ongoing efforts of the ICTP (International Centre for Theoretical Physics, Trieste, Italy) and partner institutions in developing graduate level academic programmes on HPC and
The HPC field is growing far faster than we can adequately train the most qualified candidates. To better address these workforce needs, mentoring people who have potential but do not believe they can achieve in, or even belong in, HPC can both help address the workforce needs and enhance and expand representation. This talk focuses on developing students and even early-career people into successful, confident HPC community members.
*Correspondence: email@example.com; +27219380206
Cheminformatics has gained traction over recent years. Under this large umbrella term, there are a plethora of in silico techniques that can be used to gather information at the molecular level, with high importance in numerous research niches including biology, biochemistry and drug discovery. Our research group utilizes these techniques within the drug discovery domain. Our current research focuses on the molecular understanding of phytochemicals identified in South African traditional plant extracts. This includes the pharmacokinetic and toxicological profiling of compounds, biological target identification and network pharmacology analysis, as well as mechanistic analysis through molecular dynamics simulations. The CHPC has been vital in this pipeline. Hosting the AMBER suite, the CHPC has created the environment to set up, equilibrate, run and analyze these biomolecular simulations. The GPU version of AMBER has also optimized and shortened the time required for trajectory generation. Our research group currently has 11 active members on the cluster, who have used 587,973 CPU-hours over the last 3 months. This has allowed our students to train and run their research without the need for expensive equipment. The above-mentioned biomolecular cheminformatics techniques, through the CHPC, have and will continue to revolutionize drug lead optimization, personalized therapy, and infectious disease therapeutics within Africa.
Mesoscale convective systems (MCSs) are defined as cloud bands that produce flanking precipitation on a scale of 100 km or more and comprise an ensemble of thunderstorms. MCSs may also be associated with severe weather such as tornadoes, hail, straight-line winds, squall lines, mesoscale convective complexes and flash flooding. MCSs are a major contributor to the total observed global rainfall and to the hydrological cycle at large. Numerical Weather Prediction (NWP) models play a critical role in skilfully and timeously predicting such events to help prevent or mitigate associated hazards. High-resolution NWP models can explicitly resolve some physical processes within convective systems, while others must be represented by means of parameterization schemes. Convection parameterization schemes play a particularly critical role in representing the pattern, characteristics, processes and temporal variation of convective precipitation in NWP models. This study aims to investigate the capability of the Conformal-Cubic Atmospheric Model (CCAM) in simulating cases of severe MCSs over South Africa. CCAM comprises a convection parameterization scheme, CSIRO9, which has different versions. CCAM was set up on the CHPC to run with four versions of CSIRO9. The results will show CCAM's performance in simulating these MCSs, the benefits and shortcomings of each version of the CCAM convection scheme, and the effectiveness of the CHPC in running NWP models.
The Square Kilometer Array (SKA) global mega science project stands to have a continued significant positive impact on the South African science, research, technology and innovation space. A computing infrastructure perspective of the past, present and future of those impacts will be presented. As the South African Radio Astronomy Observatory pivots to become the host of the SKA1 Mid frequency radio telescope, we look at SARAO's roles and future directions.
From the sensor to the laptop, from the telescope to the supercomputer, from the microscope to the database, scientific discovery is part of a connected digital continuum that is dynamic and fast. In this new digital continuum, artificial intelligence (AI) is providing tremendous breakthroughs, making data analysis and automated responses possible across the digital continuum. SAGE is a National Science Foundation project to build a national cyberinfrastructure for programmable edge computing. This new edge computing programming framework gives scientists a new tool for exploring the impacts of global urbanization, natural disasters such as flooding and wildfires, and climate change on natural ecosystems and city infrastructure.
Addison Snell of Intersect360 Research will give an overview of HPC market trends and forecasts, including the “grand unification” of HPC, AI, cloud, hyperscale, and enterprise, and reactions from SC22.
The Analyst Crossfire session aims to engage with the Vendors/Sponsors of the conference and will be facilitated by the conveners, Mr Addison Snell and Mr Dan Olds, from Intersect360.
The Vendor representatives in the session are:
Mr Ahmed Al-Jeshi (Intel)
Mr Olivier Blondel (HPE)
Mr Ryan Rautenbach (Dell Technologies)
Mr Rick Koopman (Lenovo)
Mr Yossi Avni (Nvidia)
View the Student Poster display, the Student Cluster Competition, the Cybersecurity Challenge, and the Datathon Competition.
Scientific workflows are now a common tool used by domain scientists in a number of disciplines. They are appealing because they enable users to think at a high level of abstraction, composing complex applications from individual application components. Workflow management systems (WMSs), such as Pegasus (http://pegasus.isi.edu), automate the process of executing these workflows on modern cyberinfrastructure. They take these high-level, resource-independent descriptions and map them onto the available heterogeneous compute and storage resources: campus clusters, high-performance systems, high-throughput resources, clouds, and the edge. WMSs can select the appropriate resources based on their architecture, availability of key software, performance, reliability, availability of cycles, and storage space, among other criteria. With the help of compiler-inspired algorithms, they can determine which data to save during execution and which are no longer needed. As with compiler solutions, they can generate an executable workflow that is tailored to the target execution environment, taking into account reliability, scalability, and performance. This talk will describe the key concepts used in the Pegasus WMS to help automate the execution of workflows in distributed and heterogeneous environments. It will showcase applications that have benefited from Pegasus’ automation, as well as touch upon new types of science applications and their needs. The talk will explore the potential use of artificial intelligence and machine learning approaches to improve the level of automation in workflow management systems.
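The compiler-inspired data-liveness idea above can be sketched in a few lines. This is an illustrative toy scheduler under invented names, not the Pegasus API: tasks run in dependency order, and an intermediate file is deleted as soon as no remaining task consumes it.

```python
from collections import defaultdict

def run_workflow(tasks, deps):
    """tasks: {name: output_file}; deps: {name: [names it depends on]}.
    Executes a DAG in topological order and, like a compiler's liveness
    analysis, frees each intermediate output once no later task needs it."""
    # Count how many downstream tasks still need each task's output.
    consumers = defaultdict(int)
    for t, parents in deps.items():
        for p in parents:
            consumers[p] += 1

    storage, order, done = {}, [], set()
    while len(done) < len(tasks):
        # Pick every task whose dependencies are all satisfied.
        ready = [t for t in tasks if t not in done
                 and all(p in done for p in deps.get(t, []))]
        for t in ready:
            storage[tasks[t]] = f"data produced by {t}"   # "execute" the task
            order.append(t)
            done.add(t)
            # Liveness: release inputs that no remaining task consumes.
            for p in deps.get(t, []):
                consumers[p] -= 1
                if consumers[p] == 0:
                    del storage[tasks[p]]
    return order, sorted(storage)

order, kept = run_workflow(
    {"fetch": "raw.dat", "clean": "clean.dat", "analyze": "results.dat"},
    {"clean": ["fetch"], "analyze": ["clean"]},
)
```

In this toy run only the final `results.dat` survives; `raw.dat` and `clean.dat` are released as soon as their consumers finish, which is the storage-saving behaviour the talk attributes to compiler-style analysis.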
The CHPC Users BoF focuses on engagement with all CHPC users: an overview will be presented of the usage of CHPC compute resources, followed by discussion of topics of common interest, feedback from users to the CHPC, and engagement of users with CHPC employees. Discussions will focus on the usage of the Lengau compute clusters (CPU and GPU) and the Sebowa cloud infrastructure.
All users are encouraged to participate in this session!
Presenters will include: Werner Janse van Rensburg, Dorah Thobye and Nkwe Monama
Globally there is momentum toward developing a Global Open Science Cloud (GOSC) aimed at supporting research collaborations across continents to help address global science challenges - for example the UN Sustainable Development Goals (SDGs), climate change, infectious diseases, and the coordination of global disaster risk reduction.
Continents, regions, and countries are also actively developing Open Science platforms and investing in underlying cyberinfrastructures to advance their Research Science Technology and Innovation (RSTI) ecosystems, enhance collaboration, increase their competitiveness and, critically, use RSTI as a driver for national and continental priorities.
To this end, this talk discusses the African Open Science Platform (AOSP) and progress in its development. AOSP aims to position African scientists at the cutting edge of data-intensive science by stimulating interactivity and creating opportunity through the development of efficiencies of scale, building critical mass through shared capacities, amplifying impact through a commonality of purpose and voice, and engaging in the Global Commons to address continental and global challenges through joint action.
The AOSP pilot study conducted an audit and provided frameworks to guide countries in the development of the requisite policies, infrastructure, incentives, and human capital to leverage open science and open data amidst the digital revolution, with all the challenges and opportunities it presents.
Furthermore, African regional blocs also have initiatives aligned with AOSP; for example, the Southern African Development Community Cyberinfrastructure Framework (SADC CI) has been approved by governments. It is currently supporting some regional projects and was consulted in the AOSP pilot project. The SADC CI facilitates a regional collaborative ecosystem for research, innovation, and teaching by creating a shared commons for data, computational platforms, and human capital development over a fabric of high-speed connectivity afforded by National Research and Education Networks (NRENs).
AOSP provides avenues and a trajectory towards developing a Pan-African cyberinfrastructure to support the advancement of the continent's science enterprise through open science and open data. Furthermore, such cyberinfrastructure will promote collaboration and support addressing higher-level African priority areas and challenges by leveraging the medium of research, science, technology and innovation, thereby contributing to African advancement and integration to help deliver on the African vision, Agenda 2063: The Africa We Want.
Keywords: Cyberinfrastructure, Open Science, Open Science Platforms, RSTI Policy and Ecosystems
The non-orthogonality of the United Nations' Sustainable Development Goals (SDGs), their spatio-temporal variations and complex interactions entail concerted interdisciplinary efforts to identify and address the triggers of their indicators. As the SDGs span the entire spectrum of human existence, they are naturally associated with a data deluge; hence the likelihood of operating around unseen, ignored or difficult knowledge gaps is high. For example, the complex interactions of the SDGs imply that part of the solution to poverty (SDG #1), say, may rely on the health and wellness of the population (SDG #3), its level of education and innovation (SDGs #4 and #9), the societal equality schemes in place (SDG #10) and other subtle factors within the SDG domain set. In many developing countries, a shortage of resources in the veterinary-extension network leads to poor farmer outcomes in terms of animal health and productivity. Animal-product value chains tend to be fragmented and disconnected, leading to many steps to market and poor prices for farmers. Overall investment in animal farming that could lead to innovation and increased productivity is largely inhibited by the lack of reliable market and logistical information-sharing infrastructure. We propose a prototype for extracting potentially valuable data for assessing the value, health and general condition of animals. The prototype is powered by onsite-generated and remotely modelled structured and unstructured data, using adaptive machine learning techniques: a hybrid of statistical pattern recognition methods and image analysis. Structured data is generated by a scale-invariant slider, repeatedly applied to animal images of interest, hence providing triangulating data for the images, which are camera-captured from specific distances and angles.
The novelty of the approach derives from the foregoing data generation mechanics and its adaptation of existing technologies in large scale animal farming to provide scalable solutions to small and medium scale animal farming practices. We discuss how its application can enhance productivity among small–scale animal farmers by empowering them to make more effective decisions and benefit from reduced risk and enhanced profitability. We show how pre–market estimation of animal weights and their potential market value, directly or indirectly, aligns with the SDG agenda. Finally, we demonstrate potential extensions into video monitoring and classification of animals.
Key Words: Animal Grading System, Association Rules, Big Data, Body Condition Score, Convolutional Neural Networks, Data Science, Data Visualisation, Interdisciplinarity, Predictive Modelling, Sustainable Development Goals
Database forensics is a branch of digital forensic science that allows forensic examination of databases to be conducted by following normal digital investigation processes. While conducting investigation processes on databases, the aim is always the extraction of Potential Digital Evidence (PDE) in a forensically sound manner. More often than not, forensic investigations on databases target the data in a database and its metadata, and this data may at different instances contain important facts that can be used to prove or disprove facts in a court of law during criminal or civil proceedings. The metadata, in particular, forensically determines how the digital data extracted from a database can be interpreted. The main problem addressed in this paper is how a digital forensic investigation process can be applied to a compromised MySQL database. The research achieves this by simulating attack scenarios, reconstructing a MySQL database and conducting the MySQL Forensic Investigation Process (MqFIP); the findings have been extrapolated accordingly.
Key words: Database forensics, Database management system, Forensic configuration, Database, MySQL, MySQL forensic investigation process
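One common step in the kind of investigation the abstract describes is checking whether table contents have been altered, by comparing digests of extracted rows against a trusted baseline. The sketch below is purely illustrative: the tables, rows and helper functions are invented examples, not part of MqFIP or of MySQL itself.

```python
import hashlib

def table_digest(rows):
    """Order-independent SHA-256 digest of a table's rows (hypothetical
    evidence-integrity helper: identical row sets give identical digests)."""
    h = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):
        h.update(row.encode())
    return h.hexdigest()

def find_tampered(baseline, suspect):
    """Return the names of tables whose digest differs from the trusted
    baseline, flagging them for deeper forensic examination."""
    return sorted(t for t in baseline
                  if table_digest(suspect.get(t, [])) != table_digest(baseline[t]))

# Invented example data: a row in "users" was altered by the attacker.
baseline = {"users": [("alice", "admin"), ("bob", "user")],
            "logs":  [("2022-12-01", "login")]}
suspect  = {"users": [("alice", "admin"), ("mallory", "admin")],
            "logs":  [("2022-12-01", "login")]}

tampered = find_tampered(baseline, suspect)
```

A real process would of course hash raw table files or binary logs obtained through forensically sound acquisition, but the comparison logic is the same.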
A question period follows the presentations.
This session will cover the latest Intel hardware (Xeon and GPU) in more technical detail related to HPC use cases. It will help the audience understand what products are coming shortly and what the capabilities of those products are.
This talk will provide an update on the way HPE is working with customers to provide technology and solutions to address the most challenging problems in high performance computing. It will include the need to make use of heterogeneous computing elements, and how AI can be combined with HPC to enable end users to be more productive.
The last session on Women in High Performance Computing in South Africa was proposed and held during the 2019 annual conference. The major aim of the initiative was to establish a network of Women in HPC in SA by bringing them together during the meeting. The session was supported and well attended by both men and women and, most importantly, was supported by the CHPC management team. International speakers gave talks, and fruitful discussions followed.
Subsequently, we found it fitting to hold another session during the CHPC 2022 annual meeting in order to maintain and sustain this initiative, most importantly because too much time was lost due to COVID-19.
It is hoped that by the end of the session and beyond, the following goals will be achieved:
• Improve workplace diversity.
• Address women's under-representation in HPC (contribute to increasing the participation of women and girls in HPC through training and networking).
• Share information and resources that foster growth for women in HPC.
• Conduct desktop studies to understand the statistics and challenges.
• Influence policy on resource allocation.
Convergence among researchers on the value of data sharing continues to grow, yet little data sharing occurs. Several studies exist to understand motivations, perceptions, practices and barriers to research data sharing, with seemingly similar findings. One of the key barriers to data sharing is privacy concerns. Not all data can be made publicly accessible: research data should be made ‘as open as possible, as closed as necessary’. In some instances, commercial concerns over data continue to grow, especially in industry-funded research. Some researchers still perceive data as private and thus to be closely kept. Concurrently, some researchers lack the knowledge and skills to develop data management plans that support their data sharing intentions. As a result, data sharing clauses, processing procedures and privacy statements are missing from informed consent statements. This not only stifles data sharing efforts but raises potential legal challenges. Though in some disciplines, such as biomedicine, data sharing has become a standard research practice, it is not yet common in communities such as the humanities. This paper is based on a narrative literature review of university data management policies, guidelines and publications. The review focused on privacy, confidentiality and security concerns in research data sharing. To address privacy concerns in data sharing, the paper suggests a combination of 1) data management plans (DMPs), 2) obtaining informed consent, 3) anonymization, and 4) data access controls. Data sharing begins with a sound DMP. A DMP gives the researcher an opportunity to ‘walk through’ the research before it begins: the researcher defines the required privacy, security, and sharing beforehand, which contributes to data sharing planning. Further, researchers can attain ethical and legal data sharing by obtaining consent, anonymizing data and providing clarity on data copyrights and access controls.
The paper contributes to knowledge on data sharing by providing practical suggestions and tips to handling privacy concerns in data sharing. The paper highlights essential requirements that researchers must satisfy to ethically and legally share research data. However, a focus on personal data in a general context is the paper’s major limitation. Future research must focus on a specific discipline because data sharing concerns are contextual.
Keywords: data sharing, privacy, informed consent, data management plan, data access.
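The anonymization step the paper recommends can be illustrated with a minimal pseudonymization sketch: direct identifiers are replaced with keyed-hash pseudonyms before a dataset is shared, so records can still be linked across files without exposing identities. The field names, record and secret key below are hypothetical examples, not from the paper.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be kept out of the shared dataset
# and managed under the access controls the paper describes.
SECRET_KEY = b"keep-this-key-out-of-the-shared-data"

def pseudonymise(record, identifier_fields):
    """Replace direct identifiers with short keyed-hash pseudonyms.
    The same input always maps to the same pseudonym, preserving linkage."""
    out = dict(record)
    for field in identifier_fields:
        if field in out:
            out[field] = hmac.new(SECRET_KEY, str(out[field]).encode(),
                                  hashlib.sha256).hexdigest()[:12]
    return out

raw = {"name": "Thandi M.", "email": "t@example.org", "age_band": "30-39"}
shared = pseudonymise(raw, ["name", "email"])
```

Note that pseudonymization alone is not full anonymization; re-identification risk from the remaining quasi-identifiers still has to be assessed, which is exactly the kind of planning a DMP forces up front.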
The devastating impact of climate change, driven primarily by the overuse of fossil fuels, has increased awareness of and interest in renewable energy. This awareness has been heightened by recent problems in South Africa, among other nations. Renewable energy sources face the “energy trilemma" of energy supply, economics, and environmental concerns, which must be addressed through an optimal configuration of a hybrid renewable energy system (HRES). In this talk, we will first outline how the use of renewable energy resources may mitigate climate change. Then, we will discuss how bio-inspired algorithms may play a role in smart energy management of both renewable energy resources (the production side) and smart energy consumption (the load side), through dynamic optimization of load shifting and through optimal utilization of available renewable energy resources, respectively. Load shifting uses a bio-inspired algorithm to move peak load demand to off-peak hours through agent negotiation of appliance operational times, thus reducing the peak load demand. On the production side, an optimal configuration of renewable energy components is needed, in terms of cost and sizing, to meet this load demand via a “check-reduce-improve” framework using the newly developed Spider-Prey bio-inspired algorithm. The first step of this cost and sizing exercise is a feasibility study in which streaming IoT environmental data at a specified locale is pre-processed, reviewed and mapped to a preliminary system design (the “check” phase). Based on this mapping, a performance analysis of the system configuration parameters is conducted (the “reduce” phase). Finally, using the Spider-Prey algorithm, an improved solution to the optimisation problem of the “energy trilemma” is evaluated (the “improve” phase).
To validate this optimised system design, extrapolated sensor data of environmental conditions at that locale, supplemented with historical data, is applied to different scenarios utilising different arrangements of renewable energy components, along with known costing and sizing models, in order to find the best HRES configuration. From our work, it was determined that a composite of a select number of solar panels, wind turbines, a biomass gasifier, and a collection of batteries provided the optimal combination of the lowest system cost with the highest reliability in terms of meeting load demand.
Although bio-inspired algorithms have been applied in many domains, such as in this case of an optimal hybrid renewable system, the swarm-based algorithms used in this domain consist of many independent agents using collective intelligence to achieve a goal. This independence of actions leads naturally to parallelism, and the potentially slow convergence of these algorithms creates a need for high-performance computing to help them attain their goals faster and, thus, make them a more viable option in many domains. This area will be touched on in the final stage of the talk.
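As a concrete (and entirely hypothetical) illustration of this parallelism, the sketch below implements a generic particle swarm optimiser on a toy objective. It is a stand-in for the Spider-Prey algorithm, whose details are not reproduced here; the point is that each agent's fitness evaluation is independent, so the marked loops could be distributed across HPC nodes.

```python
import numpy as np

def sphere(x):
    """Toy objective standing in for an HRES cost/sizing function."""
    return float(np.sum(x ** 2))

def pso(objective, dim=4, n_agents=20, iters=50, seed=0):
    """Minimal particle swarm: each agent's fitness evaluation is
    independent, so the inner evaluation loops parallelise naturally."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5, 5, (n_agents, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([objective(p) for p in pos])   # embarrassingly parallel
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_agents, dim))
        vel = 0.7 * vel + 1.4 * r1 * (pbest - pos) + 1.4 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([objective(p) for p in pos])     # embarrassingly parallel
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, float(pbest_f.min())

best, best_f = pso(sphere)
```

In practice the per-agent evaluations would be farmed out with MPI or a multiprocessing pool, which is exactly where HPC accelerates convergence.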
Cyber-Physical Systems (CPS) are key building blocks of the fourth industrial revolution. They consist of ICT systems that are embedded in the physical objects we manipulate daily, as well as in our environment, to provide an interface to the physical world. It is predicted that, with the advances made in the Internet-of-Things, Next Generation Networking, Cloud Computing and Artificial Intelligence, the hundreds of thousands of islands of CPS currently geographically distributed in different countries and regions worldwide will be federated to generate unprecedented datasets which, when submitted to emerging artificial intelligence (e.g. machine learning) models, will provide solutions to some of the key problems that the world has been unable to solve to date. However, unlike traditional IT systems, CPS include both a physical space and a cyber space, and both need to be protected to avoid the heavy economic damage and loss of human life that could result from cyber-attacks. Recent literature on cybercrime has revealed that the CPS-IoT, the key element of the CPS that manages its physical space, is the component most targeted by attackers seeking to compromise CPS operation through the IoT nodes. Owing to their lightweight nature, IoT nodes are often capable of implementing only limited security functions, while IoT gateways are more powerful devices capable of implementing advanced security functions. Building upon this observation, this talk revisits the issue of CPS security and proposes a novel security model where computational resources are availed by IoT gateway devices to achieve “CPS-IoT surveillance”, with the objective of detecting attacks launched on the CPS. The proposed security model includes i) a “Topology Replication Process” for detecting topology attacks and ii) a “Traffic Classification Model” using machine learning techniques to detect and classify traffic attacks.
These two models can be used to detect topology and traffic attacks launched against two of the most widely used IoT protocols, namely RPL and MQTT, as well as the least interference beaconing protocol (LIBP) [2,3,4,5].
[1] Bagula A., Ajayi O., Maluleke H. Cyber physical systems dependability using CPS-IoT monitoring. Sensors. 2021;21(8):2761.
[2] Bagula A., Djenouri D., Karbab E. M. B. Ubiquitous sensor network management: The least interference beaconing model. In Proceedings of the 2013 IEEE 24th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), pp. 2352-2356, 2013.
[3] Bagula A. Hybrid traffic engineering: the least path interference algorithm. In Proceedings of the SAICSIT 2004 Conference, pp. 89-96, 2004.
[4] Bagula A., Erasmus Z. IoT emulation with Cooja. ICTP-IoT Workshop, Trieste, Italy, 2015.
[5] Bagula A., Mbala L., Ajayi O. Cyber Physical Systems Using CPS-IoT Surveillance. ISAT Technical Report, Tech-Report-03-October2022.
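To make the “Traffic Classification Model” idea concrete, the sketch below is a deliberately minimal stand-in (not the authors' model): a nearest-centroid classifier over hypothetical per-flow features that a gateway might extract. All feature names and values are assumptions for illustration.

```python
import numpy as np

# Hypothetical per-flow features a gateway might extract:
# [packets/sec, mean payload bytes, distinct destinations]
X_train = np.array([
    [10.0,  120.0,  2.0],   # benign telemetry
    [12.0,  100.0,  3.0],   # benign telemetry
    [900.0,  40.0, 50.0],   # flooding attack
    [850.0,  35.0, 60.0],   # flooding attack
])
y_train = np.array([0, 0, 1, 1])   # 0 = benign, 1 = attack

def nearest_centroid_fit(X, y):
    """Train: one centroid per class (a minimal stand-in for the
    machine-learning classifier described in the talk)."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(centroids, x):
    """Classify a flow by its closest class centroid."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

model = nearest_centroid_fit(X_train, y_train)
label = nearest_centroid_predict(model, np.array([880.0, 38.0, 55.0]))
```

A production gateway model would use richer features and a stronger classifier, but the structure (feature extraction at the gateway, supervised labelling of flows) is the same.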
Seeking the balance between power consumption and performance: is sacrificing a little performance while considerably reducing power consumption acceptable for research? A short study of the work done at the University of Cambridge with our labs on reducing power consumption while maintaining acceptable performance, resulting in an initial top-10 placement in the June 2022 list.
Rick Koopman, Tech Lead and Director of Lenovo’s High-Performance Computing and Artificial Intelligence business segment for Emerging Markets in EMEA, will talk about the challenges of adopting new technologies in the coming period while walking the path towards Zero Emissions Computing.
Lenovo is an established leader in HPC, with ~180 of the world’s top 500 supercomputers. We’re also a leader in energy efficient HPC, with Neptune liquid cooling technologies powering research centers on five of seven continents.
This is why we say that from genomics to weather, seismic to space, Lenovo is helping solve humanity’s greatest challenges. And there is no greater challenge to humanity than the climate crisis.
I’m here today to talk to you about a topic you’ll be hearing a lot about in the years to come: sustainable computing. Sustainable computing is an approach that seeks to reduce or eliminate the carbon footprint of data center gear, from shipment to operation to disposal.
Extracting the highest possible performance from supercomputing systems while achieving efficient utilization has traditionally been incompatible with the secured, multi-tenant architecture of modern cloud computing. A cloud-native supercomputing platform provides the best of both worlds for the first time, combining peak performance and cluster efficiency with a modern zero-trust model for security isolation and multi-tenancy. The NVIDIA Cloud-Native Supercomputing platform leverages the NVIDIA® BlueField® data processing unit (DPU) architecture with high-speed, low-latency NVIDIA Quantum InfiniBand networking to deliver bare-metal performance, user management and isolation, data protection, and on-demand high-performance computing (HPC) and AI services—simply and securely.
The Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6) shows that much of the globe will experience water and food insecurity, wellness challenges, and damage to infrastructure due to climate change. There is therefore a need for long-term observation networks that can provide clear information on the scale of the problem. Africa and the Southern Hemisphere, however, have a sparse environmental observation network, which has a detrimental effect on the understanding of earth system processes and on numerical modelling in general. The South African Environmental Observation Network (SAEON)’s vision is to provide world-class environmental research platforms for a sustainable society. The presentation will provide information on SAEON’s earth system observation network and research platforms, its data policy, and the accessibility of the platforms from the seven nodes across South Africa. SAEON hosts three projects of the South African Research Infrastructure Roadmap (SARIR), namely 1) the Expanded Freshwater and Terrestrial Environmental Observation Network (EFTEON), 2) the Shallow Marine and Coastal Research Infrastructure (SMCRI), and 3) the South African Polar Research Infrastructure (SAPRI). These observation networks will be used increasingly in numerical modelling studies to inform model development, data assimilation, and verification. Close collaboration between observation and modelling scientists will result in a better understanding of earth system processes, more skilful models, increased local model development capacity, and the improvement of multi-hazard early warning systems.
Seasonal Auto-Regressive Integrated Moving Average with exogenous factors (SARIMAX) has shown promising results in modeling small and sparse observed time-series data by capturing linear features using independent and dependent variables. Long short-term memory (LSTM) is a promising neural network for learning non-linear dependence features from data.
With the increase in wildlife roadkill patterns, SARIMAX-only and LSTM-only models would likely fail to precisely learn the several endogenous and/or exogenous variables driving such wildlife roadkill data. In this paper, we design and implement an error correction mathematical framework based on LSTM-only. The framework extracts features from the residual error generated by a SARIMAX-only model.
The learned residual features correct the output time-series prediction of the SARIMAX-only model. The process combines SARIMAX-only predictions and LSTM-only residual predictions to obtain a hybrid SARIMAX-LSTM. The models are evaluated using South African wildlife-vehicle collision datasets, and experiments show that, compared to the single models, SARIMAX-LSTM increases prediction accuracy for a taxon whose linear components outweigh the non-linear ones. In addition, the hybrid model fails to outperform LSTM-only whenever a taxon contains more non-linear components than linear components. Our interpretation of these results is that the collected exogenous and endogenous data are insufficient, which limits the hybrid model's performance since it cannot accurately detect seasonality in the residuals from SARIMAX-only and minimise the SARIMAX-LSTM error. We conclude that the error correction framework should be preferred over single models in wildlife time-series modelling and prediction whenever a dataset contains more linear components. Adding more related data may improve the prediction performance of SARIMAX-LSTM.
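The error-correction pipeline described above can be sketched end-to-end. The snippet below is a simplified illustration on synthetic data, not the paper's implementation: per-phase seasonal means stand in for SARIMAX, and a polynomial regression on lagged residuals stands in for the LSTM; the hybrid forecast is the linear prediction plus the predicted residual.

```python
import numpy as np

# Synthetic "roadkill counts": exact period-4 seasonality plus a
# nonlinear (chaotic logistic-map) remainder. Illustrative data only.
n = 120
season = np.tile([5.0, 8.0, 12.0, 7.0], n // 4)
r = np.empty(n)
r[0] = 0.3
for t in range(1, n):
    r[t] = 3.9 * r[t - 1] * (1.0 - r[t - 1])          # nonlinear component
y = season + r

train_n, period = 100, 4

# Step 1: "linear" seasonal model (stand-in for SARIMAX): per-phase means.
seasonal_mean = np.array([y[i:train_n:period].mean() for i in range(period)])
linear_pred = seasonal_mean[np.arange(n) % period]

# Step 2: nonlinear corrector (stand-in for the LSTM), trained on the
# linear model's residuals: predict residual[t] from residual[t-1].
resid_train = y[:train_n] - linear_pred[:train_n]
coeffs = np.polyfit(resid_train[:-1], resid_train[1:], deg=3)

# Step 3: hybrid forecast = linear prediction + predicted residual.
resid_test = y[train_n:] - linear_pred[train_n:]
resid_pred = np.polyval(coeffs, resid_test[:-1])       # one step ahead
hybrid_pred = linear_pred[train_n + 1:] + resid_pred

mse_linear = float(np.mean((y[train_n + 1:] - linear_pred[train_n + 1:]) ** 2))
mse_hybrid = float(np.mean((y[train_n + 1:] - hybrid_pred) ** 2))
```

Here the remainder is almost purely a function of its own lag, so the correction step recovers most of what the seasonal model misses; with real wildlife data the corrector (the LSTM in the paper) must also cope with noise and exogenous drivers.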
With the affordances and advancement of Artificial Intelligence-based systems and machines, their application and relevance to multiple fields and disciplines have been ever-increasing. The application of Artificial Intelligence (AI) in multiple sectors, if not every sector, is indicative of its broad spectrum of uses. In very broad terms, AI is “intelligence” displayed by machines that mimics the natural cognitive skills of humans. A modern-day definition relates not to AI’s ability to articulate, but rather to rationality and the ability to act rationally.
AI is a wide field of research consisting of multiple sub-domains, each with its own applications and uses. In general, AI-based systems and machines aim to enhance, augment, and automate tasks that require human-like reasoning, problem-solving, and perception-based skills. Ultimately, as with human work, the end goal is to take a set of tasks and instructions and carry them out to reach a specific goal.
There is a problem
AI use cases for the advancement and betterment of society are only a Google search away, and there are compelling examples of its application that have aided organisations, companies, communities, and countries. However, AI does not come without its own set of challenges, including risks from individuals acting in bad faith, the weaponisation of AI, biased data, job losses, and existential issues. Its application is largely dictated and shaped by an individual's or organisation's creativity and capacity, potentially bringing about incorrect or harmful outcomes (sometimes with intent).
It is closer to home than we think
Unfortunately, the harmful use and application of AI within the academic environment has become an increasing phenomenon. As much as AI is being used to find innovative solutions to research questions, it is equally being used to circumvent detection efforts or to falsify research data. Historically, the use of AI-based systems and tools was confined to specific disciplines and labs. Today, however, these systems are readily available and require little to no skill to benefit from. Online tutorials and ready-made tools are freely available to showcase and use. There are multiple reasons why researchers might falsify research (e.g., career and funding pressures, institutional oversight, inadequate training), all of which ultimately relate to questionable integrity practices. Many of these methods and subversion efforts are extremely hard to detect and interrogate, as the systems and tools are ever-evolving and current detection efforts age within a very short time frame.
It is not all bad news
The use of AI in research has many benefits and should not be discouraged. What should concern us is when an individual or group of researchers purposely fails to acknowledge its use, or intends to avoid detection of generated data or processes. The following can serve as guiding questions/notes when using AI as part of your research process:
• Have I ensured that the data is generated ethically?
• Can I ensure that the data has all forms of bias removed?
• Will the generated data harm or affect a specific group of individuals?
• State the use and purpose of AI methods and tools as part of the research process when writing a research proposal, collecting data, and writing up research findings
• Question the collection methods/algorithms used when using open datasets
The key to it all
“Removing improper incentives, training researchers, and imposing better governance is vital to reducing research misconduct. Awareness of the possibility of misconduct and formalised procedures that scrutinize study trustworthiness is important during peer review and in systematic reviews.” - Li, W., Gurrin, L. C. and Mol, B. W. (2022) “Violation of Research Integrity Principles Occurs More Often Than We Think,” Reproductive BioMedicine Online, 44(2), pp. 207–209.
The above statement captures the most effective approach to combating and preventing the falsification of research data in general, and with the use of AI in particular. These efforts are not an individual's responsibility alone, but that of a whole organisation striving to ensure sound and ethical research practices.
This presentation will provide an overview of the current trends, preventative measures, and training involved in countering the use of AI to fake research data.
Tuberculosis (TB) is a chronic lung disease caused by infection with Mycobacterium tuberculosis (Mtb). TB is the leading global cause of infectious disease-related deaths and disproportionately affects populations in the Global South. While TB can be treated with combination antibiotic therapy, substantial therapeutic challenges remain. The extended duration of treatment for TB results in compliance issues, which have led to the development of drug resistance. Therefore, new strategies to shorten treatment duration and combat resistance are required. Drug resistance is driven by mutations in the specific Mtb proteins targeted by antibiotics. One strategy to circumvent resistance is therefore to inhibit the bacterial proteostasis system, as mutated protein variants may be less stable than their wild-type counterparts. In the context of rifampicin resistance, we studied the effect of drug-resistance mutations in the target protein RpoB. We show that RpoB is a client protein of the major Hsp70 proteostasis complex in Mtb and that Hsp70 can protect RpoB from stress inactivation. Molecular dynamics simulations using GROMACS suggested that drug-induced mutations alter the stability of the RpoB protein. Specifically, the RpoB drug-resistance mutations D435V, D435V-H445Y, D435V-H445Y-S450L, H445Y, and G442A-S450L led to reduced protein stability, while S450L led to a more stable RpoB phenotype. The in silico predictions were compared to in vitro biochemical assays in which thermal stability and Hsp70 interaction were compared. Taken together, our data suggest that RpoB drug-induced mutations alter protein stability and Hsp70 interaction, and may predict the reliance of these proteins on the Mtb Hsp70 proteostasis system. Consequently, inhibition of Hsp70 is being evaluated as a strategy to sensitise bacteria to rifampicin.
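Stability comparisons between wild-type and mutant conformations in MD analyses of this kind commonly rely on RMSD after optimal rigid-body superposition. The sketch below (not the study's actual pipeline) implements the standard Kabsch-superposed RMSD on toy coordinates:

```python
import numpy as np

def rmsd_kabsch(P, Q):
    """RMSD between two conformations after optimal rigid-body
    superposition (Kabsch algorithm), a standard stability metric when
    comparing MD snapshots of wild-type vs mutant proteins."""
    P = P - P.mean(axis=0)                 # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                            # covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # avoid improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T                        # superpose P onto Q
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))

# Toy "C-alpha" coordinates: a rotated, translated copy should give RMSD ~ 0.
rng = np.random.default_rng(0)
coords = rng.random((10, 3))
theta = 0.5
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
rotated = coords @ Rz.T + np.array([1.0, 2.0, 3.0])
zero_rmsd = rmsd_kabsch(coords, rotated)
```

In a real analysis the coordinates would come from trajectory frames (e.g. GROMACS output), and the RMSD time series for each mutant would be compared against the wild type.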
The broad overarching aim is to discover novel materials suitable for high-capacity energy conversion using density functional theory (DFT) calculations and computational screening techniques. This presentation discusses the application of high-performance computing to different materials-related projects, and how it has enabled materials design and prediction. Various projects using the high-performance facility provided by the Centre for High Performance Computing (CHPC) in South Africa will be discussed. Firstly, 2D materials and heterostructures as possible photocatalytic and photovoltaic materials are discussed; certain materials were found to yield improved photocatalytic and photovoltaic properties. Secondly, reducing platinum-group metal loading for the catalytic dehydrogenation of liquid organic hydrogen carriers (LOHCs) is important: Pt-based Sn/Co alloys were explored and found to have better dehydrogenation catalytic properties than pristine Pt metal, with cost reduction attributed to the reduced Pt loading. Finally, the evaluation of doped IrO2 for the oxygen evolution reaction (OER) showed that improved catalytic properties can be obtained.
This session will bring members of the HPC Education community together to discuss the establishment of a sustainable HPC Education Community of Practice for the African region. The session will identify the needs of members in the community through an interactive discussion, where members can share questions, ideas, and suggestions. The session will conclude with practical next steps for the establishment of a sustainable African HPC Education community, and the opportunity for community members to share resources, collaborate, and grow the impact of HPC Education in Africa.
Anyone involved in HPC education or training, and/or anyone who wants to establish a formal Community of Practice, is encouraged to participate in this session.
Healthcare firms need to respond faster to the rapidly changing threat landscape. Risks to patient privacy and safety are increasing due to recent cyber-attacks. Healthcare firms are lagging in building cybersecurity capabilities prescribed by best practice approaches. This case study aims to identify the barriers to building a dynamic cybersecurity capability within a South African healthcare software services firm. The firm provides cloud-based software as a service (SaaS) solutions to medical practitioners and hospitals. The study used interviews and document analysis as primary data collection methods. Thematic analysis guided by a dynamic capability perspective was used to identify the internal and external barriers that could impede building a dynamic cybersecurity capability at a healthcare software services firm. The research recommends interventions to address cybersecurity barriers in healthcare software services firms.
A great deal of data is collected and available on many systems. This data may be collected by certain systems, and users may knowingly or unknowingly be sharing it. Nowadays, with high rates of cybercrime as many users access services online, the issues of data management become critical. On the other hand, users do not want to provide the same data to related systems over and over again. As a result, data managers propose an open data environment. Open data is data that can be freely used, re-used, and redistributed by anyone. For any data to be considered open, it must always be available and downloadable via the internet in a modifiable format. The availability of open data has provided benefits to citizens, organisations, and even governments, as it gives easier access to information and allows for the improvement of services while promoting a culture of innovation. Even though open data has a number of benefits, the biggest challenge facing open data is privacy. Privacy can be defined as the appropriate use of personal data. Opening access to such data involves trading off privacy for utility, or vice versa. Releasing raw data allows for better engagement with the data, but creates privacy risks; protecting the data limits its usefulness. Therefore, a balance between privacy and utility must be maintained, and it becomes challenging to release data while ensuring that it remains useful. This study aims to understand the limits of open data in terms of excluding critical users. It investigates currently existing South African open data repositories to identify user participation and the privacy risks that exist in the data fields released in the datasets. The study is centred on rural digital users and puts them at the centre of open data.
The key research questions are:
The proposed approach is a qualitative co-design approach through citizen engagement. We categorise and group the participants according to their skills and understanding of digital systems and data management. We then experiment with and educate the citizens on open data, data management, and data security mechanisms, and alert them to best practices. The study is expected to produce guidelines on how to engage citizens within the open data environment. A citizen-centric open data portal is formulated and evaluated by key stakeholders.
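As one concrete example of the privacy-utility trade-off discussed above, k-anonymity is a common minimum bar for released datasets: every combination of quasi-identifier values must be shared by at least k records. The sketch below (with entirely hypothetical field names and rows) checks the smallest quasi-identifier group size in a release:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns.
    A release is k-anonymous if this value is at least k."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(combos.values())

# Hypothetical rows from an open dataset release (illustrative only).
rows = [
    {"age_band": "20-29", "district": "OR Tambo", "service": "clinic"},
    {"age_band": "20-29", "district": "OR Tambo", "service": "grant"},
    {"age_band": "30-39", "district": "Amathole", "service": "clinic"},
]

k = k_anonymity(rows, ["age_band", "district"])
```

Here k comes out as 1, so the third row is uniquely re-identifiable from its age band and district; coarsening or suppressing those fields raises k at the cost of utility, which is precisely the balance the study interrogates.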
Greenhouses have been used for centuries to protect plants from adverse weather conditions and insects. Ventilation of greenhouses is of vital importance to ensure quality crop production. If temperatures in a greenhouse are too high, poor plant growth may result, along with an increased need for frequent watering. A mechanical ventilation system might be required to cool the inside of the greenhouse; natural ventilation is an alternative option, which uses temperature and wind to control the indoor climate. Unfortunately, greenhouses are extremely energy intensive: energy costs are the third-highest cost related to greenhouse crop cultivation. Reducing the energy operating costs associated with greenhouse cultivation may result in a price reduction for greenhouse-cultivated crops. Conducting experimental work on greenhouse ventilation can be costly and cumbersome. Using computational methods such as CFD (Computational Fluid Dynamics) to obtain qualitative and quantitative assessments of greenhouses can reduce the costs and time involved. The computer cluster at the Centre for High Performance Computing has been used since 2017 to conduct these numerical investigations using StarCCM+. Specifically, heat transfer in single-span greenhouses has been investigated with respect to various parameters, such as ventilator position, differences between two- and three-dimensional simulations, and the effect of benches inside the greenhouse.
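For a sense of the quantities such a CFD study refines, a standard back-of-envelope estimate of buoyancy-driven (stack) natural ventilation through an opening is Q = Cd · A · sqrt(2 g h ΔT / T). The sketch below uses illustrative numbers, not values from the study:

```python
import math

def stack_ventilation_flow(area_m2, height_m, t_in_k, t_out_k, cd=0.6):
    """Back-of-envelope buoyancy-driven (stack) ventilation rate through
    an opening, Q = Cd * A * sqrt(2 * g * h * dT / T_in)  [m^3/s].
    This is the kind of first estimate a CFD simulation then refines."""
    g = 9.81
    dt = t_in_k - t_out_k
    return cd * area_m2 * math.sqrt(2.0 * g * height_m * dt / t_in_k)

# Illustrative greenhouse: 2 m^2 vent, 3 m stack height, 10 K warmer inside.
q = stack_ventilation_flow(area_m2=2.0, height_m=3.0, t_in_k=305.0, t_out_k=295.0)
```

Note that this lumped estimate cannot resolve vent position, bench obstructions, or 2D-vs-3D effects, which is exactly why the CFD investigations described above are needed.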
Humans intuitively perceive other human beings' structural and behavioural attributes, and enabling a machine to do the same has been an age-old pursuit. Recent advances in deep learning, specifically in pose estimation, tracking, and action analysis, have unlocked the ability to encapsulate users and their behaviour in a scene. This talk focuses on how deep learning methods can solve problems in security, medicine, and sports, along with the current wins and challenges many implementers face when bringing AI into our daily lives.
The HPC Ecosystems Project (and the SADC CyberInfrastructure Initiative) is responsible for the repurposing and distribution of decommissioned tier-1 HPC systems. A significant part of the project's scope is the training of an HPC System Administrator workforce. This session will bring members of the partner sites together to discuss the general progress of the project, and to identify the needs of community members so that the Project Leadership can plan engagements and prepare resources for the upcoming year.
The session will include a presentation on a formal evaluation of access to HPC resources for Historically Disadvantaged Institutions (HDIs) in South Africa, as well as feedback from partner sites and the HPC Ecosystems community, and discussions on future plans and needs of the community. There will be a group photo at the end of the session.
Join Trish Damkroger, Chief Product Officer, SVP and GM of HPE’s High Performance Platforms, Solutions and Engineering team, as we discuss the art of the possible.
Focusing on purpose-built solutions for your organization, we will cover our acceleration into the Exascale Era and how to realize new business value and gain insights with the convergence of modelling & simulation and AI. We will present how you can access supercomputing technology and empower your most complex workloads with optimized solutions.
Also, find out how HPE will help the CHPC research community work with the pre-Exascale community. This community of converged HPC supercomputing is changing how we look at science, providing insight and benefits beyond academia and bringing those benefits to humanity.
The Fourth Industrial Revolution (4IR) is disrupting every known industry and profession, including science, technology, engineering and mathematics (STEM). Digital technologies are rapidly changing the way industries and professions operate. However, studies have found a low level of adoption of 4IR technologies in STEM education in South Africa, due to limited awareness of the critical role that innovative technologies can play in the STEM education space.
STEM education in South Africa is overdue for radical transformation and for the adoption of 4IR technologies in teaching and learning pedagogy (including curriculum integration, teaching methods, and assessments), to ensure that current and future South African graduates are sufficiently knowledgeable and skilled to cope with the challenges they will face.
This talk aims to explore the possibilities of integrating 4IR technologies in the classroom and the advantages they offer for improving STEM curricula, teaching, and assessment in the future.
N. Chetty1,2, N.F. Andriambelaza1
1 Department of Physics, University of Pretoria, Pretoria 0002, South Africa
2Faculty of Science, University of Witwatersrand, Johannesburg 2000, South Africa
The density functional theory (DFT) method, as implemented in the Quantum ESPRESSO packages, was used to investigate the effects of trivalent atoms such as B and Ga substituting a Si atom in a bilayer silica material. This bilayer system was recently proposed by researchers from Brookhaven National Laboratory (Boscoboinik et al.) as a suitable 2D representative of zeolites. The effects of the trivalent atoms on the stability, structural, and electronic properties of the 2D zeolite model were explored. The formation energy analysis revealed that the introduction of a B atom is exothermic, whereas that of a Ga atom is endothermic. The introduction of B and Ga was found to affect the bond lengths of the system; however, it does not lead to a significant deformation of the structure. Regarding the electronic properties, the Fermi level was shifted towards the valence band, revealing the formation of p-type materials. The presence of B and Ga atoms in the bilayer material contributes a net negative charge to the framework. In the present study, protons and alkali metals were considered for charge balance; their preferred sites, as well as the best candidate for charge balance, were identified. The density of states analysis showed that the presence of cations induces defect states near the band edges, narrowing the band gap. Our results provide detailed information about the properties of the doped silica bilayer at the atomic level, which is beneficial for its nanotechnological applications as well as for its full validation as a 2D model for zeolites.
Boscoboinik, J. Anibal, and Shamil Shaikhutdinov. Catalysis Letters 144.12 (2014): 1987-1995.
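The formation-energy analysis mentioned above follows the standard bookkeeping for substitutional doping: the dopant is taken from its chemical-potential reservoir and the displaced Si atom is returned to its own. The sketch below illustrates the sign convention with hypothetical (not computed) energies:

```python
def substitutional_formation_energy(e_doped, e_pristine, mu_dopant, mu_si):
    """E_f = E(doped sheet) - E(pristine sheet) - mu(dopant) + mu(Si).
    Negative E_f means an exothermic (energetically favourable)
    substitution; positive E_f means endothermic."""
    return e_doped - e_pristine - mu_dopant + mu_si

# Illustrative total energies and chemical potentials in eV (hypothetical,
# chosen only to reproduce the qualitative B-exothermic / Ga-endothermic trend):
e_f_B  = substitutional_formation_energy(-1203.0, -1200.0, -6.0, -4.6)
e_f_Ga = substitutional_formation_energy(-1197.0, -1200.0, -3.2, -4.6)
```

In an actual study these totals come from converged DFT runs, and the chemical potentials depend on the assumed growth conditions (e.g. B-rich vs B-poor reservoirs).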
Chemical engineering thermodynamics encompasses not only heat engines and processes such as refrigeration and liquefaction, but also phase equilibria and rigorous yet practical models of molecular behaviour. Accurately predicting the thermophysical properties and behaviour of materials is of key importance not just from a basic science perspective, but also for the accurate modelling and design of equipment and chemical processes. This contribution provides an overview of the application of high performance computing to chemical engineering thermodynamics research at Mangosuthu University of Technology (MUT). Simulating large systems of particles at the molecular level necessarily requires significant computational effort and resources. By employing molecular simulations and computational fluid dynamics, researchers at MUT, in collaboration with scientists and engineers from other institutions such as the University of KwaZulu-Natal, Durban University of Technology, and the Nuclear Energy Corporation of South Africa, have tackled a variety of problems including biogas cleaning, clathrate hydrate stability, polymer fluorination, renewable fuel gas upgrading, cement degradation, and water pollution remediation. The importance of high performance computing to this research is outlined and the impact of selected research outputs is discussed, including notable publications which received coverage in the popular science press and research which was of immediate and practical relevance to industry.
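As a minimal example of the molecular-simulation building blocks behind such work, the sketch below evaluates the Lennard-Jones pair potential in reduced units; at the minimum-energy separation r = 2^(1/6) σ, a pair contributes exactly -ε. This is a textbook illustration, not code from the MUT studies:

```python
import numpy as np

def lj_energy(positions, eps=1.0, sigma=1.0):
    """Total Lennard-Jones energy of a configuration (reduced units),
    the workhorse pair potential in introductory molecular simulation."""
    e = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            sr6 = (sigma / r) ** 6
            e += 4.0 * eps * (sr6 ** 2 - sr6)   # 4*eps*[(s/r)^12 - (s/r)^6]
    return e

# Two particles at the LJ minimum distance r_min = 2^(1/6) * sigma:
pair = np.array([[0.0, 0.0, 0.0], [2.0 ** (1 / 6), 0.0, 0.0]])
e_min = lj_energy(pair)
```

Production studies replace this O(n^2) loop with neighbour lists, cutoffs, and realistic force fields, which is where the HPC resources described above become essential.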
The Council for Geoscience (CGS) is mandated to develop and publish world-class geoscience knowledge products and to render geoscience-related services to the South African public and industry. In order for the CGS to fulfil this mandate and to advance the geoscience field within South Africa and beyond, the organisation is utilising high-performance computing resources provided by the Department of Science and Innovation's Centre for High-Performance Computing (CHPC). The application of high-performance computing is part of the geoscience innovation initiative, which is meant to address some of the societal challenges the world is facing. The paper highlights the concept, history, present, and future of geoscience innovation. High-performance computing resources are being used in geophysics, particularly for airborne electromagnetic and magnetotelluric data inversion, and also for seismological data processing. Geophysics is a science which uses physical measurements to understand the behaviour of the Earth, combining massive observational datasets with large-scale computer simulations to improve our knowledge of the Earth. Most of the research projects involve compute-intensive processing and inversion of terabyte-scale geophysical data recorded by millions of recording points, which often requires parallel architectures such as the one at the CHPC.
Questions after talks.
Proposed design of the visualisation system for CHPC
Visualisation systems are used to analyse scientific data in various disciplines such as materials science, computational physics, chemistry, and climatology. Users of the Centre for High Performance Computing (CHPC) visualise their scientific data on their own laptops and desktops, which sometimes lack adequate computational resources (processor, memory, graphics processing unit (GPU)) and a large enough screen to effectively display data generated by parallel programs. Users completed a survey about the proposal to set up a new visualisation system at the CHPC, and the results of these questionnaires will be discussed. To this end, we propose an in situ visualisation system, which can be used to visualise data either in real time or after it is produced by the simulation executed on the High Performance Computing system.
Our research team focuses primarily on improving animal health through a combination of experimental and computational genomics. This includes the experimental characterization of genes and antigens from field samples for pathogen detection, and immunoinformatics analysis of antigens from the proteomes and genomes of coccidian parasites available in public databases for vaccine development. The second focus of our research group is to understand and manipulate the rumen microbiome with probiotics using metagenomics, with the intent of addressing antimicrobial resistance emanating from animal production. Our research group depends heavily on the Centre for High Performance Computing facilities, especially for the analysis of the large datasets we often work with in order to answer important biological questions relevant to our focus areas. Findings from our research have applications in the control of coccidian parasites in animals and in the development of molecular diagnostic tools for them. In addition, our metagenomics results showed a lowered rumen pathogenic bacterial population following manipulation with probiotics.
Catalysis plays a huge role in the chemical industry, as almost every chemical process used to produce household, industrial and consumer products requires a catalyst. Hence, the discovery and development of new catalysts is a very active field, with various experimental strategies and techniques employed, including synthesis, spectroscopic characterization and reaction optimization. Computational chemistry methods already speed up this process, and the rise of chemoinformatics and machine learning techniques in chemistry in recent years has created a pathway to accelerate the discovery process even further. In this talk, I will present how a tripartite alliance of these three methods (computational chemistry, chemoinformatics and machine learning) can be exploited to search for more active catalysts for important chemical processes, using non-heme Fe(II) alkane oxidation catalysts as a case study.
The rapid growth in technology is providing unprecedented opportunities for scientific inquiry. However, dealing with the data produced has resulted in a crisis. Computer speeds are increasing much faster than storage capacities and I/O rates. This ratio is even worse for experimental and observational facilities: for example, the Legacy Survey of Space and Time (LSST) observatory will collect up to 20 TB per night in 2022, and the Square Kilometre Array will generate over 2 PB per night in 2028. This reality makes it critical for our community to 1) create efficient mechanisms to move and store the data in a Findable, Accessible, Interoperable, and Reusable (FAIR) fashion; 2) create efficient abstractions so that scientists can perform both online and offline analysis efficiently; and 3) create new reduction algorithms which can be trusted by the scientific community, and which allow for new ways not only to reduce/compress the data but also to reduce the memory footprint and the overall time spent in analysis.
To tackle these goals, my group has worked closely with many large-scale applications and researchers to co-design critical software infrastructure for these communities. These research artifacts have been fully integrated into many of the largest simulations and experiments, and have increased the performance of these codes by over 10×. This impact was recognized with an R&D 100 award in 2013 and was highlighted in the 2020 US Department of Energy (DOE) Advanced Scientific Computing Research (ASCR) @40 report. In this presentation, I will discuss the research details of three major contributions I have led: large-scale self-describing parallel I/O (ADIOS), in situ/streaming data (SST), and data refactoring (MGARD). I will introduce the overall concepts and present several results from our research, which has been applied and fully integrated into many of the world's largest scientific applications.
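The "trusted reduction" idea can be sketched with the simplest possible error-bounded scheme. This is a hypothetical illustration, not the MGARD algorithm: uniform quantisation with a guaranteed absolute error bound, which is the kind of mathematical guarantee that lets scientists trust a lossy reduction.

```python
# Minimal sketch (illustrative, not MGARD): error-bounded lossy reduction
# via uniform quantisation. Guarantee: |reconstruct(q) - x| <= eps.
import numpy as np

def reduce_with_bound(data, eps):
    """Map floats to small integer codes; step 2*eps bounds the error by eps."""
    return np.round(data / (2 * eps)).astype(np.int64)

def reconstruct(q, eps):
    return q * (2 * eps)

x = np.linspace(0.0, 1.0, 10_000) ** 2   # stand-in for simulation output
eps = 1e-3
q = reduce_with_bound(x, eps)
x_hat = reconstruct(q, eps)
print(bool(np.max(np.abs(x_hat - x)) <= eps))   # True: the bound holds
```

The integer codes compress far better than raw doubles; real reducers like MGARD add multilevel decomposition so the error bound also holds for quantities derived from the data.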
Replication has been successfully employed and practiced to ensure high data availability in large-scale distributed storage systems. However, with the relentless growth of generated and collected data, replication has become expensive not only in storage cost but also in network and hardware cost. Traditionally, erasure coding (EC) has been employed as a cost-efficient alternative to replication when high access latency to the data can be tolerated; however, with the continuous reduction in its CPU overhead, EC is now performed on the critical path of data access. For instance, EC has been integrated into the latest major release of the Hadoop Distributed File System (HDFS), the primary storage backend for data-analytics frameworks such as Hadoop and Spark. This talk explores some of the potential benefits of erasure coding in data-intensive clusters and discusses aspects that can help realize EC effectively for data-intensive applications.
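The storage-cost argument can be made concrete with the simplest erasure code. The sketch below uses a toy XOR parity scheme (RAID-5 style); HDFS itself uses Reed-Solomon codes such as RS(6,3), but the overhead comparison is the same in spirit:

```python
# Toy XOR-based erasure code: k = 3 data blocks + 1 parity block.
# 3x replication stores 3x the data; this scheme stores only 4/3 x,
# yet any single lost block can be rebuilt from the survivors.
import functools

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(functools.reduce(lambda a, b: a ^ b, t) for t in zip(*blocks))

data_blocks = [b"AAAA", b"BBBB", b"CCCC"]   # k data blocks
parity = xor_blocks(data_blocks)            # m = 1 parity block

# Simulate losing one block and recovering it from the rest plus parity.
lost = data_blocks[1]
survivors = [data_blocks[0], data_blocks[2], parity]
recovered = xor_blocks(survivors)
print(recovered == lost)                    # True
```

Reed-Solomon generalises this to tolerate multiple simultaneous failures, at the cost of the CPU overhead the abstract refers to.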
Optional session: CCP5 Software Training Discussion - DL_POLY at 13:20 to 13:55 in Emerald.
Our research combines theoretical aspects of mathematics and physics with large-scale numerical investigations to understand the characteristics of diverse multidimensional chaotic systems. Modern theoretical and numerical techniques and powerful computational resources enable us to quantify chaos, and its importance in complex models with hard-to-predict behaviour, by computing physical quantities as well as dynamical indicators such as Lyapunov exponents. This includes investigations into the spreading of waves in 1D and 2D lattices describing solid-state materials, and studies of the role of chaos in wave propagation. Chaotic wave spreading is also relevant in granular chains, soft architected structures, and tight-binding models, where optimised parallel computations enable novel findings. Apart from lattices, another aspect which has attracted the attention of our group is the chaotic dynamics of charged particles in complex magnetic fields, which are central to the control of plasma in experiments. We also apply various numerical techniques to models from biology and chemistry, particularly related to the behaviour of DNA and graphene, where we are able to probe the physical and dynamical properties of these materials, as well as to systems describing the motion of stars in galactic potentials. Furthermore, we develop and test efficient task-specific numerical techniques, while computationally we make use of OpenMP, CUDA, and effective task-splitting with GNU Parallel and pbsdsh.
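As a minimal illustration of the dynamical indicators mentioned above, the largest Lyapunov exponent of the one-dimensional logistic map can be estimated as the orbit average of log|f'(x)|. (The group's actual work targets multidimensional lattices at much larger scale; the map and parameters here are a standard textbook example.)

```python
# Sketch: largest Lyapunov exponent of the logistic map
# x_{n+1} = r x (1 - x), estimated as the mean of log|f'(x_n)|,
# where f'(x) = r (1 - 2x). Positive exponent indicates chaos.
import math

def lyapunov_logistic(r, x0=0.2, n_transient=1000, n_iter=100_000):
    x = x0
    for _ in range(n_transient):          # discard transient behaviour
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n_iter):
        x = r * x * (1 - x)
        total += math.log(abs(r * (1 - 2 * x)))
    return total / n_iter

# For r = 4 the exact value is ln 2 ~ 0.693 (fully chaotic regime).
print(round(lyapunov_logistic(4.0), 2))
```

For high-dimensional systems the same average is taken over tangent-space dynamics with repeated orthonormalisation, which is what makes GPU and task-parallel implementations worthwhile.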
Transition metal carbides (TMCs) are recognised as cheap alternatives to rare and expensive noble metals such as platinum, palladium, and ruthenium, owing to the similar electronic properties these materials share with the noble metals. As a result, there has been great interest in the application of TMCs in catalysis, in particular molybdenum carbides as catalysts and catalyst-support materials. Many studies show molybdenum carbides to be highly active catalysts (and supports) for carbon dioxide activation, dehydrogenation, and the water-gas shift (WGS) reaction [1-3].
In recent work by Ma and co-authors, α-MoC-supported Pt and Au clusters showed extraordinary activity towards hydrogen production via the WGS reaction at ambient temperatures [2,3]. However, with limited surface coverage of the α-MoC surfaces by Pt/Au, the observed activity quickly diminishes. Experimental work suggests that this deactivation (rapid decrease in activity) is associated with partial surface oxidation, occurring due to stable and immobile hydroxyl intermediates. A limited number of density functional theory (DFT) calculations have been performed to understand the activity of these systems; however, the mobility of the various reaction intermediates on these surfaces remains relatively unexplored. Furthermore, the assumed surface is based on experimental observation, which does not exclude the presence of other surfaces. We have used CHPC resources (CPU nodes) to explore the mobility and reactivity of various reaction intermediates on the α-MoC(111) surface; additional data for the more stable α-MoC(100) surface has also been generated. To our surprise, all reaction intermediates are extremely mobile on the experimentally observed surface, while on the α-MoC(100) surface the converse is true. In this talk we will discuss these findings and their implications for catalyst design where α-MoC is considered as either an active phase or a catalyst support.
[1] Deng, Y., et al. (2019). Molybdenum carbide: controlling the geometric and electronic structure of noble metals for the activation of O-H and C-H bonds. Acc. Chem. Res. 52(12): 3372-3383.
[2] Zhang, X., et al. (2021). A stable low-temperature H2-production catalyst by crowding Pt on α-MoC. Nature 589(7842): 396-401.
[3] Yao, S., et al. (2017). Atomic-layered Au clusters on α-MoC as catalysts for the low-temperature water-gas shift reaction. Science 357(6349): 389-393.
Parallel file systems (PFS) are at the core of HPC I/O infrastructures. These systems minimize the I/O time of applications by separating files into fixed-size chunks and distributing them across multiple storage targets. The I/O performance experienced with a PFS is therefore directly linked to its capacity to retrieve these chunks in parallel. In this work, we conduct an in-depth evaluation of the impact of the stripe count (the number of targets used for striping) on the write performance of BeeGFS, one of the most popular parallel file systems today. We consider different network configurations and show the fundamental role played by this parameter, in addition to the number of compute nodes, processes and storage targets.
Through a rigorous experimental evaluation, we directly contradict conclusions from related work. Notably, we show that sharing I/O targets does not lead to performance degradation and that applications should use as many storage targets as possible. Our recommendations have the potential to significantly improve the overall write performance of BeeGFS deployments and also provide valuable information for future work on storage target allocation and stripe count tuning.
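The striping scheme under study can be sketched in a few lines: under round-robin striping, a byte offset maps to a storage target via the chunk size and the stripe count (the target numbering below is illustrative; BeeGFS's default chunk size is 512 KiB):

```python
# Sketch of round-robin file striping as used by parallel file systems
# such as BeeGFS: chunk i of a file lives on target i mod stripe_count.
def target_for_offset(offset, chunk_size, stripe_count):
    """Return the index of the storage target holding this byte offset."""
    return (offset // chunk_size) % stripe_count

chunk = 512 * 1024            # 512 KiB chunks (the BeeGFS default)
# With stripe_count = 4, a 2 MiB write touches all four targets in parallel:
touched = {target_for_offset(off, chunk, 4) for off in range(0, 2**21, chunk)}
print(sorted(touched))        # [0, 1, 2, 3]
```

This is why the stripe count caps the parallelism of a single large write, and why the paper's recommendation to use as many storage targets as possible matters.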
Open source plays an increasingly important role in society. Almost 90 companies (many of them in the Fortune 100) now have dedicated open source program offices (OSPOs) responsible for implementing the company's open source strategy. Industry places a high value on organizations that know how to leverage open source, as for example IBM's $34 billion acquisition of Red Hat indicates. Yet universities are just beginning to formalize their relationship to open source communities, and are discovering open source as an important alternative to traditional tech transfer. The Alfred P. Sloan Foundation, in its mission to reduce barriers in the research enterprise, is funding the establishment of OSPOs at six universities: Johns Hopkins, RIT, UC Santa Cruz, Vermont, CMU, and SLU. The US National Science Foundation recently created the new Directorate for Technology, Innovation, and Partnerships (TIP) to accelerate the development and deployment of new technologies. TIP programs are now funding efforts to create the support infrastructure around open source research products so they can mature into sustainable open source ecosystems.
In this talk I will make the case for why universities should pay attention to open source, and review some of the opportunities created by establishing an OSPO. I will outline the main lessons from founding and running the Center for Research in Open Source Software (CROSS) since 2015, and the new programs I am implementing as part of the OSPO at UC Santa Cruz, with funding from the Alfred P. Sloan Foundation and the NSF, to accelerate reproducible research delivery with open source strategies and techniques. As a case study, I will give an overview of the Skyhook Data Management project, which embeds relational data processing in the Ceph distributed storage system, with ongoing work to do the same for Argonne National Laboratory's Mochi exascale storage services ecosystem and Sandia National Laboratories' FAODEL data exchange libraries.