1-3 December 2021
Africa/Johannesburg timezone

Processing longitudinal population data using CHPC

1 Dec 2021, 15:15
30m
Talk NICIS Cloud Projects

Speaker

Dr Kobus Herbst (SAPRIN)

Description

The South African Population Research Infrastructure Network (SAPRIN) curates longitudinal population data collected by four nodes from a total population of more than 400 000 individuals. Because these study populations are dynamic, the data representing episodes of individual surveillance must be combined in a way that maintains data integrity and accounts for variations between data collection sites.

We need to deconstruct 4.5 million person-years of observation into a day-level dataset, which requires the processing and storage capacity of a high-performance computing environment such as the CHPC.
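
To give a sense of what deconstructing episodes into day-level records involves, here is a minimal sketch in Python. It is not the SAPRIN pipeline, which the abstract says is written in Julia. The `Episode` fields, the node labels, and the rule of keeping one record per individual per day when episodes overlap are all assumptions made for illustration. At roughly 365.25 days per year, 4.5 million person-years comes to about 1.6 billion day-level rows, which is why the real workload needs HPC resources.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Episode:
    """Hypothetical surveillance episode: one individual observed at one node."""
    individual_id: int
    node: str
    start: date  # first day under surveillance (inclusive)
    end: date    # last day under surveillance (inclusive)


def expand_to_days(episode: Episode) -> Iterator[Tuple[int, str, date]]:
    """Yield one (individual_id, node, day) record per day of the episode."""
    n_days = (episode.end - episode.start).days + 1
    for offset in range(n_days):
        day = episode.start + timedelta(days=offset)
        yield (episode.individual_id, episode.node, day)


def day_level_dataset(episodes: List[Episode]) -> List[Tuple[int, str, date]]:
    """Expand all episodes, keeping one record per individual per day.

    Assumption: when two episodes for the same individual overlap,
    the first record seen for a given day is kept.
    """
    seen = set()
    rows = []
    for episode in episodes:
        for row in expand_to_days(episode):
            key = (row[0], row[2])  # (individual_id, day)
            if key not in seen:
                seen.add(key)
                rows.append(row)
    return rows


# Illustrative data only; the node names are placeholders.
episodes = [
    Episode(1, "node_a", date(2020, 1, 1), date(2020, 1, 3)),
    Episode(1, "node_a", date(2020, 1, 3), date(2020, 1, 5)),   # overlaps by one day
    Episode(2, "node_b", date(2020, 2, 28), date(2020, 3, 1)),  # spans a leap day
]
rows = day_level_dataset(episodes)
person_days = len(rows)
```

Holding every row in memory, as this sketch does, would not work at the scale described. A production pipeline would more plausibly stream or partition the expansion, for example by individual or by node.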

We will describe a data processing pipeline, originally developed in Pentaho and recently converted to the Julia programming language, that scales well in the CHPC environment.

Primary authors

Dr Kobus Herbst (SAPRIN), Molulaqhooa Maoyi (SAPRIN), Tinofa Mutevedzi (SAPRIN), Mark Collinson (SAPRIN)
