30 November 2025 to 3 December 2025
Century City Conference Centre
Africa/Johannesburg timezone

Optimising Synthetic Data Generation for Cybersecurity Datasets

2 Dec 2025, 14:00
30m
1/1-10 - Room 10 (Century City Conference Centre)

50

Speakers

Christian K. Devraj
Jan Eloff (University of Pretoria)

Description

In recent years, cybercrimes have become more prevalent and impactful for all users of modern technology. Consequently, various AI-driven intrusion detection tools have been deployed to detect and prevent such cyberattacks; well-known examples include Microsoft's Security Copilot and SentinelOne's Singularity. However, such AI tools are often difficult to train and maintain, primarily because few cybersecurity datasets are available. Furthermore, even when real cybersecurity datasets are collected, they may lack balance, reliability, and variety, making them ineffective for training AI intrusion detection tools. An increasingly popular solution to this problem is synthetic data generation, particularly for meeting commercial cybersecurity dataset requirements. However, synthetic data generation largely mimics the structure and content of real datasets and so often reproduces their poor characteristics. Therefore, this study proposes incorporating Data Quality Metrics and data optimisation techniques into the synthetic data generation process to improve the quality of synthetic cybersecurity datasets. As the culmination of this research, an optimal process for producing synthetic datasets for cybersecurity research is proposed.

Primary author

Christian K. Devraj

Presentation Materials

There are no materials yet.