Centre for High Performance Computing 2025 National Conference

Name: Centre for High Performance Computing 2025 National Conference
Start: 2025-11-30T08:00:00+02:00
End: 2025-12-03T22:20:00+02:00
Location: Century City Conference Centre

30 November 2025 to 3 December 2025

Century City Conference Centre

Africa/Johannesburg timezone


Reference Genome Assembly, Pangenome Construction, and Population Analysis of the Spotted Hyena (Crocuta crocuta) from the Kruger National Park

1 Dec 2025, 12:00

20m

1/1-11 - Room 11 (Century City Conference Centre)

Talk Bioinformatics and Biological Sciences HPC Applications

Dr Ansia van Coller (South African Medical Research Council)

The spotted hyena (Crocuta crocuta) is a highly social carnivore with complex behavioural and ecological functions, making it an important model for studying genetic diversity, adaptation, and evolution. However, previous draft genomes for C. crocuta have been incomplete and derived from captive individuals, limiting insights into natural genetic variation. Here, we present a high-quality de novo genome assembly and the first pangenome of wild spotted hyenas sampled from the Kruger National Park, South Africa, alongside population-level analysis.

Using Oxford Nanopore Technologies (ONT) long-read sequencing, we assembled a 2.39 Gb reference genome with a scaffold N50 of 19.6 Mb and >98% completeness. We further performed short-read resequencing at 10-32x depth per individual, revealing >4 million single-nucleotide variants and ~1 million insertions and deletions per individual. To capture genomic variation beyond a single reference, we constructed a draft pangenome using the PanGenome Graph Builder (PGGB). The resulting pangenome comprises ~2.47 Gb, with 35.2 million nodes, 48.4 million edges, and 159,060 paths, incorporating sequences from all individuals. Its graph structure revealed substantial topological differences, which may correspond to biologically relevant variations.
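For readers unfamiliar with the contiguity metric quoted above, the following is a minimal sketch of how a scaffold N50 is conventionally computed from a list of scaffold lengths. It illustrates the standard definition only; the function name `n50` and the toy lengths are hypothetical and are not taken from the study's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, not data from the study).
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

In practice the lengths would be read from the assembly FASTA or its index file; a higher N50 indicates that more of the assembly sits in long scaffolds.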

The breadth of these analyses required extensive use of the CHPC’s computing resources. Long-read genome assembly and polishing were executed on high-memory nodes to accommodate the error-correction and scaffolding steps. Repeat and gene annotation pipelines (RepeatModeler, BRAKER3) as well as variant discovery with GATK and BCFtools were parallelised to accelerate execution. Pangenome graph construction was particularly computationally intensive, requiring large-scale parallelisation and significant memory and storage capacity to manage multi-genome alignments and graph building.
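The abstract does not say how the variant-discovery steps were parallelised. One common pattern is scatter-gather: the reference is split into fixed-size windows, each window is processed as an independent cluster job, and the per-window results are merged afterwards. The sketch below shows only the scatter step, under that assumption; `scatter_intervals`, `to_region`, the scaffold names, and the window size are all hypothetical.

```python
def scatter_intervals(contig_lengths, chunk_size):
    """Split contigs into (name, start, end) windows.

    Coordinates are 1-based and inclusive. Each window can be
    handed to a separate job so that windows run in parallel.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    intervals = []
    for name, length in contig_lengths.items():
        for start in range(1, length + 1, chunk_size):
            end = min(start + chunk_size - 1, length)
            intervals.append((name, start, end))
    return intervals


def to_region(interval):
    """Format a window as a 'name:start-end' region string."""
    name, start, end = interval
    return f"{name}:{start}-{end}"


# Hypothetical scaffold names and lengths, for illustration only.
contigs = {"scaffold_1": 250, "scaffold_2": 100}
for window in scatter_intervals(contigs, chunk_size=100):
    print(to_region(window))
```

Region strings of the form `name:start-end` are a widely used way to restrict tools such as BCFtools to part of a genome, which is why the windows are formatted that way here. The window size trades scheduling overhead against load balance and would need tuning for a 2.39 Gb genome.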

This study provides the most contiguous wild-derived genome to date for the species and the first draft pangenome for C. crocuta, establishing a foundation for future conservation and comparative genomics. Importantly, it demonstrates the critical role of HPC resources in enabling large-scale bioinformatics pipelines, from genome assembly to pangenome construction and population-level analysis, in non-model organisms.

Presenting Author	Ansia van Coller
Institute	South African Medical Research Council

Primary author: Dr Ansia van Coller (South African Medical Research Council)

Co-authors: Dr Brigitte Glanzmann (SAMRC/BGI and Stellenbosch University); Dr Nadia Carstens (SAMRC Genomics Platform); Prof. Craig Kinnear (SAMRC Genomics Platform); Ms Victoria Cole (SAMRC Genomics Platform); Dr Tanya Kerr (Stellenbosch University); Prof. Wynand Goosen (Stellenbosch University); Prof. Michele Miller (Stellenbosch University)
