30 November 2025 to 3 December 2025
Century City Convention Centre
Africa/Johannesburg timezone

Reference Genome Assembly, Pangenome Construction, and Population Analysis of the Spotted Hyena (Crocuta crocuta) from the Kruger National Park

1 Dec 2025, 12:00
20m
1/1-11 - Room 11 (Century City Convention Centre)

100
Talk Bioinformatics and Biological Sciences HPC Applications

Speaker

Dr Ansia van Coller (South African Medical Research Council)

Description

The spotted hyena (Crocuta crocuta) is a highly social carnivore with complex behavioural and ecological functions, making it an important model for studying genetic diversity, adaptation, and evolution. However, previous draft genomes for C. crocuta have been incomplete and derived from captive individuals, limiting insights into natural genetic variation. Here, we present a high-quality de novo genome assembly and the first pangenome of wild spotted hyenas sampled from the Kruger National Park, South Africa, alongside population-level analysis.

Using Oxford Nanopore Technologies (ONT) long-read sequencing, we assembled a 2.39 Gb reference genome with a scaffold N50 of 19.6 Mb and >98% completeness. We further performed short-read resequencing at 10-32× depth per individual, revealing >4 million single nucleotide variations and ~1 million insertions and deletions per individual. To capture genomic variation beyond a single reference, we constructed a draft pangenome using the PanGenome Graph Builder (PGGB). The resulting pangenome comprises ~2.47 Gb, with 35.2 million nodes, 48.4 million edges, and 159,060 paths, incorporating sequences from all individuals. Its graph structure revealed substantial topological differences, which may correspond to biologically relevant variations.
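
For readers unfamiliar with the summary statistics quoted above, the following is a minimal Python sketch of how a scaffold N50 and the node, edge, and path counts of a pangenome graph are typically computed. It is illustrative only and is not the authors' pipeline. The function names `n50` and `gfa_counts` and the toy inputs are hypothetical. The sketch also assumes the graph is stored as GFA v1 text with paths written as `P` records; graphs that store paths as `W` (walk) records would need an extra case.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def gfa_counts(lines: Iterable[str]) -> dict:
    """Count segments (nodes), links (edges) and paths in GFA v1 text.

    Assumption: paths are stored as 'P' records; 'W' records are ignored.
    """
    counts = {"nodes": 0, "edges": 0, "paths": 0}
    record_types = {"S": "nodes", "L": "edges", "P": "paths"}
    for line in lines:
        # The record type is the first tab-separated field of each line.
        key = record_types.get(line.split("\t", 1)[0])
        if key:
            counts[key] += 1
    return counts


# Toy inputs (hypothetical, not data from this study).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
toy_gfa = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tG",
    "S\t3\tTT",
    "L\t1\t+\t2\t+\t0M",
    "L\t2\t+\t3\t+\t0M",
    "P\tsample1\t1+,2+,3+\t*",
]

print(n50(toy_scaffolds))   # 70
print(gfa_counts(toy_gfa))  # {'nodes': 3, 'edges': 2, 'paths': 1}
```

In the toy example the scaffolds total 290 units, and the two longest (80 + 70 = 150) already exceed half of that, so the N50 is 70. For a real graph, the same counting function could be fed an open file handle so that the GFA is streamed line by line rather than loaded into memory.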

The breadth of these analyses required extensive use of the CHPC’s computing resources. Long-read genome assembly and polishing were executed on high-memory nodes to accommodate the error-correction and scaffolding steps. Repeat and gene annotation pipelines (RepeatModeler, BRAKER3) as well as variant discovery with GATK and BCFtools were parallelised to accelerate execution. Pangenome graph construction was particularly computationally intensive, requiring large-scale parallelisation and significant memory and storage capacity to manage multi-genome alignments and graph building.

This study provides the most contiguous wild-derived genome to date for the species and the first draft pangenome for C. crocuta, and it establishes a foundation for future conservation and comparative genomics. Importantly, it demonstrates the critical role of HPC resources in enabling large-scale bioinformatics pipelines, from genome assembly to pangenome construction and population-level analysis, in non-model organisms.

Institute South African Medical Research Council
Presenting Author Ansia van Coller
Registered for the conference? Yes

Primary author

Dr Ansia van Coller (South African Medical Research Council)

Co-authors

Dr Brigitte Glanzmann (SAMRC/BGI and Stellenbosch University)
Dr Nadia Carstens (SAMRC Genomics Platform)
Prof. Craig Kinnear (SAMRC Genomics Platform)
Ms Victoria Cole (SAMRC Genomics Platform)
Dr Tanya Kerr (Stellenbosch University)
Prof. Wynand Goosen (Stellenbosch University)
Prof. Michele Miller (Stellenbosch University)
