BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//CERN//INDICO//EN
BEGIN:VEVENT
SUMMARY:® Enable Fundamental Cacheability for Distributed Deep Learning T
 raining
DTSTART;VALUE=DATE-TIME:20231206T113000Z
DTEND;VALUE=DATE-TIME:20231206T115000Z
DTSTAMP;VALUE=DATE-TIME:20260310T191421Z
UID:indico-contribution-1969@events.chpc.ac.za
DESCRIPTION:Speakers: Ali Butt (Virginia Tech)\nDeep learning training (DL
 T) applications exhibit unique I/O workload behaviors that pose new challe
 nges for storage system design. DLT is I/O intensive since data samples ne
 ed to be fetched continuously from remote storage. Accelerators such as GP
 Us have been extensively used to support these applications. As accelera
 tors become more powerful and more data-hungry\, the I/O performance lags 
 behind. This creates a crucial performance bottleneck\, especially in dist
 ributed DLT. At the same time\, the exponentially growing dataset sizes ma
 ke it impossible to store these datasets entirely in memory. While today
 ’s DLT frameworks typically use a random sampling policy that treats al
 l samples equally\, recent findings indicate that not all samples are equa
 lly important and different data samples contribute differently to
 wards improving the accuracy of a model. This observation creates an oppor
 tunity for DLT I/O optimizations by exploiting the data locality enabled b
 y importance sampling.\n\nIn this talk\, I’ll present the design of SHAD
 E\, a new DLT-aware caching system that detects fine-grained importance va
 riations at the per-sample level and leverages the variance to make inform
 ed caching decisions for a distributed DLT job. SHADE adopts a novel\, ran
 k-ba
 sed approach\, which captures the relative importance of data samples acro
 ss different mini-batches. SHADE then dynamically updates the importance s
 cores of all samples during training. With these techniques\, SHADE manage
 s to significantly improve the cache hit ratio of the DLT job\, and thus i
 mproves the job’s training performance.\n\nhttps://events.chpc.ac.za/
 event/125/contributions/1969/
LOCATION:Skukuza 1-1-2 - Ndau
URL:https://events.chpc.ac.za/event/125/contributions/1969/
END:VEVENT
END:VCALENDAR
