Speaker
Description
One goal of support staff at a data center is to identify inefficient jobs and to improve their efficiency.
Therefore, a data center deploys monitoring systems that capture the behavior of the executed jobs.
While it is easy to use statistics to rank jobs by their utilization of compute, storage, and network resources, it is difficult to find patterns among 100,000 jobs, e.g., to determine whether there is a class of jobs that performs poorly.
This talk describes a methodology for ranking all jobs by their similarity to a reference job, based on their temporal I/O behavior.
A study explores the effectiveness of the approach, starting from three reference jobs and investigating the jobs related to them.
The data stems from DKRZ's supercomputer Mistral and includes more than 500,000 jobs executed over more than six months of operation.
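
The abstract does not say how similarity is computed, so the following is only an illustrative sketch of the general idea of ranking jobs against a reference job. It assumes each job is summarized as a list of I/O activity values sampled over its runtime, resamples every series to a common length, and uses Euclidean distance as a stand-in metric. The function names, the sample data, and the metric are all hypothetical and are not the method presented in the talk.

```python
from math import sqrt


def resample(series, length):
    """Linearly interpolate a series to `length` points (length >= 2),
    so that jobs with different runtimes can be compared."""
    if len(series) == 1:
        return [float(series[0])] * length
    step = (len(series) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def similarity(a, b, length=8):
    """Map Euclidean distance to a score in (0, 1]; 1.0 means identical.
    Assumption: this metric is a placeholder, not the talk's metric."""
    ra, rb = resample(a, length), resample(b, length)
    dist = sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)))
    return 1.0 / (1.0 + dist)


def rank_by_similarity(reference, jobs, length=8):
    """Return (job_id, score) pairs, most similar job first."""
    scored = [(job_id, similarity(reference, series, length))
              for job_id, series in jobs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical I/O activity per time segment for three jobs.
jobs = {
    "job_a": [0, 5, 9, 5, 0],        # same burst pattern as the reference
    "job_b": [9, 9, 9, 9, 9],        # constant heavy I/O
    "job_c": [0, 4, 8, 4, 0],        # similar burst, slightly weaker
}
reference = [0, 5, 9, 5, 0]

for job_id, score in rank_by_similarity(reference, jobs):
    print(f"{job_id}: {score:.3f}")
```

Ranking against a reference job, rather than clustering all jobs at once, lets an analyst start from one known problematic job and inspect its nearest neighbors first.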
| Student? | No |
| --- | --- |